Creating multiple SCM namespaces per CPU socket
The DAOS 2.4 release will support creating multiple SCM namespaces per NUMA node through the daos_server storage prepare --scm-only command (see ticket https://daosio.atlassian.net/browse/DAOS-9876). This mini-guide explains how to perform the same setup manually with the ipmctl and ndctl tools.
Perform the following steps to create two SCM namespaces per socket on a dual-socket host:
1. Ensure a clean slate where the PMem modules are configured in memory mode and have not yet been set to AppDirect mode.
2. Create one PMem AppDirect (interleaved) region for each socket.
3. Create two PMem namespaces per PMem AppDirect region.
4. Validate that the PMem namespaces have been created and are visible as Linux block devices.
Reset PMem into memory mode
To ensure a clean slate, configure PMem in memory mode so that all AppDirect configuration is removed:
Stop the DAOS server:
dmg system stop
systemctl stop daos_server

Ensure that all PMem-based filesystems are unmounted: df | grep pmem should not show anything; umount whatever is still mounted.

Remove the namespaces and set the PMem goal back to memory mode:

daos_server storage prepare --scm-only --reset --force

Reboot to apply the memory resource allocation goal changes in the BIOS.

After the reboot, verify that no regions remain:
ipmctl show -region
The output should be: There are no Regions defined in the system.
Create one PMem AppDirect region for each socket:
All the PMem modules attached to a specific socket will be combined into a region.
ipmctl create -f -goal PersistentMemoryType=AppDirect

Reboot to apply the memory resource allocation goal changes in the BIOS.
ipmctl show -d PersistentMemoryType,FreeCapacity -region
Output should be of the form below; the FreeCapacity value depends on the quantity and capacity of the PMem modules (this example is from a CLX server with six 512GiB PMem modules per socket):
---ISetID=0x2aba7f4828ef2ccc---
PersistentMemoryType=AppDirect
FreeCapacity=3012.0 GiB
---ISetID=0x81187f4881f02ccc---
PersistentMemoryType=AppDirect
FreeCapacity=3012.0 GiB

Create two PMem namespaces per PMem AppDirect region:
Each region will be divided into two PMem namespaces of equal size (in this example, 3012 GiB / 2 = 1506 GiB). The namespace size should be 2 MiB aligned and a multiple of the interleave width (see the ndctl create-namespace command help for more details).
Run commands to create two namespaces on each of the two AppDirect regions:
GIB=3012
let SIZE=$GIB*1024*1024*1024/2 ; echo $SIZE
# in this example, the output should be 1617055186944
ndctl create-namespace --region 0 --size $SIZE
ndctl create-namespace --region 0 --size $SIZE
ndctl create-namespace --region 1 --size $SIZE
ndctl create-namespace --region 1 --size $SIZE

Each of the ndctl invocations should report the properties of the created namespace.
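Before running the ndctl commands, the computed size can be sanity-checked against the 2 MiB alignment requirement. A minimal shell sketch (the GIB value is the per-region free capacity from the ipmctl output above):

```shell
# Sanity-check the per-namespace size: it must be a multiple of 2 MiB.
GIB=3012                                  # per-region free capacity in GiB
SIZE=$((GIB * 1024 * 1024 * 1024 / 2))    # two namespaces per region
ALIGN=$((2 * 1024 * 1024))                # 2 MiB alignment requirement

echo "namespace size: $SIZE"
if [ $((SIZE % ALIGN)) -eq 0 ]; then
    echo "size is 2 MiB aligned"
else
    echo "size is NOT 2 MiB aligned; round it down to a 2 MiB boundary" >&2
fi
```

For the 3012 GiB regions of this example, the check prints a size of 1617055186944 bytes and confirms the alignment.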
List PMem AppDirect region info:
After creating the namespaces, the PMem regions should have no free capacity left.
ipmctl show -d PersistentMemoryType,FreeCapacity -region
Output should be of the form:
---ISetID=0x2aba7f4828ef2ccc---
PersistentMemoryType=AppDirect
FreeCapacity=0.0 GiB
---ISetID=0x81187f4881f02ccc---
PersistentMemoryType=AppDirect
FreeCapacity=0.0 GiB

List PMem namespaces:
New PMem namespace details should be available.
ndctl list -N -v
Output should be of the form:
[
  {
    "dev": "namespace0.0",
    "mode": "fsdax",
    "map": "dev",
    "size": 1617055186944,
    "uuid": "842fc847-28e0-4bb6-8dfc-d24afdba1528",
    "raw_uuid": "dedb4b28-dc4b-4ccd-b7d1-9bd475c91264",
    "sector_size": 512,
    "blockdev": "pmem0",
    "numa_node": 0
  },
  {
    "dev": "namespace0.1",
    "mode": "fsdax",
    "map": "dev",
    "size": 1617055186944,
    "uuid": "842fc847-28e0-4bb6-8dfc-d24afdba1529",
    "raw_uuid": "dedb4b28-dc4b-4ccd-b7d1-9bd475c91264",
    "sector_size": 512,
    "blockdev": "pmem0.1",
    "numa_node": 0
  },
  {
    "dev": "namespace1.0",
    "mode": "fsdax",
    "map": "dev",
    "size": 1617055186944,
    "uuid": "842fc847-28e0-4bb6-8dfc-d24afdba1530",
    "raw_uuid": "dedb4b28-dc4b-4ccd-b7d1-9bd475c91264",
    "sector_size": 512,
    "blockdev": "pmem1",
    "numa_node": 1
  },
  {
    "dev": "namespace1.1",
    "mode": "fsdax",
    "map": "dev",
    "size": 1617055186944,
    "uuid": "842fc847-28e0-4bb6-8dfc-d24afdba1531",
    "raw_uuid": "dedb4b28-dc4b-4ccd-b7d1-9bd475c91264",
    "sector_size": 512,
    "blockdev": "pmem1.1",
    "numa_node": 1
  }
]
List the PMem block devices:
Verify that the PMem namespaces are visible as Linux block devices:
ls -al /dev/pmem*
lsblk | grep -E "NAME|pmem"
On a 2-socket CLX server with six 128GiB PMem modules per socket, the output should be similar to this:
brw-rw---- 1 root disk 259, 0 Jun 22 13:12 /dev/pmem0
brw-rw---- 1 root disk 259, 1 Jun 22 13:12 /dev/pmem0.1
brw-rw---- 1 root disk 259, 2 Jun 22 13:13 /dev/pmem1
brw-rw---- 1 root disk 259, 3 Jun 22 13:13 /dev/pmem1.1
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
pmem0   259:0    0 372.1G  0 disk
pmem1   259:2    0 372.1G  0 disk
pmem0.1 259:1    0 372.1G  0 disk
pmem1.1 259:3    0 372.1G  0 disk
Note that the naming of the two PMem devices per socket is not uniform: the first namespace on each socket gets a bare name (pmem0 on the first socket, pmem1 on the second), while the second namespace gets a .1 suffix (pmem0.1 and pmem1.1).
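For scripting against these devices, the socket number can be recovered from the device name itself; in this example the region number in the name matches the NUMA node. A small sketch using only shell string handling (the device list is taken from the example output above):

```shell
# Derive the socket number from each PMem block device name.
# Names follow the pattern pmemN or pmemN.M, where N is the region/socket.
devs="pmem0 pmem0.1 pmem1 pmem1.1"
for d in $devs; do
    sock=${d#pmem}      # strip the "pmem" prefix: "0", "0.1", "1", "1.1"
    sock=${sock%%.*}    # drop any ".M" suffix, keeping the socket digit
    echo "socket $sock: /dev/$d"
done
```

On a live system, the authoritative mapping is the numa_node field in the ndctl list -N -v output shown above.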
This completes the manual creation of the PMem devices, which would normally be performed by the daos_server storage prepare --scm-only command. The PMem block devices can now be used in the daos_server.yml file to configure a total of four engines on two sockets.
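As an illustration, the four engines could be mapped to the four PMem block devices along the following lines. This is a sketch only: the scm_mount paths and the NUMA pinning are placeholders, the fabric and target settings are omitted, and the exact storage-section syntax should be checked against the daos_server.yml reference for the DAOS release in use.

```yaml
# Sketch: four engines, one per PMem namespace (two per socket).
# Mount points below are illustrative placeholders.
engines:
  - pinned_numa_node: 0
    storage:
      - class: dcpm
        scm_list: [/dev/pmem0]
        scm_mount: /mnt/daos0
  - pinned_numa_node: 0
    storage:
      - class: dcpm
        scm_list: [/dev/pmem0.1]
        scm_mount: /mnt/daos1
  - pinned_numa_node: 1
    storage:
      - class: dcpm
        scm_list: [/dev/pmem1]
        scm_mount: /mnt/daos2
  - pinned_numa_node: 1
    storage:
      - class: dcpm
        scm_list: [/dev/pmem1.1]
        scm_mount: /mnt/daos3
```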