Creating multiple SCM namespaces per CPU socket
The DAOS 2.4 release will support creating multiple SCM namespaces per NUMA node using the daos_server storage prepare --scm-only command (see ticket DAOS-9876: storage prepare option to create multiple SCM namespaces). This mini-guide explains how to do this manually with the ipmctl and ndctl tools.
Perform the following steps to create two SCM namespaces per socket on a dual-socket host:
Ensure a clean slate where PMem modules are configured in memory mode and have not yet been set to AppDirect mode.
Create one PMem AppDirect (interleaved) region for each socket.
Create two PMem namespaces per PMem AppDirect region.
Validate that the PMem namespaces have been created and are visible as Linux block devices.
Reset PMem into memory mode:
To start from a clean slate, configure PMem in memory mode so that all AppDirect configuration is removed:
Stop the DAOS server:
dmg system stop ; systemctl stop daos_server
Ensure that all PMem-based filesystems are unmounted:
df | grep pmem
This should not show anything; umount whatever is still mounted.
Remove the namespaces and set the PMem goal to MemoryMode:
daos_server storage prepare --scm-only --reset --force
Reboot to apply the memory resource allocation goal changes in BIOS, then verify that no regions remain:
ipmctl show -region
The output should be: There are no Regions defined in the system.
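The two checks in this step can be scripted. The following is a minimal sketch, not part of the DAOS tooling: the function names pmem_mounts and regions_cleared are hypothetical, and the helpers only filter the text that df and ipmctl show -region print, as described above.

```shell
# Hypothetical helpers; they only inspect text passed on stdin.

# Print any df output lines that mention a pmem device (prints nothing if none).
pmem_mounts() {
    grep pmem || true
}

# Succeed only if the ipmctl output reports that no regions exist.
regions_cleared() {
    grep -q "There are no Regions defined in the system"
}

# Usage on a live system (as root):
#   [ -z "$(df | pmem_mounts)" ] && echo "no PMem filesystems mounted"
#   ipmctl show -region | regions_cleared && echo "clean slate"
```

If regions_cleared fails after the reboot, some AppDirect configuration is still present and the reset should be repeated before continuing.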
Create one PMem AppDirect region for each socket:
All the PMem modules attached to a specific socket will be combined into a region.
ipmctl create -f -goal PersistentMemoryType=AppDirect
Reboot to apply the memory resource allocation goal changes in BIOS.
ipmctl show -d PersistentMemoryType,FreeCapacity -region
Output should be of the form (the FreeCapacity value depends on the quantity and capacity of the PMem modules; this example is from a CLX server with six 512GiB PMem modules per socket):
---ISetID=0x2aba7f4828ef2ccc---
PersistentMemoryType=AppDirect
FreeCapacity=3012.0 GiB
---ISetID=0x81187f4881f02ccc---
PersistentMemoryType=AppDirect
FreeCapacity=3012.0 GiB
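The FreeCapacity values are needed in the next step to size the namespaces. As a rough sketch, they could be extracted from the output above with a small filter; region_free_gib is a hypothetical helper name, and the parsing assumes the FreeCapacity=<number> GiB line format shown in this example.

```shell
# Hypothetical helper: print the whole-GiB FreeCapacity of each region,
# one value per line, from "ipmctl show -d ... -region" output on stdin.
region_free_gib() {
    sed -n 's/.*FreeCapacity=\([0-9]*\).*/\1/p'
}

# Usage on a live system (as root):
#   ipmctl show -d PersistentMemoryType,FreeCapacity -region | region_free_gib
# For the example output above this prints 3012 twice, once per region.
```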
Create two PMem namespaces per PMem AppDirect region:
Each region will be divided into two PMem namespaces of equal size (in this example, 3012 GiB / 2 = 1506 GiB). The namespace size should be 2 MiB aligned and a multiple of the interleave-width (see the ndctl create-namespace command help for more details).
Run commands to create two namespaces on each of the two AppDirect regions:
GIB=3012
let SIZE=$GIB*1024*1024*1024/2 ; echo $SIZE
# in this example, the output should be 1617055186944
ndctl create-namespace --region 0 --size $SIZE
ndctl create-namespace --region 0 --size $SIZE
ndctl create-namespace --region 1 --size $SIZE
ndctl create-namespace --region 1 --size $SIZE
Each of the ndctl invocations should report the properties of the created namespace.
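The size calculation and the alignment requirement can be combined in one helper. This is a sketch under stated assumptions: namespace_size is a hypothetical function, the alignment unit is assumed to be the interleave width multiplied by 2 MiB, and the interleave width is assumed to equal the number of PMem modules per socket (six in this example). Check the ndctl create-namespace help for the rule that applies to your system.

```shell
# Hypothetical helper: compute the per-namespace size in bytes.
#   $1: region FreeCapacity in GiB
#   $2: number of namespaces to create in the region
#   $3: interleave width (assumed: number of PMem modules in the region)
namespace_size() {
    gib=$1; count=$2; width=$3
    # Assumed alignment unit: interleave width x 2 MiB.
    unit=$(( width * 2 * 1024 * 1024 ))
    # Equal share of the region, in bytes.
    raw=$(( gib * 1024 * 1024 * 1024 / count ))
    # Round down to a multiple of the alignment unit.
    echo $(( raw / unit * unit ))
}

# Usage for the example above:
#   SIZE=$(namespace_size 3012 2 6)   # 1617055186944, same as the manual calculation
```

In this example the equal share is already a multiple of the assumed unit, so the rounding leaves it unchanged. On other configurations the rounding may leave a small amount of free capacity in the region.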
List PMem AppDirect region info:
After creating the namespaces, the PMem regions should have no free capacity left.
ipmctl show -d PersistentMemoryType,FreeCapacity -region
Output should be of the form:
---ISetID=0x2aba7f4828ef2ccc---
PersistentMemoryType=AppDirect
FreeCapacity=0.0 GiB
---ISetID=0x81187f4881f02ccc---
PersistentMemoryType=AppDirect
FreeCapacity=0.0 GiB
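This check can also be automated. The sketch below uses a hypothetical helper, regions_fully_allocated, and assumes the FreeCapacity=0.0 GiB line format shown above.

```shell
# Hypothetical helper: succeed only if the input contains at least one
# FreeCapacity line and every such line reports 0.0 GiB.
regions_fully_allocated() {
    awk '/FreeCapacity=/ { seen = 1; if ($0 !~ /FreeCapacity=0\.0 GiB/) bad = 1 }
         END { exit (seen && !bad) ? 0 : 1 }'
}

# Usage on a live system (as root):
#   ipmctl show -d PersistentMemoryType,FreeCapacity -region | regions_fully_allocated \
#       && echo "all regions fully allocated"
```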
List PMem namespaces:
New PMem namespace details should be available.
ndctl list -N -v
The output should be a JSON listing with one entry per namespace (four entries in this example).
List the PMem block devices:
Verify that the PMem namespaces are visible as Linux block devices:
ls -al /dev/pmem*
lsblk|grep -E "NAME|pmem"
On a 2-socket CLX server with six 128GiB PMem modules per socket, the output should be similar to this:
brw-rw---- 1 root disk 259, 0 Jun 22 13:12 /dev/pmem0
brw-rw---- 1 root disk 259, 1 Jun 22 13:12 /dev/pmem0.1
brw-rw---- 1 root disk 259, 2 Jun 22 13:13 /dev/pmem1
brw-rw---- 1 root disk 259, 3 Jun 22 13:13 /dev/pmem1.1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
pmem0 259:0 0 372.1G 0 disk
pmem1 259:2 0 372.1G 0 disk
pmem0.1 259:1 0 372.1G 0 disk
pmem1.1 259:3 0 372.1G 0 disk
Note that the naming of the two PMem devices per socket is not symmetric: they are named pmem0, pmem0.1 on the first socket, and pmem1, pmem1.1 on the second socket.
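The device check can be scripted by counting the pmem disks in the lsblk output. This is a minimal sketch: count_pmem_devices is a hypothetical helper, and it assumes the default lsblk column order shown above (NAME first, TYPE sixth).

```shell
# Hypothetical helper: count the pmem disks in lsblk output read from stdin.
count_pmem_devices() {
    awk '$1 ~ /^pmem/ && $6 == "disk" { n++ } END { print n + 0 }'
}

# Usage on a live system:
#   lsblk | count_pmem_devices
# For the example output above this prints 4 (two namespaces per socket).
```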
This completes the manual creation of the PMem devices, which would normally be performed by the daos_server storage prepare --scm-only command. The PMem block devices can now be used in the daos_server.yml file to configure a total of four engines on two sockets.