storage prepare option to create multiple SCM namespaces

Description

All PMem devices connected to the same Xeon CPU socket are combined to form a single device in AppDirect-interleaved mode (using ipmctl). We currently create one namespace on this device, and then mount this single /dev/pmemX to be used by the single engine that is running on that CPU socket.

To use next-generation Xeon processors (SPR and beyond) more efficiently with current-generation (200 Gbps) HPC fabric links, we will need to support more than one HPC fabric link per CPU socket. Because libfabric does not support striping across multiple interfaces, we need a mechanism to run two (or more) engines per CPU socket. To do this, we will need to create two (or more) SCM devices per CPU socket.

Implementation proposal: Provide a -S | --scm-namespaces N option to the daos_server storage prepare command, with a default of 1, and the ability to request 2 SCM namespaces per CPU socket.
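The proposal can be illustrated with a minimal Python sketch. It is not the daos_server implementation. It only models the proposed -S | --scm-namespaces N option with its default of 1 and shows how N namespaces per socket would map to one engine per namespace. The names build_parser and plan_engines, and the program name used here, are hypothetical and exist only for illustration.

```python
import argparse


def build_parser():
    """Model the proposed option: -S | --scm-namespaces N, default 1."""
    # The program name is a placeholder for this sketch, not a real command.
    parser = argparse.ArgumentParser(prog="storage-prepare-sketch")
    parser.add_argument(
        "-S",
        "--scm-namespaces",
        type=int,
        default=1,
        metavar="N",
        help="number of SCM namespaces to create per CPU socket",
    )
    return parser


def plan_engines(sockets, namespaces_per_socket):
    """Return one (socket, namespace_index) pair per engine.

    Assumption for this sketch: each namespace backs exactly one engine,
    so N namespaces per socket allow N engines per socket.
    """
    if sockets < 1:
        raise ValueError("at least one CPU socket is required")
    if namespaces_per_socket < 1:
        raise ValueError("at least one namespace per socket is required")
    return [
        (socket, ns)
        for socket in range(sockets)
        for ns in range(namespaces_per_socket)
    ]


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Two sockets is an assumed example value, not read from hardware.
    for socket, ns in plan_engines(2, args.scm_namespaces):
        print(f"socket {socket}: namespace {ns} -> one engine")
```

With the default of 1, a two-socket server keeps one engine per socket. With -S 2, the same server would be planned with four engines, two per socket, which matches the goal of one engine per fabric link.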

Linked issues

is related to

DAOS-11076

Mismatch between PMem block device name and NUMA node

DAOS-10284

Dissect daos_server storage prepare into (scm|nvme) (prepare|reset) subcommands

is related to

DAOS-10016

Alert when PMEM DIMMs are not in AppDirect mode

DAOS-9565

daos_server doesn't validate that engines are on different numa nodes

Activity

Tom Nabarro July 14, 2022 at 1:48 PM

An additional test case was added in a PR that was merged to master in commit https://github.com/daos-stack/daos/commit/13130cbc84987722527f74e188f9a328b204eafd

Michael Hennecke June 22, 2022 at 11:46 AM

A wiki page explaining how to do this manually has been created:

This is for testing only; for production use, the DAOS 2.4 release should be used.

Tom Nabarro April 8, 2022 at 12:22 PM

The PR landed on master (after 2.2 was branched, so effectively for release 2.4) in commit https://github.com/daos-stack/daos/commit/b1933c34d8621fa511bf497bdf1352e01c9925be

Tom Nabarro March 31, 2022 at 8:29 PM

Manual testing performed on the feature branch:

full reset / reboot / prepare / reboot / prepare (which removes namespaces and regions, recreates regions, then recreates namespaces)
the same with multiple namespaces per socket (2 and 4)
setting up a single namespace per socket, then two per socket afterwards
vice versa
setting up AppDirect non-interleaved mode and trying to run prepare
setting up AppDirect non-interleaved mode and running reset

Tom Nabarro March 24, 2022 at 6:39 PM

Fixed

Details
Assignee
Tom Nabarro
Reporter
Michael Hennecke
Priority
P2-High
Affects versions
2.0.1 Community Release
Required for Version
2.4 Community Release
Fix versions
2.4 Community Release
Components
Patch URL
Story Points
5

Created February 9, 2022 at 8:58 AM

Updated February 16, 2023 at 9:03 AM

Resolved April 8, 2022 at 12:22 PM
