storage prepare option to create multiple SCM namespaces

Description

All PMem devices connected to the same Xeon CPU socket are combined to form a single device in AppDirect-interleaved mode (using ipmctl). We currently create one namespace on this device, and then mount this single /dev/pmemX to be used by the single engine that is running on that CPU socket.

To use next-generation Xeon processors (SPR and beyond) more efficiently with current-generation (200 Gbps) HPC fabric links, we will need to support more than one HPC fabric link per CPU socket. Because libfabric does not support striping across multiple interfaces, this requires a mechanism to run two (or more) engines per CPU socket. Each engine needs its own SCM device, so we will need to create two (or more) SCM devices per CPU socket.

Implementation proposal: Provide a -S | --scm-namespaces N option to the daos_server storage prepare command, where N is the number of SCM namespaces to create per CPU socket. The default is 1, and it must be possible to request 2.
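
The semantics of the proposed option can be sketched as follows. This is a minimal illustration in Python, not the daos_server implementation: the program name and the helper functions parse_per_socket and namespaces_to_create are hypothetical, and only the -S | --scm-namespaces spelling and the default of 1 are taken from the proposal above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the "storage prepare" command line.
    parser = argparse.ArgumentParser(prog="storage-prepare-sketch")
    parser.add_argument(
        "-S",
        "--scm-namespaces",
        type=int,
        default=1,  # default from the proposal: one namespace per socket
        metavar="N",
        help="number of SCM namespaces to create per CPU socket",
    )
    return parser


def parse_per_socket(argv: list[str]) -> int:
    """Return the requested namespaces-per-socket count (assumed >= 1)."""
    args = build_parser().parse_args(argv)
    if args.scm_namespaces < 1:
        raise ValueError("N must be at least 1")
    return args.scm_namespaces


def namespaces_to_create(num_sockets: int, per_socket: int) -> int:
    """Total namespaces across the host, assuming one region per socket."""
    if num_sockets < 1 or per_socket < 1:
        raise ValueError("socket and namespace counts must be at least 1")
    return num_sockets * per_socket


if __name__ == "__main__":
    # Example: a dual-socket host asking for two namespaces per socket.
    per_socket = parse_per_socket(["--scm-namespaces", "2"])
    print(namespaces_to_create(num_sockets=2, per_socket=per_socket))
```

Under these assumptions, a dual-socket host with N set to 2 would end up with four namespaces, one for each of four engines.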

Activity

Tom Nabarro July 14, 2022 at 1:48 PM

An additional test case was added in a PR that was merged to master in commit https://github.com/daos-stack/daos/commit/13130cbc84987722527f74e188f9a328b204eafd

Michael Hennecke June 22, 2022 at 11:46 AM

Wiki page to explain how to do this manually has been created:

This is for testing only; for production usage, the DAOS 2.4 release should be used.

Tom Nabarro April 8, 2022 at 12:22 PM

PR landed on master (after 2.2 was branched, so effectively for release 2.4) in commit https://github.com/daos-stack/daos/commit/b1933c34d8621fa511bf497bdf1352e01c9925be

Tom Nabarro March 31, 2022 at 8:29 PM

Manual testing performed on the feature branch:

  • full reset / reboot / prepare / reboot / prepare (which removes namespaces and regions, recreates regions then recreates namespaces)

  • the same with multiple ns/socket (2 and 4)

  • setting up single ns/socket then double after

  • vice versa

  • setting up AppDirect in non-interleaved mode and trying to run prepare

  • setting up AppDirect in non-interleaved mode and running reset

Tom Nabarro March 24, 2022 at 6:39 PM

Fixed

Details

Created February 9, 2022 at 8:58 AM
Updated February 16, 2023 at 9:03 AM
Resolved April 8, 2022 at 12:22 PM