QLC SSD Support

Given the characteristics of QLC SSDs (high capacity, low endurance and poor small-write performance), QLC SSDs won’t be used as a replacement for TLC SSDs; instead, they will serve as a capacity supplement to TLC for bulk data handling.

For QLC SSD support, we’ll keep the assumption of homogeneous storage targets, which means every target will have the same amount of QLC capacity and the same amount of TLC capacity as every other target. TLC will still be used to store the WAL, metadata and small data, and QLC will be used to store bulk data.

Control Plane Changes

dmg storage commands

The ‘dmg storage scan’ command might need to be improved to show the discovered TLC and QLC SSDs in a distinguishable way, which would help administrators configure the NVMe tiers in the server YAML.

The ‘dmg storage query’ command also needs to be improved to show the TLC and QLC information separately.

server YAML

The current DAOS server YAML supports a single NVMe tier (associated with the ‘data’ role) in pmem mode, and at most three NVMe tiers (associated with the ‘data’, ‘meta’ and ‘wal’ roles respectively) in md-on-ssd mode.

To configure QLC SSDs in the server YAML, an extra NVMe tier associated with a new ‘bulk_data’ role will be introduced, so at most 2 NVMe tiers could be configured in pmem mode and at most 4 in md-on-ssd mode. The ‘bulk_data’ role should be applied to the QLC tier only, and it cannot be combined with any other role on the same tier.
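The tier rules above can be sketched as a small validation routine. This is only an illustration in C: the real control plane is a separate code base, and every name below (the `sketch_` role bits, limits and function) is hypothetical. The checks that a role appears on at most one tier and that every tier carries at least one role are assumptions, not something this design states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role bits, for illustration only. */
enum sketch_role {
    SKETCH_ROLE_DATA      = 1 << 0,
    SKETCH_ROLE_META      = 1 << 1,
    SKETCH_ROLE_WAL       = 1 << 2,
    SKETCH_ROLE_BULK_DATA = 1 << 3, /* proposed role for the QLC tier */
};

/* Tier limits taken from the text: 2 in pmem mode, 4 in md-on-ssd mode. */
#define SKETCH_MAX_TIERS_PMEM      2
#define SKETCH_MAX_TIERS_MD_ON_SSD 4

/*
 * Return true if the per-tier role masks form a valid NVMe configuration:
 *  - the tier count is within the per-mode limit,
 *  - every tier has at least one role (assumption),
 *  - 'bulk_data' never shares a tier with another role,
 *  - pmem mode does not use the 'meta' or 'wal' roles,
 *  - no role is assigned to more than one tier (assumption).
 */
static bool
sketch_tiers_valid(const unsigned int *tier_roles, size_t nr_tiers, bool md_on_ssd)
{
    size_t       max_tiers = md_on_ssd ? SKETCH_MAX_TIERS_MD_ON_SSD
                                       : SKETCH_MAX_TIERS_PMEM;
    unsigned int seen = 0;
    size_t       i;

    assert(tier_roles != NULL || nr_tiers == 0);

    if (nr_tiers > max_tiers)
        return false;

    for (i = 0; i < nr_tiers; i++) {
        unsigned int roles = tier_roles[i];

        if (roles == 0)
            return false;
        if ((roles & SKETCH_ROLE_BULK_DATA) && roles != SKETCH_ROLE_BULK_DATA)
            return false;
        if (!md_on_ssd && (roles & (SKETCH_ROLE_META | SKETCH_ROLE_WAL)))
            return false;
        if (seen & roles)
            return false;
        seen |= roles;
    }
    return true;
}
```

Under these assumptions, an md-on-ssd layout with tiers {wal, meta}, {data} and {bulk_data} is accepted, while a tier that combines ‘data’ with ‘bulk_data’ is rejected.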

dmg pool commands

The ‘dmg pool create’ command needs to be changed to pass a QLC size parameter down to the engine; likewise, the ‘dmg pool query’ command needs to be updated to show the TLC and QLC usage separately.

The control plane changes for md-on-ssd phase2 are an ongoing task; please refer to the design and discussion at: Control-Plane changes

Data Plane Changes

Four key changes would be involved on the data plane (engine) side:

  1. An extra block allocator instance (including persistent heap of vea_space_df and runtime heap of vea_space_info) for the space management within the QLC blob of a VOS pool.

  2. An extra I/O context (bio_io_context) for the NVMe I/O against the QLC blob of a VOS pool.

  3. A new address type (bio_addr_t) to represent an address in the QLC blob.

  4. An enhanced media selection policy (vos_io_scm()) to select QLC media for bulk update.

These changes are described in the component sections that follow.

VEA changes

The QLC space will be managed separately from the TLC space, and the same block allocator (VEA) will be used for block management within the QLC blob. VEA already supports a different block size for each instance, but we do need to scrutinize the implementation details to see whether anything won’t work well with a large block size (64k or 128k).
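To make the effect of a large block size concrete, here is a minimal arithmetic sketch in C. The helper names are hypothetical and are not part of VEA; they only show how an allocation is rounded up to whole blocks and how many bytes are left unused in the last block.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks needed to hold 'bytes' for a given block size (round up). */
static uint64_t
sketch_bytes_to_blocks(uint64_t bytes, uint32_t blk_sz)
{
    assert(blk_sz != 0);
    return (bytes + blk_sz - 1) / blk_sz;
}

/* Bytes allocated but not used by the payload, i.e. internal fragmentation. */
static uint64_t
sketch_wasted_bytes(uint64_t bytes, uint32_t blk_sz)
{
    return sketch_bytes_to_blocks(bytes, blk_sz) * blk_sz - bytes;
}
```

For example, a value of 1 MiB + 4 KiB (1052672 bytes) fits exactly into 257 blocks of 4k, but needs 17 blocks of 64k (61440 bytes unused) or 9 blocks of 128k (126976 bytes unused).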

BIO changes

In general, the BIO component needs to be enhanced to handle an extra tier of SSDs associated with the ‘bulk_data’ role. An extra QLC blob and the corresponding I/O context will also be added for each VOS pool.

A few key data structure and on-disk format changes are listed below:

  1. The most significant change is that bio_addr_t will be expanded to represent an address in the QLC blob, which could be achieved by adding a new media type, DAOS_MEDIA_QLC.

  2. Similar to the ‘data’ tier SSDs, the SSDs in the ‘bulk_data’ tier will be assigned to VOS targets in a round-robin manner. This requires a new SMD device type, SMD_DEV_TYPE_BULK, and a new target table, ‘bulk_target’, to be added in SMD.

  3. When QLC is configured, an extra ‘bulk’ blob will be created for each VOS pool. This requires a new pool table, ‘bulk_pool’, to be added in SMD.

  4. A new I/O context will be added to the bio_meta_context for the QLC blob associated with the VOS pool.
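The list above can be illustrated with a simplified sketch in C. The types below are stand-ins, not the real bio_addr_t, bio_io_context or bio_meta_context definitions. The enum values, field names and the modulo-based device assignment are assumptions made for the example; the real SMD assignment logic may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative media types; only the QLC entry is new in this design. */
enum sketch_media {
    SKETCH_MEDIA_SCM = 0,
    SKETCH_MEDIA_NVME, /* existing 'data' blob (TLC) */
    SKETCH_MEDIA_QLC,  /* proposed 'bulk_data' blob (QLC) */
    SKETCH_MEDIA_MAX,
};

/* Simplified stand-in for an address: byte offset plus media type. */
struct sketch_addr {
    uint64_t sa_off;
    uint8_t  sa_type; /* one of enum sketch_media */
};

/* Stand-in for a per-blob I/O context. */
struct sketch_io_ctxt {
    const char *sic_name;
};

/* Per-pool contexts; spc_bulk is NULL when no QLC tier is configured. */
struct sketch_pool_ctxts {
    struct sketch_io_ctxt *spc_data;
    struct sketch_io_ctxt *spc_bulk;
};

/* Pick the I/O context that serves a given address, based on its media type. */
static struct sketch_io_ctxt *
sketch_addr2ctxt(const struct sketch_pool_ctxts *pc, const struct sketch_addr *addr)
{
    assert(pc != NULL && addr != NULL);

    switch (addr->sa_type) {
    case SKETCH_MEDIA_NVME:
        return pc->spc_data;
    case SKETCH_MEDIA_QLC:
        return pc->spc_bulk;
    default:
        return NULL; /* SCM addresses are not served by a blob context here */
    }
}

/* Simplest form of round-robin: target i uses device (i mod nr_devs). */
static uint32_t
sketch_rr_dev(uint32_t tgt_id, uint32_t nr_devs)
{
    assert(nr_devs != 0);
    return tgt_id % nr_devs;
}
```

Keeping the media type inside the address lets callers route an I/O to the right blob without any extra lookup, which is why a new media type is the least intrusive way to extend the address.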

VOS changes

  1. Each VOS pool will have two block allocator instances (see vos_pool_df->pd_vea_df and vos_pool->vp_vea_info): one for the existing ‘data’ blob, the other for the new ‘bulk_data’ blob.

  2. Each VOS container will have two block allocator hints (see vos_cont_df->cd_hint_df and vos_container->vc_hint_ctxt): one for the existing ‘data’ blob, the other for the new ‘bulk_data’ blob.

  3. The media selection policy (vos_io_scm()) will be improved to select QLC media for bulk data update.
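A size-based selection policy along these lines could look like the sketch below. It is not the actual vos_io_scm() logic: the function name, the two thresholds and the fallback order are all hypothetical, chosen only to show where a QLC branch would fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative media types; only the QLC entry is new in this design. */
enum sketch_media {
    SKETCH_MEDIA_SCM = 0,
    SKETCH_MEDIA_NVME, /* existing 'data' blob (TLC) */
    SKETCH_MEDIA_QLC,  /* proposed 'bulk_data' blob (QLC) */
};

/* Hypothetical thresholds; real values would need tuning and may be configurable. */
#define SKETCH_NVME_MIN_SZ (4ULL << 10) /* below this, keep the data with metadata */
#define SKETCH_BULK_MIN_SZ (1ULL << 20) /* at or above this, treat as bulk data */

/*
 * Choose the media for an update of 'size' bytes.
 * has_data_blob: the pool has a 'data' blob on TLC.
 * has_bulk_blob: the pool has a 'bulk_data' blob on QLC.
 */
static enum sketch_media
sketch_select_media(uint64_t size, bool has_data_blob, bool has_bulk_blob)
{
    /* Bulk updates go to QLC when a QLC blob exists. */
    if (has_bulk_blob && size >= SKETCH_BULK_MIN_SZ)
        return SKETCH_MEDIA_QLC;

    /* Medium (or bulk without QLC) updates go to the TLC 'data' blob. */
    if (has_data_blob && size >= SKETCH_NVME_MIN_SZ)
        return SKETCH_MEDIA_NVME;

    /* Small updates, or pools without any data blob, stay on SCM. */
    assert(size < SKETCH_NVME_MIN_SZ || !has_data_blob);
    return SKETCH_MEDIA_SCM;
}
```

With these example thresholds, a 512-byte update stays on SCM, a 64k update goes to the TLC ‘data’ blob, and a 4 MiB update goes to QLC when a ‘bulk_data’ blob exists or falls back to TLC otherwise.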

Interoperability

  • The on-disk format changes to vos_pool_df and vos_cont_df need to be handled carefully to preserve backward compatibility.

  • The new media type and the related pool RPC change imply a wire protocol change; a new DAOS pool RPC version might need to be introduced.

  • The DAOS server, DAOS engine and dmg are always on the same version, so no interoperability needs to be considered for the control plane changes.

  • The SMD on-disk format changes won’t break backward compatibility.

Future Work

Use the gang address (DAOS-10877 “vos: gang allocation for huge SV”, Pull Request #14790 in daos-stack/daos, by NiuYawei) to reduce the internal fragmentation (on QLC) caused by unaligned I/O?