QLC SSD Support
Given the characteristics of QLC SSDs (high capacity, low endurance, and poor small-write performance), QLC SSDs will not be used as a replacement for TLC SSDs. Instead, they will serve as a capacity supplement to TLC for bulk data handling.
For QLC SSD support, we keep the assumption of homogeneous storage targets, which means every target has the same QLC capacity and the same TLC capacity as every other target. TLC will still be used to store the WAL, metadata, and small data, and QLC will be used to store bulk data.
Control Plane Changes
dmg storage commands
The ‘dmg storage scan’ command may need to be improved to show the discovered TLC and QLC SSDs in a distinguishable way, which would help administrators configure the NVMe tiers in the server YAML.
The ‘dmg storage query’ command also needs to be improved to show the TLC and QLC information separately.
server YAML
The current DAOS server YAML supports a single NVMe tier (associated with the ‘data’ role) in pmem mode, and at most three NVMe tiers (associated with the ‘data’, ‘meta’ and ‘wal’ roles respectively) in md-on-ssd mode.
To configure QLC SSDs in the server YAML, an extra NVMe tier associated with a new ‘bulk_data’ role will be introduced, so that at most 2 NVMe tiers can be configured in pmem mode and at most 4 in md-on-ssd mode. The ‘bulk_data’ role should be applied to the QLC tier only, and it is incompatible with all other roles.
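The tier rules above can be sketched as a small validation routine. This is only an illustration of the stated constraints, not the control plane implementation: the role bit names, the function name, and the rule that a role may appear on only one tier are all assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role bits; names are illustrative, not DAOS definitions. */
enum sketch_role {
    SK_ROLE_DATA = 1 << 0,
    SK_ROLE_META = 1 << 1,
    SK_ROLE_WAL  = 1 << 2,
    SK_ROLE_BULK = 1 << 3, /* the proposed 'bulk_data' role */
};

/*
 * Return true if the per-tier role masks satisfy the rules described in
 * the text: at most 2 NVMe tiers in pmem mode, at most 4 in md-on-ssd
 * mode, and 'bulk_data' never shares a tier with another role.
 */
static bool
sketch_tiers_valid(const unsigned int *tier_roles, size_t nr_tiers,
                   bool md_on_ssd)
{
    size_t       max_tiers = md_on_ssd ? 4 : 2;
    unsigned int seen = 0;
    size_t       i;

    if (nr_tiers > max_tiers)
        return false;

    for (i = 0; i < nr_tiers; i++) {
        unsigned int roles = tier_roles[i];

        /* Every configured tier must carry at least one role. */
        if (roles == 0)
            return false;
        /* 'bulk_data' is incompatible with any other role on a tier. */
        if ((roles & SK_ROLE_BULK) && roles != SK_ROLE_BULK)
            return false;
        /* pmem mode only has 'data' and 'bulk_data' tiers. */
        if (!md_on_ssd && (roles & (SK_ROLE_META | SK_ROLE_WAL)))
            return false;
        /* Assumption of this sketch: a role is assigned to one tier only. */
        if (seen & roles)
            return false;
        seen |= roles;
    }
    return true;
}
```

With this sketch, four tiers carrying ‘wal’, ‘meta’, ‘data’ and ‘bulk_data’ pass in md-on-ssd mode but are rejected in pmem mode, and a tier that combines ‘data’ with ‘bulk_data’ is rejected in either mode.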
dmg pool commands
The ‘dmg pool create’ command needs to be changed to pass a QLC size parameter down to the engine. At the same time, the ‘dmg pool query’ command needs to be updated to show the TLC and QLC usage separately.
There is an ongoing task covering the control plane changes for md-on-ssd phase 2; please refer to the design and discussion at: Control-Plane changes
Data Plane Changes
Four key changes are involved on the data plane (engine) side:
- An extra block allocator instance (including the persistent heap vea_space_df and the runtime heap vea_space_info) for space management within the QLC blob of a VOS pool.
- An extra I/O context (bio_io_context) for NVMe I/O against the QLC blob of a VOS pool.
- A new address type (bio_addr_t) to represent an address in the QLC blob.
- An enhanced media selection policy (vos_io_scm()) to select QLC media for bulk updates.
These changes are described in the following per-component sections.
VEA changes
The QLC space will be managed separately from the TLC space, and the same block allocator (VEA) will be used for block management within the QLC blob. The current VEA already supports a different block size per instance, but we need to scrutinize the implementation details to see whether anything will not work well with a large block size (64k or 128k).
BIO changes
In general, the BIO component needs to be enhanced to handle an extra tier of SSDs associated with the ‘bulk_data’ role. An extra QLC blob and the corresponding I/O context will also be added for each VOS pool.
A few key data structure and on-disk format changes are listed below:
- The most significant change is that bio_addr_t will be expanded to represent an address in the QLC blob, which could be achieved by adding a new media type, DAOS_MEDIA_QLC.
- Similar to the ‘data’ tier SSDs, the SSDs in the ‘bulk_data’ tier will be assigned to the VOS targets in a round-robin manner, which requires a new SMD device type, SMD_DEV_TYPE_BULK, and a new target table, ‘bulk_target’, to be added in SMD.
- When QLC is configured, an extra ‘bulk’ blob will be created for each VOS pool, which requires a new pool table, ‘bulk_pool’, to be added in SMD.
- A new I/O context will be added to bio_meta_context for the QLC blob associated with the VOS pool.
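As a rough illustration of how a new media type in the address could route I/O to the right blob, the sketch below uses simplified stand-ins rather than the real definitions. The struct layouts, field names, and the sketch_addr2ioc() helper are hypothetical; only the idea of a QLC media type alongside the existing ones comes from the list above.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified media types; SK_MEDIA_QLC stands in for the proposed DAOS_MEDIA_QLC. */
enum sketch_media {
    SK_MEDIA_SCM = 0,
    SK_MEDIA_NVME, /* existing 'data' blob (TLC) */
    SK_MEDIA_QLC,  /* proposed 'bulk_data' blob */
    SK_MEDIA_MAX,
};

/* Simplified stand-in for bio_addr_t: an offset plus the media it lives on. */
struct sketch_addr {
    uint64_t sa_off;
    uint8_t  sa_type;
};

/* Hypothetical per-pool I/O contexts, one per blob. */
struct sketch_pool_ctxts {
    void *sp_data_ioc; /* I/O context of the 'data' blob */
    void *sp_bulk_ioc; /* I/O context of the 'bulk_data' blob, NULL without QLC */
};

/*
 * Pick the I/O context that serves an address. SCM addresses (and any
 * unknown media type) have no blob I/O context, so NULL is returned.
 */
static void *
sketch_addr2ioc(const struct sketch_pool_ctxts *pool,
                const struct sketch_addr *addr)
{
    switch (addr->sa_type) {
    case SK_MEDIA_NVME:
        return pool->sp_data_ioc;
    case SK_MEDIA_QLC:
        return pool->sp_bulk_ioc;
    default:
        return NULL;
    }
}
```

Keeping the media type inside the address means existing code paths that already switch on the media type only need one extra case for QLC, which is the main appeal of this approach in the sketch.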
VOS changes
- Each VOS pool will have two block allocator instances (see vos_pool_df->pd_vea_df and vos_pool->vp_vea_info), one for the old ‘data’ blob, the other for the new ‘bulk_data’ blob.
- Each VOS container will have two block allocator hints (see vos_cont_df->cd_hint_df and vos_container->vc_hint_ctxt), one for the old ‘data’ blob, the other for the new ‘bulk_data’ blob.
- The media selection policy (vos_io_scm()) will be improved to select QLC media for bulk data updates.
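A minimal sketch of what a size-based media selection with a QLC tier might look like is given below. It is not the vos_io_scm() implementation: the function name, the policy struct, and both size thresholds are assumptions chosen for illustration, and the design does not fix the actual thresholds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where an update lands in this sketch. */
enum sketch_tier {
    SK_TIER_SCM = 0, /* SCM, or the metadata path in md-on-ssd mode */
    SK_TIER_TLC,     /* existing 'data' blob */
    SK_TIER_QLC,     /* proposed 'bulk_data' blob */
};

/* Hypothetical policy knobs; the thresholds are assumptions. */
struct sketch_policy {
    uint64_t sp_scm_max;  /* updates up to this size stay off NVMe */
    uint64_t sp_bulk_min; /* updates at least this size count as bulk */
    bool     sp_has_qlc;  /* whether a QLC tier is configured */
};

/* Select the target media for an update of 'size' bytes. */
static enum sketch_tier
sketch_select_media(const struct sketch_policy *pol, uint64_t size)
{
    if (size <= pol->sp_scm_max)
        return SK_TIER_SCM;
    /* Bulk data goes to QLC only when a QLC tier is configured. */
    if (pol->sp_has_qlc && size >= pol->sp_bulk_min)
        return SK_TIER_QLC;
    return SK_TIER_TLC;
}
```

When no QLC tier is configured, the sketch falls back to the existing two-way choice, so pools without QLC behave as they do today.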
Interoperability
- The on-disk format changes to vos_pool_df and vos_cont_df need to be handled carefully for backward compatibility.
- The new media type and the related pool RPC change imply a wire protocol change; a new DAOS pool RPC version might need to be introduced.
- DAOS server, DAOS engine and dmg are always on the same version, so no interoperability needs to be considered for the control plane changes.
- The SMD on-disk format changes won’t break backward compatibility.
Future Work
Utilize the gang address (DAOS-10877 vos: gang allocation for huge SV by NiuYawei · Pull Request #14790 · daos-stack/daos) to reduce the internal fragmentation (on QLC) caused by unaligned I/O?
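To make the internal fragmentation concern concrete, the sketch below computes how many bytes are lost when an extent is rounded up to whole allocator blocks. The helper names are hypothetical, and the sketch assumes every extent is allocated as a single run of whole blocks; it says nothing about how gang allocation itself would be applied.

```c
#include <stdint.h>

/* Bytes actually allocated when 'size' is rounded up to whole blocks. */
static uint64_t
sketch_alloc_bytes(uint64_t size, uint64_t blk_sz)
{
    return (size + blk_sz - 1) / blk_sz * blk_sz;
}

/* Internal fragmentation: allocated bytes that hold no user data. */
static uint64_t
sketch_wasted_bytes(uint64_t size, uint64_t blk_sz)
{
    return sketch_alloc_bytes(size, blk_sz) - size;
}
```

For example, a 130 KiB value stored with a 128 KiB block size occupies two blocks and wastes 126 KiB, whereas the same value with a 4 KiB block size (used here only for comparison) wastes 2 KiB.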