Memory bucket allocator
The WORM object is designed for AI/ML workloads and does not suit every scenario. For example, if DAOS is the backend of block storage, the main workload is 4K-aligned random reads and writes, and the EVTree metadata of small extents (4K, 8K, …, 64K) can consume hundreds of gigabytes of memory once the data volume is large enough. Because these extents can be frequently read or overwritten, flattening them is not optimal for performance. DAOS should therefore keep the hierarchical format for this type of object, and it must also be able to bring the object's metadata back into DRAM before starting I/O handling if that metadata has already been evicted. This requires a new allocator that exposes memory buckets. To be clear, a memory bucket is not part of the DRAM mirror of the MD-blob; it is simply a large chunk of memory whose content can be flushed to the DT-blob by the checkpointing service.
When allocating an object, DAOS can either allocate a new memory bucket or select an existing one with sufficient free space. It then passes that bucket ID to the allocator for every space allocation made during I/O, and the allocator tries to place all metadata of the same object within the same bucket. The object is still hierarchical, but its metadata lives in a memory bucket that is stored in the DT-blob, so the object's metadata can be evicted from DRAM. If a future I/O request tries to access an object stored in a memory bucket, DAOS first reads the entire memory bucket from the DT-blob, so there is no cache miss during I/O handling.
An object may still be too large to fit all of its metadata in a single bucket; for example, the object in the diagram above spans bucket[1] and bucket[2]. In this case, the addresses of all the memory buckets it uses should be saved as metadata of the object; this metadata stays permanently in DRAM and is persisted in the MD-blob. Before handling an I/O request for such a large object, DAOS shall preload all the memory buckets used by the object.