WORM (Write-Once-Read-Many) Object

A regular DAOS object is stored in a hierarchical format that supports multiple versions and arbitrary overwrites, so DAOS has to keep tree indexes for these objects to support efficient key and value lookup. This means DAOS has to perform multiple allocations for every I/O (object, tree node, key, tree node, value…). If these allocations are small enough, most of them will stay in the MD-blob forever: DAOS cannot migrate all of these small pieces to the DT-blob because the write amplification would be unacceptable.

WORM object

Because the DAOS data model supports snapshots, overwrites, distributed transactions, MVCC…, it is hard to change the metadata format. However, for workloads like AI/ML, the ingested dataset will not be modified by the AI/ML application. This means that keeping indexes and logs for objects is not always necessary, especially for objects that are relatively small. The term for this kind of AI/ML dataset is Write-Once-Read-Many (WORM).

A WORM object is never modified after its write completes, so it can be serialized (called "flattened" in this design document) into a contiguous buffer and migrated to the DT-blob (unpinned from DRAM). After the migration, the memory occupied by the object can be freed. Later, before a read is served, the entire object can be brought back into DRAM with a single SSD read from the DT-blob.

Flattened format

The flattened object has a format similar to the DAOS “pool map” (diagram on the left side). As shown in the diagram on the right side, each component in the buffer records the offset of its first child and the total number of its children. While serving a read request, keys and values are found by linear search in the flattened buffer.

Data format of WORM object
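
The layout described above can be illustrated with a minimal sketch. It is not the actual DAOS format: the record layout (`REC`, four little-endian 32-bit fields), the helper names `flatten`, `find_child` and `lookup`, and the object -> dkey -> akey -> value nesting are all assumptions made for illustration. It only shows the two properties the design relies on: the children of each component are adjacent in the buffer, so a component needs only the offset of its first child and a child count, and lookup is a linear scan over those children.

```python
import struct
from collections import deque

# Hypothetical fixed-size component record (an assumption, not the DAOS format):
# child_nr, child_off (byte offset of first child record), data_off, data_len.
REC = struct.Struct("<IIII")


def flatten(tree):
    """Serialize {dkey: {akey: value}} (all bytes) into one contiguous buffer."""
    # Generic node = (payload, children); values are leaves with no children.
    root = (b"", [(dk, [(ak, [(val, [])]) for ak, val in akeys.items()])
                  for dk, akeys in tree.items()])
    # Breadth-first order places the children of every node next to each other.
    nodes, queue, next_idx = [], deque([root]), 1
    while queue:
        payload, children = queue.popleft()
        nodes.append((payload, len(children), next_idx))
        next_idx += len(children)
        queue.extend(children)
    # All records come first, followed by the key/value payload area.
    data_base = len(nodes) * REC.size
    records, data = bytearray(), bytearray()
    for payload, child_nr, first in nodes:
        child_off = first * REC.size if child_nr else 0
        records += REC.pack(child_nr, child_off, data_base + len(data), len(payload))
        data += payload
    return bytes(records + data)


def find_child(buf, rec_off, key):
    """Linear search among the children of the record at rec_off."""
    child_nr, child_off, _, _ = REC.unpack_from(buf, rec_off)
    for i in range(child_nr):
        off = child_off + i * REC.size
        _, _, data_off, data_len = REC.unpack_from(buf, off)
        if buf[data_off:data_off + data_len] == key:
            return off
    return None


def lookup(buf, dkey, akey):
    """Return the value stored under (dkey, akey), or None if absent."""
    dk_off = find_child(buf, 0, dkey)  # the object record sits at offset 0
    if dk_off is None:
        return None
    ak_off = find_child(buf, dk_off, akey)
    if ak_off is None:
        return None
    # In this sketch each akey has exactly one child: its value record.
    _, val_off, _, _ = REC.unpack_from(buf, ak_off)
    _, _, data_off, data_len = REC.unpack_from(buf, val_off)
    return buf[data_off:data_off + data_len]


obj = {b"dkey0": {b"a": b"1", b"b": b"22"}, b"dkey1": {b"a": b"333"}}
buf = flatten(obj)
print(lookup(buf, b"dkey0", b"b"))  # b'22'
```

Because the whole object is a single buffer with relative offsets, it can be written to and read back from storage as one unit with no pointer fix-up. The cost is that each level is searched linearly, which is why this form suits small objects.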


NB: DAOS should provide fast search and read, but a flattened object supports only linear search, so DAOS can still keep indexes for a large object even if it is WORM. This is acceptable because the metadata overhead of a large object is negligible compared with its data.