All data is initially stored as regular (hierarchical) objects, because handling writes in the flattened format is very challenging (MVCC, ilog, DTX…). In addition, even for a WORM object, an application can still submit writes in multiple RPCs, e.g., appends; in that case it is complex to add new keys and values to a flattened buffer. To avoid these complexities, this design does not change the write handler: DAOS still creates trees to index keys and values for each write request.
DAOS can provide an API that allows users to indicate write completion of WORM objects:
...
When any of these cases occurs, a hierarchical object can be converted to the flattened format by a new service called the flattening service. Like the aggregation service, the flattening service can be activated periodically or on demand, and it scans objects in the background. It traverses the index trees of a WORM object, appends the keys and values to a contiguous buffer (if the object is small enough), then writes the buffer to the DT-blob and releases the original hierarchical object in the MD-blob (and DRAM).
To avoid storing unnecessary metadata in the flattened buffer, the flattening service should run after aggregation, for example, after the aggregation service has merged the extents generated by “append”. However, in this case the index changes made by the aggregation service are no longer meaningful, because they will be freed by the flattening service later. Ideally, therefore, the flattening service would also aggregate adjacent extents in the EVTree itself. However, this means the functionality of the aggregation and flattening services would overlap, so it can be considered a future improvement.
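To illustrate why merging before flattening reduces metadata, the sketch below coalesces contiguous extents so that a sequence of appends becomes a single record. It is a simplified model under stated assumptions: extents are `(offset, data)` tuples, overlaps and epochs are ignored, and `merge_adjacent` is a hypothetical helper, not an EVTree API.

```python
def merge_adjacent(extents):
    """Coalesce extents whose byte ranges are exactly contiguous."""
    merged = []
    for off, data in sorted(extents, key=lambda e: e[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == off:
            # This extent starts where the previous one ends: extend it.
            prev_off, prev_data = merged[-1]
            merged[-1] = (prev_off, prev_data + data)
        else:
            merged.append((off, data))
    return merged

# Three appends plus one unrelated extent.
appends = [(0, b"ab"), (2, b"cd"), (4, b"ef"), (10, b"zz")]
compact = merge_adjacent(appends)
```

After merging, the three appended extents need one extent descriptor instead of three, which is the metadata saving the flattened buffer benefits from.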
The flattening service only serializes objects with a small number of keys and values, because the flattened format only supports linear search, which is inefficient when there are too many keys and values. This means that even a WORM object stays in hierarchical format, to support efficient reads, if it has a lot of keys and values.
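The eligibility rule above can be sketched as a simple predicate. The thresholds and the name `should_flatten` are hypothetical; the design does not specify concrete limits, so the values below are placeholders only.

```python
# Hypothetical limits; the design does not define concrete thresholds.
MAX_FLAT_RECORDS = 32
MAX_FLAT_BYTES = 4096

def should_flatten(obj_tree, is_worm_complete):
    """Return True if a {dkey: {akey: value}} object is a flattening candidate."""
    if not is_worm_complete:
        return False  # object may still receive writes
    records = 0
    payload = 0
    for dkey, akeys in obj_tree.items():
        for akey, value in akeys.items():
            records += 1
            payload += len(dkey) + len(akey) + len(value)
    return records <= MAX_FLAT_RECORDS and payload <= MAX_FLAT_BYTES

small = {b"d0": {b"a0": b"x" * 100}}
large = {b"d%d" % i: {b"a": b"v"} for i in range(1000)}
```

Objects that fail the check simply remain in hierarchical format and keep using the tree-based read path.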
DAOS will have a new read handler for flattened objects, which finds the requested key and value in the flattened format by linear search. Because a flattened object is stored in the DT-blob, it can be evicted from DRAM. If the storage engine receives a read request against an evicted object, it can bring the entire object back into DRAM with a single SSD read.
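A linear-search lookup over a flattened buffer might look like the sketch below. The record layout (three length fields followed by dkey, akey, and value bytes) and the names `pack_record` and `flat_lookup` are assumptions for illustration, not the actual on-disk format.

```python
import struct

# Hypothetical record header: dkey length (u16), akey length (u16), value length (u32).
_HDR = struct.Struct("<HHI")

def pack_record(dkey, akey, value):
    """Build one record in the assumed flattened layout."""
    return _HDR.pack(len(dkey), len(akey), len(value)) + dkey + akey + value

def flat_lookup(buf, dkey, akey):
    """Scan records front to back; return the matching value or None."""
    off = 0
    while off < len(buf):
        dlen, alen, vlen = _HDR.unpack_from(buf, off)
        off += _HDR.size
        rec_dkey = buf[off:off + dlen]
        rec_akey = buf[off + dlen:off + dlen + alen]
        val_off = off + dlen + alen
        if rec_dkey == dkey and rec_akey == akey:
            return buf[val_off:val_off + vlen]
        off = val_off + vlen  # skip to the next record
    return None

buf = pack_record(b"d0", b"a0", b"x") + pack_record(b"d1", b"a1", b"hello")
value = flat_lookup(buf, b"d1", b"a1")
```

The cost is proportional to the number of records in the buffer, which is why only objects with few keys and values are flattened.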
Because DAOS only flattens small objects, it should be able to store multiple flattened objects in the same SSD extent to avoid fragmentation and reduce write amplification.
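One way to picture packing is an append-only extent with a small index that maps each object to its location. The class `ExtentPacker`, the extent size, and the index layout below are hypothetical and only model the idea in memory.

```python
EXTENT_SIZE = 4096  # hypothetical extent size in bytes

class ExtentPacker:
    """Pack several flattened objects into fixed-size extents."""

    def __init__(self, extent_size=EXTENT_SIZE):
        self.extent_size = extent_size
        self.extents = [bytearray()]
        self.index = {}  # object id -> (extent number, offset, length)

    def add(self, oid, flat):
        if len(flat) > self.extent_size:
            raise ValueError("flattened object larger than one extent")
        cur = self.extents[-1]
        if len(cur) + len(flat) > self.extent_size:
            # Current extent is full: start a new one.
            cur = bytearray()
            self.extents.append(cur)
        self.index[oid] = (len(self.extents) - 1, len(cur), len(flat))
        cur += flat

    def read(self, oid):
        ext, off, length = self.index[oid]
        return bytes(self.extents[ext][off:off + length])

packer = ExtentPacker(extent_size=16)
packer.add("obj-a", b"x" * 10)
packer.add("obj-b", b"y" * 6)
packer.add("obj-c", b"z" * 4)
```

Here the first two objects share one extent and are written together, rather than each consuming an extent of its own.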