Most ML jobs are perceived to be read-intensive with many small reads, while a few ML jobs also perform small writes. This I/O behavior requires the storage system to provide superior random/small read performance. Research shows that 99% of the read and write calls in Biology, Computer Science, Materials, and Chemistry workloads are smaller than 10MB, and over 90% are smaller than 1MB.

The data format of VOS is designed for generic requirements, so it relies on scalable data structures such as the B+Tree and EVTree. For ML workloads, however, most read calls are small, so keeping a scalable index over the data does not help performance much; instead it introduces significant metadata overhead, which consumes DRAM after PMEM is removed from the stack.
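To make the overhead concrete, the back-of-envelope sketch below compares an assumed amount of per-record index metadata (tree node slots, key descriptors, and so on) against typical small record sizes. The 256-byte figure is an illustrative assumption, not a measured VOS value.

```c
/* Back-of-envelope sketch (not DAOS code): the per-record metadata size
 * below is an assumed figure, used only to show how the index overhead
 * grows relative to small records. */
#include <stdio.h>

int main(void)
{
    const size_t meta_per_record = 256;   /* assumed index metadata per record, in bytes */
    const size_t record_sizes[]  = { 256, 1024, 4096, 65536 };
    const size_t nr = sizeof(record_sizes) / sizeof(record_sizes[0]);

    for (size_t i = 0; i < nr; i++) {
        double overhead = 100.0 * meta_per_record / record_sizes[i];
        printf("record %6zu B -> ~%zu B of index metadata (%.1f%% overhead)\n",
               record_sizes[i], meta_per_record, overhead);
    }
    return 0;
}
```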

In order to reduce the metadata overhead of indexing the user data of small objects/files, a technique called object flattening is proposed in this document.
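As a rough illustration of the idea (the struct names and layout below are hypothetical, not the actual VOS on-media format), a flattened small object could pack all of its key/value records into one contiguous buffer behind a compact header, so a lookup resolves with a single linear scan instead of traversing several tree nodes kept in DRAM:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened layout: one contiguous buffer per small object. */
struct flat_entry {
    uint32_t key_len;             /* bytes of key following this header   */
    uint32_t val_len;             /* bytes of value following the key     */
    /* key bytes, then value bytes, are stored in-line after the header   */
};

struct flat_object {
    uint32_t nr;                  /* number of packed entries             */
    uint32_t used;                /* bytes used in buf[]                  */
    char     buf[];               /* packed flat_entry records            */
};

/* Linear scan is acceptable here because flattening only targets small
 * objects with few records; no per-record tree nodes are kept in DRAM. */
const char *
flat_lookup(const struct flat_object *obj, const char *key, uint32_t key_len,
            uint32_t *val_len)
{
    uint32_t off = 0;

    for (uint32_t i = 0; i < obj->nr; i++) {
        struct flat_entry ent;

        memcpy(&ent, obj->buf + off, sizeof(ent)); /* header may be unaligned */
        if (ent.key_len == key_len &&
            memcmp(obj->buf + off + sizeof(ent), key, key_len) == 0) {
            *val_len = ent.val_len;
            return obj->buf + off + sizeof(ent) + ent.key_len;
        }
        off += sizeof(ent) + ent.key_len + ent.val_len;
    }
    return NULL;                  /* key not found in this flat object    */
}
```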
