Object flattening

Most AI/ML jobs are perceived to be read-intensive, issuing many small reads, while a few ML jobs also perform small writes. This I/O behavior requires the storage system to provide superior small/random read performance. Research shows that 99% of the read and write calls in Biology, Computer Science, Materials, and Chemistry workloads are smaller than 10MB, and over 90% are smaller than 1MB.

The VOS data format is designed for generic requirements, so it relies on scalable data structures such as B+Tree and EVTree. However, for read-intensive AI/ML workloads, where most read calls are small, keeping a scalable index for the data does little for performance. It instead adds significant metadata overhead, which consumes DRAM once PMEM is removed from the stack.

To reduce the metadata overhead of indexing the user data of small objects/files, this document proposes a technique called object flattening.
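
To make the idea concrete, the following is a minimal sketch of what flattening a small object could look like. It assumes the object's records are packed into one contiguous buffer with a compact entry table in place of a per-record tree index. The layout, the field widths, and the names flatten and lookup are hypothetical and chosen only for illustration. They are not the VOS on-media format.

```python
import struct

# Hypothetical layout (an assumption, not the VOS format):
#   header:  uint32 record count
#   entries: per record, uint16 key length and uint32 value length
#   payload: keys and values packed back to back, in entry order
_COUNT = struct.Struct("<I")
_ENTRY = struct.Struct("<HI")


def flatten(records):
    """Pack a small object's {key: value} records (bytes -> bytes) into one buffer."""
    items = sorted(records.items())
    parts = [_COUNT.pack(len(items))]
    parts += [_ENTRY.pack(len(k), len(v)) for k, v in items]
    for k, v in items:
        parts.append(k)
        parts.append(v)
    return b"".join(parts)


def lookup(blob, key):
    """Find a value by scanning the compact entry table, with no tree traversal."""
    (count,) = _COUNT.unpack_from(blob, 0)
    offset = _COUNT.size + count * _ENTRY.size  # start of the payload
    for i in range(count):
        klen, vlen = _ENTRY.unpack_from(blob, _COUNT.size + i * _ENTRY.size)
        if blob[offset:offset + klen] == key:
            start = offset + klen
            return blob[start:start + vlen]
        offset += klen + vlen  # skip this record's key and value
    return None
```

In this sketch the per-record metadata is a fixed 6 bytes, and a lookup is a linear scan over a handful of entries. That trade-off is only reasonable under the assumption stated above, namely that the object is small and rarely updated.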