cluster: tens of MB
page: tens of kB

Current proposal

One container is associated with a set of ROOT files (a few tens at most).
A root superblock object holds global metadata and a pointer to a KV object that has one dkey per ROOT file. Each dkey stores the list of cluster objects associated with that ROOT file.
A cluster object stores all the pages for a given cluster. Pages are gathered into slices, and each slice is stored under its own dkey. Within a slice, each page is stored under its own akey.

A "slice" is the set of pages of a single column in a cluster (the name 'slice' might still change).
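
The layout described above can be sketched with plain Python dictionaries. This is only an in-memory illustration of the key hierarchy, not code against any real object-store API; the class name ClusterObject, the tuple key encoding, the file name events.root and the integer cluster ids are all hypothetical.

```python
from collections import defaultdict


class ClusterObject:
    """In-memory stand-in for one cluster object: dkey -> akey -> page bytes."""

    def __init__(self):
        self._slices = defaultdict(dict)

    def put_page(self, column_id, page_index, payload):
        # One dkey per slice (column), one akey per page.
        # The tuple key encoding is an assumption made for this sketch.
        self._slices[("slice", column_id)][("page", page_index)] = payload

    def get_slice(self, column_id):
        # Return the pages of one column in page order.
        pages = self._slices.get(("slice", column_id), {})
        return [pages[akey] for akey in sorted(pages)]


# KV object referenced by the superblock: one dkey per ROOT file,
# each holding the list of cluster object ids for that file.
file_index = {"events.root": [0, 1]}
superblock = {"metadata": {"version": 1}, "file_index": file_index}
clusters = {0: ClusterObject(), 1: ClusterObject()}

# Pages may arrive out of order; get_slice sorts them by page index.
clusters[0].put_page(column_id=7, page_index=1, payload=b"second")
clusters[0].put_page(column_id=7, page_index=0, payload=b"first")
```

In this model, reading one column of one cluster touches a single dkey, while the pages inside it remain individually addressable through their akeys.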


  • A data set can be up to O(10 TB) in size
  • A data set can have O(1000) columns
  • A cluster is 10-100 MB (i.e. a 10 TB data set has about 10^5 clusters at 100 MB each, and up to 10^6 at 10 MB each)
  • A page is O(10 kB) of compressed data (i.e. a cluster can have up to 10^4 pages)
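
The orders of magnitude above translate into object and key counts as follows. This is a rough back-of-the-envelope sketch: it assumes decimal units, assumes the cluster size refers to compressed data like the page size does, and assumes pages are spread evenly over columns. The function name estimate_counts is made up for illustration.

```python
TB, MB, KB = 10**12, 10**6, 10**3  # decimal units (assumption)


def estimate_counts(dataset_bytes=10 * TB, columns=1000,
                    cluster_bytes=(10 * MB, 100 * MB), page_bytes=10 * KB):
    """Return (low, high) estimates for the key counts implied by the sizes."""
    small, large = cluster_bytes
    return {
        # Large clusters give the fewest cluster objects, small ones the most.
        "clusters": (dataset_bytes // large, dataset_bytes // small),
        # akeys per cluster object.
        "pages_per_cluster": (small // page_bytes, large // page_bytes),
        # akeys per dkey, if pages are spread evenly over the columns.
        "pages_per_slice": (small // page_bytes // columns,
                            large // page_bytes // columns),
        # Total number of pages in the data set.
        "pages_total": dataset_bytes // page_bytes,
    }


counts = estimate_counts()
# counts["clusters"]          == (10**5, 10**6)
# counts["pages_per_cluster"] == (10**3, 10**4)
# counts["pages_per_slice"]   == (1, 10)
# counts["pages_total"]       == 10**9
```

Under these assumptions a slice holds only about 1 to 10 pages, so each dkey carries few akeys, while the data set as a whole holds on the order of 10^9 pages.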