One container is associated with a set of ROOT files (at most a few tens).
A root superblock object holds global metadata as well as a pointer to a KV object that has one dkey per ROOT file. Under each dkey, the list of cluster objects associated with that ROOT file is stored.
A cluster object stores all the pages of a given cluster. Pages are grouped into slices, and each slice is stored under its own dkey. Within a slice, each page is stored under its own akey.
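The nesting described above (cluster object, then dkey per slice, then akey per page) can be illustrated with a small in-memory model. This is only a sketch in plain Python and does not use the DAOS API; the class name `ClusterObject`, its methods, and the choice of integer column and page numbers as keys are assumptions made for illustration.

```python
class ClusterObject:
    """Toy stand-in for a cluster object: dkey (slice) -> akey (page) -> bytes.

    Assumption: the dkey is a column number and the akey is a page number.
    The real key encoding is not specified in the text.
    """

    def __init__(self):
        self._slices = {}  # dkey -> {akey -> payload}

    def put_page(self, column, page_no, payload):
        # All pages of one column in this cluster share a dkey (one slice).
        self._slices.setdefault(column, {})[page_no] = payload

    def get_page(self, column, page_no):
        # Raises KeyError if the slice or the page does not exist.
        return self._slices[column][page_no]

    def slice_pages(self, column):
        # Page numbers (akeys) stored under the slice of one column.
        return sorted(self._slices.get(column, {}))


# Hypothetical superblock: global metadata plus one entry per ROOT file,
# each listing the cluster objects that belong to that file.
cluster = ClusterObject()
cluster.put_page(column=3, page_no=0, payload=b"page-3-0")
cluster.put_page(column=3, page_no=1, payload=b"page-3-1")
cluster.put_page(column=7, page_no=0, payload=b"page-7-0")

superblock = {
    "metadata": {"format": "example"},
    "files": {"file_a.root": [cluster]},
}
```

Reading a page then takes two key lookups inside the cluster object, one for the slice and one for the page.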

A "slice" is the set of pages of a single column in a cluster (the name 'slice' might still change).
We will most likely need one more level of indirection to get to the cluster object: a footer object points to a list of cluster summaries, each cluster summary containing page metadata for a set of consecutive clusters.
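One way such an indirection could work is sketched below: since each cluster summary covers a run of consecutive clusters, the summary for a given cluster can be found by binary search over the starting indices. The names `ClusterSummary`, `first_cluster`, `n_clusters` and `find_summary` are hypothetical, and the layout of the page metadata is left open.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ClusterSummary:
    first_cluster: int  # index of the first cluster covered (assumed field)
    n_clusters: int     # number of consecutive clusters covered (assumed field)
    page_metadata: dict = field(default_factory=dict)  # layout not specified


def find_summary(summaries, cluster_index):
    """Return the summary covering cluster_index.

    Assumes summaries are sorted by first_cluster and do not overlap.
    """
    starts = [s.first_cluster for s in summaries]
    pos = bisect_right(starts, cluster_index) - 1
    if pos < 0:
        raise IndexError(f"cluster {cluster_index} precedes all summaries")
    summary = summaries[pos]
    if cluster_index >= summary.first_cluster + summary.n_clusters:
        raise IndexError(f"cluster {cluster_index} is not covered")
    return summary


# The footer is modelled here as just the ordered list of summaries.
footer = [ClusterSummary(0, 100), ClusterSummary(100, 50)]
```

With this footer, clusters 0 to 99 resolve to the first summary and clusters 100 to 149 to the second.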

Scale:

  • A data set can be up to O(10 TB) in size
  • A data set can have O(1000) columns
  • A cluster is 10-100 MB (i.e. a data set can have on the order of 10^5 clusters)
  • A page is O(10 kB) of compressed data (i.e. a cluster can have up to 10^4 pages)
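The counts in this list follow from simple division, which the sketch below spells out. It assumes decimal units (1 TB = 10^12 bytes) and takes the order-of-magnitude figures at face value; the variable names are illustrative only.

```python
KB, MB, TB = 10**3, 10**6, 10**12  # decimal units (assumption)

dataset_size = 10 * TB
cluster_min, cluster_max = 10 * MB, 100 * MB
page_size = 10 * KB

# Number of clusters in a full-size data set, for large and small clusters.
clusters_with_large_clusters = dataset_size // cluster_max
clusters_with_small_clusters = dataset_size // cluster_min

# Number of pages in the largest cluster, and in the whole data set.
pages_per_large_cluster = cluster_max // page_size
pages_in_dataset = dataset_size // page_size

print(clusters_with_large_clusters)  # 100000 (10^5)
print(clusters_with_small_clusters)  # 1000000 (10^6)
print(pages_per_large_cluster)       # 10000 (10^4)
print(pages_in_dataset)              # 1000000000 (10^9)
```

The 10^5 cluster count corresponds to clusters at the 100 MB end of the range. Under the same assumptions, a full-size data set holds about 10^9 pages in total, which is the number of individual page entries the storage layout has to address.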