Jira: DAOS-11414

Terminology:

  • MD-blob: metadata blob; all of its contents are copied to DRAM.

  • DT-blob: data blob; its content can only be placed in DRAM temporarily while serving I/O.

Background

DAOS provides a Python tool (daos_storage_estimator.py) that estimates internal metadata consumption based on the DFS data model. The numbers below are its results for 1 million 4K files.

  • 1.0 GB metadata

    • 196.28 MB (object)

    • 307.20 MB (dkey)

    • 329.00 MB (akey)

    • 192.00 MB (array value)

  • 4.0 GB user data

Internal metadata is about 25% of user data for 4K files, but a few contributors are not counted:

  • VEA and DTX space consumption are not considered

  • PMDK/DAV has its own internal metadata

If a DAOS storage server has 1TB of DRAM and reserves 20% of it for the OS, DMA/RDMA buffers, the VOS object cache, the VEA index, DTX tables, and so on, it has 800GB left for the MD-blobs of all pools. Based on the estimate above, each 4K file consumes about 1KB of internal metadata, so this storage server can store at most 800 million 4K files, which is 3.2TB of user data. Given that a storage server can have 100TB or more of SSDs for user data, a DAOS server (MD-on-SSD phase-I) can use only a tiny portion of the storage space if the application's dataset consists mostly of small files.
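The capacity estimate above can be reproduced with a short back-of-the-envelope calculation. This is only a sketch of the arithmetic in this document; the function name and the decimal byte units are illustrative assumptions, not part of DAOS.

```python
def md_capacity(dram_bytes, reserve_pct, md_per_file, file_size):
    """Return (md_budget, max_files, user_data) for the MD-blob budget.

    Integer arithmetic; sizes in bytes, reserve in whole percent.
    """
    md_budget = dram_bytes * (100 - reserve_pct) // 100  # DRAM left for MD-blobs
    max_files = md_budget // md_per_file                 # files whose metadata fits
    return md_budget, max_files, max_files * file_size   # addressable user data

budget, files, data = md_capacity(
    dram_bytes=1_000_000_000_000,  # 1 TB of DRAM
    reserve_pct=20,                # 20% reserved for OS, DMA buffers, caches, ...
    md_per_file=1_000,             # ~1 KB internal metadata per 4K file
    file_size=4_000,               # 4 KB of user data per file
)
print(budget, files, data)  # → 800000000000 800000000 3200000000000
```

This reproduces the document's figures: an 800GB MD-blob budget, 800 million files, and 3.2TB of addressable user data.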

There are a few ways to improve this:

Reduce memory consumption

Reducing memory consumption allows a DAOS server to serve more small files even without evicting objects from DRAM. For example, if the internal metadata per 4K file can be reduced to 100 bytes, then a DAOS server can host 8 billion 4K files; put another way, it can store 32TB of user data even if all files are only a few kilobytes.
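Plugging the reduced per-file footprint into the same arithmetic confirms the figures in the paragraph above (the 100-byte figure is the hypothetical target from this document, not a measured value):

```python
# Same capacity model with the reduced metadata footprint.
md_budget = 800_000_000_000        # 800 GB of DRAM left for MD-blobs
md_per_file = 100                  # target: 100 bytes of metadata per file
max_files = md_budget // md_per_file
user_data = max_files * 4_000      # 4 KB of user data per file
print(max_files, user_data)  # → 8000000000 32000000000000
```

That is 8 billion files and 32TB of user data from the same 800GB metadata budget.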

Object flattening and eviction

The main challenge of evicting objects from DRAM is that DAOS uses generic data structures for internal metadata, so an object can in principle be scattered across many memory pages. If such an object were evicted from DRAM, future I/O against it would trigger multiple cache misses and chained reads from the SSD, which can badly hurt I/O performance. Serializing a small object into a contiguous buffer and storing it on the SSD before eviction, called flattening in this document, guarantees the entire object can be fetched into DRAM on a cache miss, so the read latency is no more than the latency of a single SSD read.
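A minimal sketch of the flattening idea, with entirely hypothetical structures (a toy object and pickle-based serialization stand in for VOS trees and the real on-SSD format): the object's scattered dkey/akey nodes are packed into one contiguous buffer before eviction, so a later cache miss rebuilds the whole object from a single read.

```python
import pickle

class Obj:
    """Toy object whose dkeys/akeys would normally live in scattered pages."""
    def __init__(self, dkeys):
        self.dkeys = dkeys        # {dkey: {akey: value}}

def flatten(obj):
    """Serialize the entire object into one contiguous buffer for the SSD."""
    return pickle.dumps(obj.dkeys)

def rehydrate(buf):
    """Rebuild the object from a single contiguous read on cache miss."""
    return Obj(pickle.loads(buf))

obj = Obj({"d1": {"a1": b"hello"}, "d2": {"a1": b"world"}})
blob = flatten(obj)               # one contiguous write before eviction
restored = rehydrate(blob)        # one contiguous read on cache miss
assert restored.dkeys == obj.dkeys
```

The point of the design is the access pattern, not the encoding: whatever format is used, one eviction produces one contiguous blob, and one cache miss costs one SSD read.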

Hint based allocation

TODO