Meta blob and WAL blob layout
Today each VOS instance has a single associated “data blob” that stores the bulk values. To support metadata on SSD, two more blobs are introduced: a “meta blob” that stores the VOS index and small values, and a “WAL blob” that stores the write-ahead log (WAL). Depending on the configuration scheme, the meta blob and the WAL blob may reside on the same SSD or on separate SSDs, and either of them may share an SSD with the data blob.
Meta blob layout
The meta blob starts with a “meta blob header”. The remaining area is the “metadata area”, which holds the VOS indexes, small values and the allocator heap.
The durable format of the meta blob header is defined as follows:
struct meta_blob_header {
    uint32_t mbh_magic;
    uint32_t mbh_version;
    uuid_t   mbh_meta_devid;  /* Meta SSD device ID */
    uuid_t   mbh_wal_devid;   /* WAL device ID */
    uuid_t   mbh_data_devid;  /* Data device ID */
    uint64_t mbh_meta_blobid; /* Meta SPDK blob ID */
    uint64_t mbh_wal_blobid;  /* WAL SPDK blob ID */
    uint64_t mbh_data_blobid; /* Data SPDK blob ID */
    uint32_t mbh_blk_bytes;   /* Block size for meta blob, in bytes */
    uint32_t mbh_hdr_blks;    /* Meta blob header size, in blocks */
    uint64_t mbh_tot_blks;    /* Meta blob capacity, in blocks */
    uint32_t mbh_vos_id;      /* Meta target ID, per engine ID */
    uint32_t mbh_padding[6];  /* Reserved */
    uint32_t mbh_csum;        /* Checksum of this header */
};
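As an illustration of how the header fields delimit the metadata area, the following minimal sketch derives the area's byte range from mbh_blk_bytes, mbh_hdr_blks and mbh_tot_blks. The struct meta_area type and the meta_area_locate() function are hypothetical names used only here, and the sketch assumes that mbh_tot_blks counts the header blocks as part of the blob capacity, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: byte range of the metadata area in the meta blob. */
struct meta_area {
    uint64_t ma_off;  /* Byte offset of the metadata area */
    uint64_t ma_size; /* Size of the metadata area, in bytes */
};

/*
 * Derive the metadata area from the header fields.
 * Assumption: tot_blks (mbh_tot_blks) includes the hdr_blks header blocks.
 * Returns 0 on success, -1 if the values are inconsistent or would overflow.
 */
static inline int
meta_area_locate(uint32_t blk_bytes, uint32_t hdr_blks, uint64_t tot_blks,
                 struct meta_area *ma)
{
    assert(ma != NULL);

    /* The header must exist and must leave room for at least one block. */
    if (blk_bytes == 0 || hdr_blks == 0 || hdr_blks >= tot_blks)
        return -1;
    /* Reject capacities whose byte size does not fit in 64 bits. */
    if (tot_blks > UINT64_MAX / blk_bytes)
        return -1;

    ma->ma_off  = (uint64_t)hdr_blks * blk_bytes;
    ma->ma_size = (tot_blks - hdr_blks) * blk_bytes;
    return 0;
}
```

For example, with 4096-byte blocks, a one-block header and a capacity of 1024 blocks, the metadata area would start at byte 4096 and span 1023 blocks.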
The layout of metadata area (TBD)
WAL blob layout
The WAL blob starts with a “WAL blob header”. The remaining area is used as a circular log that stores WAL transactions.
The durable format of the WAL blob header is defined as follows:
struct wal_header {
    uint32_t wh_magic;
    uint32_t wh_version;
    uint32_t wh_gen;         /* WAL re-format timestamp */
    uint16_t wh_blk_bytes;   /* WAL block size, usually 4k */
    uint16_t wh_padding1;    /* Reserved */
    uint64_t wh_tot_blks;    /* WAL blob capacity, in blocks */
    uint64_t wh_ckp_id;      /* Last checkpointed transaction ID */
    uint64_t wh_commit_id;   /* Last committed transaction ID */
    uint32_t wh_ckp_blks;    /* Blocks used by last checkpointed transaction */
    uint32_t wh_commit_blks; /* Blocks used by last committed transaction */
    uint64_t wh_padding2;    /* Reserved */
    uint32_t wh_padding3;    /* Reserved */
    uint32_t wh_csum;        /* Checksum of this header */
};
The WAL transaction ID consists of two parts:
The low 32 bits represent the block offset within the WAL. With 4k blocks, this supports a WAL size of up to 16TB (2^32 blocks of 4KB each).
The high 32 bits represent a sequence number, which is incremented by 1 every time the log wraps.
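The two-part ID described above can be sketched as a pair of pack and unpack helpers. All function names below are hypothetical. The wal_id_next() helper additionally assumes that the ID of the next transaction is obtained by advancing the offset by the number of blocks the previous transaction used and bumping the sequence number on wrap, and it takes the size of the circular log as a separate log_blks parameter, because the text does not say whether wh_tot_blks includes the header.

```c
#include <assert.h>
#include <stdint.h>

#define WAL_ID_OFF_BITS 32
#define WAL_ID_OFF_MASK ((1ULL << WAL_ID_OFF_BITS) - 1)

/* Build a transaction ID from a sequence number and a block offset. */
static inline uint64_t
wal_id_make(uint32_t seq, uint32_t off)
{
    return ((uint64_t)seq << WAL_ID_OFF_BITS) | off;
}

/* Low 32 bits: block offset within the WAL. */
static inline uint32_t
wal_id_off(uint64_t id)
{
    return (uint32_t)(id & WAL_ID_OFF_MASK);
}

/* High 32 bits: sequence number, incremented on every wrap. */
static inline uint32_t
wal_id_seq(uint64_t id)
{
    return (uint32_t)(id >> WAL_ID_OFF_BITS);
}

/*
 * Assumed rule for the ID that follows a transaction at 'id' which used
 * 'blks' blocks, in a circular log of 'log_blks' blocks (blks <= log_blks).
 */
static inline uint64_t
wal_id_next(uint64_t id, uint32_t blks, uint64_t log_blks)
{
    uint64_t off = (uint64_t)wal_id_off(id) + blks;
    uint32_t seq = wal_id_seq(id);

    assert(log_blks > 0 && log_blks <= (1ULL << WAL_ID_OFF_BITS));
    assert(blks <= log_blks);

    if (off >= log_blks) { /* The log wrapped */
        off -= log_blks;
        seq++;
    }
    return wal_id_make(seq, (uint32_t)off);
}

/* Largest WAL addressable by a 32-bit block offset, in bytes. */
static inline uint64_t
wal_max_bytes(uint32_t blk_bytes)
{
    return (1ULL << WAL_ID_OFF_BITS) * blk_bytes;
}
```

With 4096-byte blocks, wal_max_bytes() evaluates to 2^44 bytes, which matches the 16TB limit stated above.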
Each transaction starts with a “WAL transaction header” entry, followed by one or more “WAL transaction operation” entries. The payload data of the entries is concatenated after the last entry, and a checksum covering all of these entries and the payload data is placed after the payload (or after the last entry when there is no payload).
Each transaction always starts at the beginning of a block. If a transaction spans multiple blocks, the WAL transaction header is repeated at the beginning of every block involved.
The durable formats of the WAL transaction entries are defined as follows:
struct wal_trans_head {
    uint32_t wth_magic;
    uint32_t wth_gen;         /* WAL re-format timestamp */
    uint64_t wth_id;          /* Transaction ID */
    uint32_t wth_tot_ents;    /* Total entries */
    uint32_t wth_tot_payload; /* Total payload size in bytes */
};
struct wal_trans_entry {
    uint64_t wte_off;  /* Offset within meta or data blob, in bytes */
    uint16_t wte_type; /* Operation type, see umem design */
    uint16_t wte_len;  /* Data length in bytes */
    uint32_t wte_data; /* Various inline data */
};
struct wal_trans_tail {
    uint32_t wtc_csum; /* Checksum of the transaction */
};
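To make the on-disk footprint of a transaction concrete, the sketch below estimates how many WAL blocks a transaction occupies, given that the header is repeated at the start of every block. The wal_trans_blks() function is a hypothetical name. The sketch assumes that entries, payload and the tail checksum form one byte stream that may be split at any block boundary; if an implementation never splits an entry or the tail across blocks, the result is a lower bound rather than the exact count.

```c
#include <assert.h>
#include <stdint.h>

/* Copies of the durable structures above, so the sketch is self-contained. */
struct wal_trans_head {
    uint32_t wth_magic;
    uint32_t wth_gen;
    uint64_t wth_id;
    uint32_t wth_tot_ents;
    uint32_t wth_tot_payload;
};

struct wal_trans_entry {
    uint64_t wte_off;
    uint16_t wte_type;
    uint16_t wte_len;
    uint32_t wte_data;
};

struct wal_trans_tail {
    uint32_t wtc_csum;
};

/* The field layout leaves no implicit padding on common ABIs. */
static_assert(sizeof(struct wal_trans_head) == 24, "unexpected header size");
static_assert(sizeof(struct wal_trans_entry) == 16, "unexpected entry size");
static_assert(sizeof(struct wal_trans_tail) == 4, "unexpected tail size");

/*
 * Estimate the blocks used by a transaction with 'tot_ents' entries and
 * 'tot_payload' payload bytes. Every block starts with a header copy, so
 * only (blk_bytes - header size) bytes per block carry entries, payload
 * and the tail checksum. Returns 0 if the block cannot hold a header.
 */
static inline uint64_t
wal_trans_blks(uint32_t blk_bytes, uint32_t tot_ents, uint32_t tot_payload)
{
    uint64_t usable;
    uint64_t body;

    if (blk_bytes <= sizeof(struct wal_trans_head))
        return 0;

    usable = blk_bytes - sizeof(struct wal_trans_head);
    body   = (uint64_t)tot_ents * sizeof(struct wal_trans_entry) +
             tot_payload + sizeof(struct wal_trans_tail);

    return (body + usable - 1) / usable; /* Round up to whole blocks */
}
```

Under these assumptions a 4096-byte block carries 4072 bytes of transaction body, so a transaction with two entries and 4036 bytes of payload still fits in one block, while one more payload byte pushes it into a second block.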