
Today each VOS instance has a single associated “data blob” for storing bulk values. To support metadata on SSD, two more blobs will be introduced: the “meta blob”, which stores the VOS index and small values, and the “WAL blob”, which stores the WAL. Depending on the configuration scheme, the meta blob and WAL blob could reside on the same SSD or on separate SSDs, and they could also share the same SSD as the data blob.
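To make the placement options concrete, the sketch below classifies where the three blobs of one VOS instance live by comparing their device IDs (the meta blob header records all three). It is a sketch under stated assumptions: `dev_id_t` is a hypothetical 16-byte stand-in for `uuid_t`, and `enum blob_placement` and `classify_placement()` are illustrative names. It enumerates every way the three IDs can coincide; which combinations are actually supported is not specified here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte device UUID (same shape as libuuid's uuid_t). */
typedef unsigned char dev_id_t[16];

/* Hypothetical classification of where the meta, WAL and data blobs reside. */
enum blob_placement {
    PLACE_ALL_SHARED,       /* meta, WAL and data blobs on one SSD */
    PLACE_META_WAL_SHARED,  /* meta + WAL on one SSD, data on another */
    PLACE_META_DATA_SHARED, /* meta + data on one SSD, WAL on another */
    PLACE_WAL_DATA_SHARED,  /* WAL + data on one SSD, meta on another */
    PLACE_ALL_SEPARATE,     /* three different SSDs */
};

static int same_dev(const dev_id_t a, const dev_id_t b)
{
    return memcmp(a, b, sizeof(dev_id_t)) == 0;
}

enum blob_placement
classify_placement(const dev_id_t meta, const dev_id_t wal, const dev_id_t data)
{
    int mw = same_dev(meta, wal);
    int md = same_dev(meta, data);
    int wd = same_dev(wal, data);

    /* Equality is transitive, so mw && md implies wd as well. */
    if (mw && md)
        return PLACE_ALL_SHARED;
    if (mw)
        return PLACE_META_WAL_SHARED;
    if (md)
        return PLACE_META_DATA_SHARED;
    if (wd)
        return PLACE_WAL_DATA_SHARED;
    return PLACE_ALL_SEPARATE;
}
```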

Meta blob layout

The meta blob starts with a “meta blob header”; the remaining area is the “metadata area”, which holds VOS indexes, small values and the allocator heap.

  • The durable format of the meta blob header is defined as follows:

struct meta_blob_header {
  uint32_t  mbh_magic;
  uint32_t  mbh_version;
  uuid_t    mbh_meta_devid;   /* Meta SSD device ID */
  uuid_t    mbh_wal_devid;    /* WAL device ID */
  uuid_t    mbh_data_devid;   /* Data device ID */
  uint64_t  mbh_meta_blobid;  /* Meta SPDK blob ID */
  uint64_t  mbh_wal_blobid;   /* WAL SPDK blob ID */
  uint64_t  mbh_data_blobid;  /* Data SPDK blob ID */
  uint32_t  mbh_blk_bytes;    /* Block size for meta blob, in bytes */
  uint32_t  mbh_hdr_blks;     /* Meta blob header size, in blocks */
  uint64_t  mbh_tot_blks;     /* Meta blob capacity, in blocks */
  uint32_t  mbh_vos_id;       /* Meta target ID, per engine ID */
  uint16_t  mbh_csum_type;    /* Checksum type */
  uint16_t  mbh_csum_len;     /* Checksum length in bytes */
  uint8_t   mbh_csum[0];      /* Checksum of the header */
};
  • The layout of the metadata area (TBD)
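Although the internal layout of the metadata area is still TBD, its bounds follow from the header geometry: the header occupies the first `mbh_hdr_blks` blocks and the metadata area fills the rest of the `mbh_tot_blks` blocks. The sketch below computes those bounds and rejects inconsistent headers. It mirrors the header fields above, but uses a hypothetical 16-byte `mbh_uuid_t` stand-in for `uuid_t` and a C99 flexible array member in place of `mbh_csum[0]`; `meta_area_bounds()` is an illustrative name, not an existing function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uuid_t, to avoid a libuuid dependency. */
typedef unsigned char mbh_uuid_t[16];

/* Same fields as struct meta_blob_header, with a flexible array for the checksum. */
struct meta_blob_header {
    uint32_t   mbh_magic;
    uint32_t   mbh_version;
    mbh_uuid_t mbh_meta_devid;
    mbh_uuid_t mbh_wal_devid;
    mbh_uuid_t mbh_data_devid;
    uint64_t   mbh_meta_blobid;
    uint64_t   mbh_wal_blobid;
    uint64_t   mbh_data_blobid;
    uint32_t   mbh_blk_bytes;
    uint32_t   mbh_hdr_blks;
    uint64_t   mbh_tot_blks;
    uint32_t   mbh_vos_id;
    uint16_t   mbh_csum_type;
    uint16_t   mbh_csum_len;
    uint8_t    mbh_csum[];
};

/*
 * Compute the byte offset and size of the metadata area that follows the header.
 * Returns 0 on success, -1 if the header geometry is inconsistent.
 */
int meta_area_bounds(const struct meta_blob_header *hdr, uint64_t *off, uint64_t *size)
{
    uint64_t hdr_bytes;

    if (hdr->mbh_blk_bytes == 0 || hdr->mbh_hdr_blks == 0)
        return -1;
    hdr_bytes = (uint64_t)hdr->mbh_hdr_blks * hdr->mbh_blk_bytes;

    /* The fixed fields plus the trailing checksum must fit in the header blocks. */
    if (sizeof(*hdr) + hdr->mbh_csum_len > hdr_bytes)
        return -1;
    /* At least one block must remain for the metadata area. */
    if (hdr->mbh_tot_blks <= hdr->mbh_hdr_blks)
        return -1;

    *off = hdr_bytes;
    *size = (hdr->mbh_tot_blks - hdr->mbh_hdr_blks) * hdr->mbh_blk_bytes;
    return 0;
}
```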

WAL blob layout

The WAL blob starts with a “WAL blob header”; the remaining area is used as a circular log that stores WAL transactions.

  • The durable format of the WAL blob header is defined as follows:

struct wal_header {
  uint32_t  wh_magic;
  uint16_t  wh_csum_type; /* Checksum type used for checking transaction data integrity */
  uint16_t  wh_blk_bytes; /* WAL block size, usually 4k */
  uint64_t  wh_tot_blks;  /* WAL blob capacity, in blocks */
  uint64_t  wh_ckp_id;    /* Last checkpointed transaction ID */
  uint64_t  wh_commit_id; /* Last committed transaction ID */
  uint64_t  wh_next_id;   /* Next unused transaction ID */
  uint32_t  wh_csum_len;  /* Checksum length in bytes */
  uint8_t   wh_csum[0];   /* Checksum of the header */
};

The WAL transaction ID consists of two parts:

  1. The low 32 bits represent the offset within the WAL, in blocks. For a WAL with a 4k block size, this supports a WAL of up to 16TB (2^32 blocks × 4KiB = 16TiB).

  2. The high 32 bits represent a sequence number, which is incremented by 1 each time the log wraps.
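The ID encoding above can be sketched as a few helpers for packing, unpacking and advancing an ID. This is a minimal sketch assuming offsets count blocks from the start of the circular log area (not from the start of the blob), and that `log_blks`, a hypothetical parameter, is the number of blocks in that area; all function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a wrap sequence number and a block offset into a 64-bit transaction ID. */
static inline uint64_t wal_id_make(uint32_t seq, uint32_t blk_off)
{
    return ((uint64_t)seq << 32) | blk_off;
}

static inline uint32_t wal_id_seq(uint64_t id) { return (uint32_t)(id >> 32); }
static inline uint32_t wal_id_off(uint64_t id) { return (uint32_t)id; }

/*
 * Advance an ID by nblks blocks (nblks <= log_blks) in a circular log of log_blks
 * blocks. Crossing the end of the log wraps the offset and bumps the sequence number.
 */
static inline uint64_t wal_id_advance(uint64_t id, uint32_t nblks, uint32_t log_blks)
{
    uint64_t off = (uint64_t)wal_id_off(id) + nblks;
    uint32_t seq = wal_id_seq(id);

    if (off >= log_blks) {
        off -= log_blks;
        seq++;
    }
    return wal_id_make(seq, (uint32_t)off);
}

/* Blocks from 'from' up to (excluding) 'to'; assumes 'to' is not behind 'from'. */
static inline uint64_t wal_id_distance(uint64_t from, uint64_t to, uint32_t log_blks)
{
    return ((uint64_t)wal_id_seq(to) - wal_id_seq(from)) * log_blks +
           wal_id_off(to) - wal_id_off(from);
}
```

Because the sequence number occupies the high bits, comparing two IDs as plain 64-bit integers orders them chronologically, as long as the 32-bit sequence number itself has not wrapped.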

Each transaction starts with a “WAL transaction header” entry, followed by multiple “WAL transaction operation” entries. The last entry of a transaction is a “WAL transaction csum” entry, which contains the checksum of all entries and is used to check data integrity during recovery.
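The entry sequence described above (header, operation entries, tail csum entry) can be sketched for the simplest case: a transaction with one MEMCPY entry that fits in a single block, plus the matching recovery-side check. This is a sketch under loudly stated assumptions, not the actual implementation. The struct and flag definitions mirror the ones given later on this page. `WAL_TRANS_MAGIC` is a hypothetical value. CRC-32 stands in for whatever checksum `wh_csum_type` selects. The checksum is assumed to cover the operation entries between the header and the csum entry, and `wth_len` is assumed to count the bytes that follow the header in the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAL_HDR_FL_CSUM  0x1
#define WAL_TRANS_MAGIC  0x57414c54u /* hypothetical magic value */

enum wal_trans_op_type {
    WAL_OP_MEMCPY = 0, WAL_OP_MEMMOVE, WAL_OP_ZEROING, WAL_OP_CSUM, WAL_OP_MAX,
};

struct wal_trans_head  { uint32_t wth_magic; uint16_t wth_len; uint16_t wth_flags; uint64_t wth_id; };
struct wal_trans_entry { uint64_t wte_off; uint32_t wte_type; uint32_t wte_len; uint8_t wte_data[]; };
struct wal_trans_csum  { uint32_t wtc_len; uint8_t wtc_csum[]; };

/* Bitwise CRC-32 (IEEE, reflected), standing in for the configured checksum type. */
static uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xffffffffu;

    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xedb88320u & -(crc & 1u));
    }
    return ~crc;
}

/* Pack head | entry + payload | csum entry + CRC. Returns bytes used, 0 if it won't fit. */
size_t wal_pack_one(uint8_t *blk, size_t blk_bytes, uint64_t id,
                    uint64_t off, const void *data, uint32_t len)
{
    struct wal_trans_head  head;
    struct wal_trans_entry ent;
    struct wal_trans_csum  cs;
    uint32_t crc;
    size_t ops = sizeof(ent) + len;  /* bytes of operation entries */
    size_t total = sizeof(head) + ops + sizeof(cs) + sizeof(crc);

    if (total > blk_bytes || total - sizeof(head) > UINT16_MAX)
        return 0;

    ent.wte_off = off;
    ent.wte_type = WAL_OP_MEMCPY;
    ent.wte_len = len;
    memcpy(blk + sizeof(head), &ent, sizeof(ent));
    memcpy(blk + sizeof(head) + sizeof(ent), data, len);

    crc = crc32_ieee(blk + sizeof(head), ops);
    cs.wtc_len = sizeof(crc);
    memcpy(blk + sizeof(head) + ops, &cs, sizeof(cs));
    memcpy(blk + sizeof(head) + ops + sizeof(cs), &crc, sizeof(crc));

    head.wth_magic = WAL_TRANS_MAGIC;
    head.wth_len = (uint16_t)(total - sizeof(head));
    head.wth_flags = WAL_HDR_FL_CSUM;  /* the csum entry is in this block */
    head.wth_id = id;
    memcpy(blk, &head, sizeof(head));
    return total;
}

/* Recovery-side check of a single-block transaction: 0 if intact, -1 otherwise. */
int wal_verify_one(const uint8_t *blk, size_t blk_bytes)
{
    struct wal_trans_head head;
    struct wal_trans_csum cs;
    uint32_t crc;
    size_t ops;

    if (blk_bytes < sizeof(head))
        return -1;
    memcpy(&head, blk, sizeof(head));
    if (head.wth_magic != WAL_TRANS_MAGIC || !(head.wth_flags & WAL_HDR_FL_CSUM))
        return -1;
    if (head.wth_len > blk_bytes - sizeof(head) || head.wth_len < sizeof(cs) + sizeof(crc))
        return -1;
    ops = head.wth_len - sizeof(cs) - sizeof(crc);
    memcpy(&cs, blk + sizeof(head) + ops, sizeof(cs));
    memcpy(&crc, blk + sizeof(head) + ops + sizeof(cs), sizeof(crc));
    if (cs.wtc_len != sizeof(crc))
        return -1;
    return crc32_ieee(blk + sizeof(head), ops) == crc ? 0 : -1;
}
```

Copying through `memcpy` rather than casting pointers keeps the sketch safe when the csum entry lands at an unaligned offset after a variable-length payload.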

Each transaction always starts at the beginning of a block. If a transaction spans multiple blocks, the WAL transaction header is placed at the beginning of every block involved. If a transaction entry is too large to fit into a block, it needs to be split into multiple entries.
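The splitting rule can be sketched as follows. Because every block begins with a copy of the 16-byte transaction header, one operation entry can carry at most `blk_bytes - sizeof(wal_trans_head) - sizeof(wal_trans_entry)` payload bytes. This sketch assumes an oversized operation is cut into pieces of that maximum size, each covering a consecutive target range. How much room is reserved for the trailing csum entry is left out. The struct definitions mirror those given later on this page, and `struct wal_piece`, `wal_max_entry_payload()` and `wal_split_op()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size parts of the on-disk entries (16 bytes each). */
struct wal_trans_head  { uint32_t wth_magic; uint16_t wth_len; uint16_t wth_flags; uint64_t wth_id; };
struct wal_trans_entry { uint64_t wte_off; uint32_t wte_type; uint32_t wte_len; uint8_t wte_data[]; };

/* One piece of a split operation: target offset and payload length. */
struct wal_piece {
    uint64_t off;
    uint32_t len;
};

/* Largest payload a single entry can carry behind the per-block header copy. */
static uint32_t wal_max_entry_payload(uint32_t blk_bytes)
{
    return blk_bytes - (uint32_t)sizeof(struct wal_trans_head) -
           (uint32_t)sizeof(struct wal_trans_entry);
}

/*
 * Split an operation covering [off, off + len) into pieces that each fit in one block.
 * Fills at most max_pieces entries of 'pieces' and returns the number of pieces needed.
 */
size_t wal_split_op(uint64_t off, uint64_t len, uint32_t blk_bytes,
                    struct wal_piece *pieces, size_t max_pieces)
{
    uint32_t max = wal_max_entry_payload(blk_bytes);
    size_t n = 0;

    while (len > 0) {
        uint32_t chunk = len > max ? max : (uint32_t)len;

        if (n < max_pieces) {
            pieces[n].off = off;
            pieces[n].len = chunk;
        }
        n++;
        off += chunk;
        len -= chunk;
    }
    return n;
}
```

With a 4k block, for example, each piece carries at most 4064 bytes, so a 10000-byte copy becomes three entries of 4064, 4064 and 1872 bytes.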

  • The durable format of the WAL transaction entries is defined as follows:

#define WAL_HDR_FL_CSUM 0x1 /* The tail csum entry is in current block */

struct wal_trans_head {
  uint32_t  wth_magic;
  uint16_t  wth_len;    /* Transaction data length within current block, in bytes */
  uint16_t  wth_flags;  /* Transaction header flags */
  uint64_t  wth_id;     /* Transaction ID */
};

enum wal_trans_op_type {
  /* Copy data to the given VOS blob offset */
  WAL_OP_MEMCPY = 0,
  /* Move data at the given VOS blob offset (memmove semantics) */
  WAL_OP_MEMMOVE,
  /* Zero data starting at the given VOS blob offset */
  WAL_OP_ZEROING,
  /* Checksum of the given data on the data blob */
  WAL_OP_CSUM,
  WAL_OP_MAX,
};

struct wal_trans_entry {
  uint64_t  wte_off;    /* Offset within VOS or data blob, in bytes */
  uint32_t  wte_type;   /* Operation type */
  uint32_t  wte_len;    /* Data length in bytes */
  uint8_t   wte_data[0];
};

struct wal_trans_csum {
  uint32_t  wtc_len;    /* Checksum length in bytes */
  uint8_t   wtc_csum[0];
};
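During recovery, each operation entry is re-applied to the VOS blob. The sketch below replays one entry against an in-memory image of the meta blob. It is a sketch under assumptions, not the actual replay code. Only MEMCPY and ZEROING are handled, because the encoding of MEMMOVE's source and the handling of CSUM entries (which target the data blob) are not specified above. For ZEROING, `wte_len` is assumed to be the length of the range to clear, with no payload. `wal_replay_entry()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum wal_trans_op_type {
    WAL_OP_MEMCPY = 0, WAL_OP_MEMMOVE, WAL_OP_ZEROING, WAL_OP_CSUM, WAL_OP_MAX,
};

struct wal_trans_entry {
    uint64_t wte_off;   /* Offset within VOS or data blob, in bytes */
    uint32_t wte_type;  /* Operation type */
    uint32_t wte_len;   /* Data length in bytes */
    uint8_t  wte_data[];
};

/*
 * Apply one logged operation to an in-memory image of the VOS (meta) blob.
 * Returns 0 on success, -1 for out-of-range or unsupported operations.
 */
int wal_replay_entry(const struct wal_trans_entry *ent, uint8_t *img, uint64_t img_len)
{
    /* Reject entries whose target range falls outside the image. */
    if (ent->wte_off > img_len || ent->wte_len > img_len - ent->wte_off)
        return -1;

    switch (ent->wte_type) {
    case WAL_OP_MEMCPY:
        /* The payload is copied verbatim to the target offset. */
        memcpy(img + ent->wte_off, ent->wte_data, ent->wte_len);
        return 0;
    case WAL_OP_ZEROING:
        /* Assumption: no payload; wte_len is the number of bytes to clear. */
        memset(img + ent->wte_off, 0, ent->wte_len);
        return 0;
    default:
        /* MEMMOVE and CSUM are not handled in this sketch. */
        return -1;
    }
}
```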
