Stakeholders

Johann, Mohamad, Xuezhao, Phil.

Introduction

The DAOS data model is very generic and users build different data models on top of it. Different data models and middleware libraries include:

The key point is that each of these middleware libraries maintains and understands its own data model on top of the DAOS data model. DAOS itself does not understand that data model and cannot, for example, distinguish between a POSIX root directory, an SB object, and a regular file. From the DAOS API level down, they are all regular objects.

After a catastrophic recovery event, repair actions from different checker passes can leave the middleware data model inconsistent. For instance, an object (a directory in a POSIX container) might be removed by the DAOS distributed checker. When the POSIX container is remounted, this directory is no longer visible in the namespace, and everything under it becomes unreachable and is leaked in the container. This situation calls for a generic infrastructure that lets every middleware library check and repair its data model after such an event.

This design document describes the generic infrastructure that DAOS will provide and details how two middleware libraries, DFS and PyDAOS, will implement their own checkers using that infrastructure.

Requirements & Use Cases

As discussed in the previous section, when recovering from a catastrophic event, repair actions from DAOS could leave some middleware data models broken. The requirements are:

  1. The generic infrastructure and API extensions for middleware consistency are sufficient for existing data models to implement their repair actions.

  2. The middleware tools supported in this work (POSIX and PyDAOS) are able to use the new infrastructure to fix the namespace and avoid leaking objects or space in the container.

  3. A testing infrastructure is provided to corrupt containers, emulating a catastrophic recovery event, so that the consistency tools are easy to test.

Use cases according to each middleware:

  1. DFS/POSIX containers:

  2. PyDAOS containers:

Design Overview

The main infrastructure change to support the middleware consistency (MWC) tools is an extension to the DAOS Object ID Table (OIT). Currently, the OIT API only allows one to create the OIT object and iterate through all the object IDs at a particular snapshot in the container. The MWC tools need extensions to this OIT API to support repairing the middleware data model. The repair is a two-pass process:

  1. Descend the middleware “namespace” and query the object ID of every object that is visited / connected in the namespace. Every object ID that is seen is marked as such in the OIT.

  2. Iterate through all the object IDs in the OIT and return all the unmarked (orphaned) objects to the middleware tool. The tool can then decide what to do with those objects: either punch them to reclaim space or reattach them under a lost+found for the user to relink later.
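The two passes above amount to a mark-and-sweep over the object ID table. The following is a minimal sketch of that idea on an in-memory toy table, not the DAOS implementation: every name in it (`toy_oit_entry`, `toy_node`, `toy_mark_reachable`, `toy_collect_orphans`) is hypothetical, and object IDs are simplified to 64-bit integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the OIT: one entry per object ID. */
struct toy_oit_entry {
	uint64_t oid;    /* simplified object ID (assumption: 64 bits for brevity) */
	bool     marked; /* set in pass 1 when the object is reachable */
};

/* Hypothetical namespace node: an object and the objects linked under it. */
struct toy_node {
	uint64_t                oid;
	const struct toy_node **children;
	size_t                  nr_children;
};

/* Pass 1: descend the namespace and mark every object ID that is seen. */
void
toy_mark_reachable(const struct toy_node *node, struct toy_oit_entry *oit, size_t oit_nr)
{
	for (size_t i = 0; i < oit_nr; i++)
		if (oit[i].oid == node->oid)
			oit[i].marked = true;

	for (size_t i = 0; i < node->nr_children; i++)
		toy_mark_reachable(node->children[i], oit, oit_nr);
}

/* Pass 2: scan the table and collect the unmarked (orphaned) object IDs. */
size_t
toy_collect_orphans(const struct toy_oit_entry *oit, size_t oit_nr,
		    uint64_t *orphans, size_t max_orphans)
{
	size_t n = 0;

	for (size_t i = 0; i < oit_nr && n < max_orphans; i++)
		if (!oit[i].marked)
			orphans[n++] = oit[i].oid;

	return n;
}
```

A caller would run `toy_mark_reachable()` from the root of its namespace, then hand the IDs returned by `toy_collect_orphans()` to whatever policy it chooses (punch or relink under lost+found).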

The two main DAOS components that will require changes to support this are:

  1. The OIT API today does not support any updates or “markings” of object IDs and thus will need to be extended to support that.

  2. The DAOS POSIX and PyDAOS middleware, which are part of the DAOS source tree, will need to provide tools to carry out this recovery process.

In-scope:

Out-of-scope:

User Interface

To support the middleware consistency tools for different middleware libraries, we need a generic infrastructure that these tools can use to detect irregularities in their data model. This infrastructure is built on the Object ID Table (OIT) APIs, which let a tool go through the entire list of objects in a container and check whether any of them are missing from its data model. The API needs to be extended to allow marking objects as “checked”, so that a tool can restart or make another pass through the list when needed. The following new APIs will be implemented:

int daos_oit_mark(daos_handle_t oh, daos_obj_id_t oid, d_iov_t *marker, daos_event_t *ev);
typedef int (daos_oit_filter_cb)(daos_obj_id_t oid, d_iov_t *marker);
int daos_oit_list_filter(daos_handle_t oh, daos_obj_id_t *oids, uint32_t *oids_nr, daos_anchor_t *anchor, daos_oit_filter_cb filter, daos_event_t *ev);

The first API allows a user to mark an object in the OIT with a marker (maximum size of 128 bits). For middleware consistency, this marker can be a single-bit flag that indicates the object is “checked”. The second API lists the object IDs in the OIT and calls a user-defined callback on each OID.
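To illustrate how a marker and a filter callback could work together, here is a small sketch that compiles without the DAOS headers. The `toy_*` types are simplified stand-ins for `daos_obj_id_t`, `d_iov_t`, and `daos_oit_filter_cb`, and `toy_list_filter()` is a hypothetical in-memory counterpart of the listing call. The return convention of the callback (positive to list the OID, zero to skip it) and the treatment of a NULL marker as "never marked" are assumptions, since this document does not specify them.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins so the sketch builds without DAOS headers. */
typedef struct { uint64_t lo, hi; } toy_oid_t;
typedef struct { void *iov_buf; size_t iov_len; } toy_iov_t;
typedef int (toy_filter_cb)(toy_oid_t oid, toy_iov_t *marker);

#define TOY_MARK_CHECKED 0x1u /* hypothetical single-bit "checked" flag */

/* Filter: accept an OID only if it does not carry the "checked" flag. */
int
toy_filter_unchecked(toy_oid_t oid, toy_iov_t *marker)
{
	(void)oid;
	if (marker == NULL || marker->iov_buf == NULL || marker->iov_len == 0)
		return 1; /* assumption: no marker means the object was never marked */
	return (*(const uint8_t *)marker->iov_buf & TOY_MARK_CHECKED) ? 0 : 1;
}

/* Hypothetical table row: an OID plus an optional one-byte marker. */
struct toy_row {
	toy_oid_t oid;
	uint8_t   mark;
	int       has_mark;
};

/* Toy filtered listing: copy out the OIDs that the filter accepts. */
uint32_t
toy_list_filter(struct toy_row *rows, uint32_t nr, toy_oid_t *out, uint32_t max,
		toy_filter_cb *filter)
{
	uint32_t n = 0;

	for (uint32_t i = 0; i < nr && n < max; i++) {
		toy_iov_t iov = { &rows[i].mark, sizeof(rows[i].mark) };

		if (filter(rows[i].oid, rows[i].has_mark ? &iov : NULL) > 0)
			out[n++] = rows[i].oid;
	}
	return n;
}
```

With a filter like this, the second pass of a checker only ever sees the objects that the namespace walk did not reach.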

Using the new OIT APIs, we will extend the daos tool to support middleware consistency. Since we are supporting POSIX and PyDAOS containers, the tool extension would look something like:

daos container checker pool container

which, depending on the container type, would scan the namespace (POSIX or PyDAOS) and the OIT and fix the irregularities in the data model.

This tool will support only containers of type POSIX and PYTHON. Passing a container of any other type will return ENOTSUPP.
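The dispatch on container type could look roughly like the sketch below. It is illustrative only: the enum, the two checker stubs, and `toy_checker_dispatch()` are hypothetical names, and the standard `ENOTSUP` from `<errno.h>` is used as a stand-in for the error code named above.

```c
#include <errno.h>

/* Hypothetical container types; only the first two are supported by the tool. */
enum toy_cont_type { TOY_CONT_POSIX, TOY_CONT_PYTHON, TOY_CONT_OTHER };

/* Stub checkers; a real tool would walk the namespace and the OIT here. */
int toy_check_posix(void)  { return 0; }
int toy_check_python(void) { return 0; }

/* Select the checker for the container type, rejecting unsupported types. */
int
toy_checker_dispatch(enum toy_cont_type type)
{
	switch (type) {
	case TOY_CONT_POSIX:
		return toy_check_posix();
	case TOY_CONT_PYTHON:
		return toy_check_python();
	default:
		return ENOTSUP; /* stand-in for the "not supported" error */
	}
}
```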

Impacts

No performance impact is expected. New APIs and tool extensions are required (see the User Interface section).

Quality

Testing of the work in this milestone is done at the same two layers as the development (the OIT API and the MWC checker tools), followed by benchmarking:

  1. For testing the OIT API extensions, new component tests will be added to daos_test to verify that the API is working properly.

  2. For testing the MWC checker tools we will add new functional tests to:

    1. generate POSIX (using mdtest and IOR) and PyDAOS containers.

    2. use the new fault injection tool to simulate the state after a catastrophic recovery.

      1. all the supported states should be simulated.

    3. use the middleware checker tools to fix the containers.

    4. verify the state of the containers and that no orphan objects exist.

      1. this can be done through the OIT list API and some size queries on the pool to verify that no space is leaked.

  3. Benchmark the performance of the checker tool on different container sizes.

Project Milestones

References (optional)

External papers, web page (if any)

Future Work (optional)

Known issues and future work that were considered out of scope.