DAOS Community

This site is the main community repository for DAOS information. Consult the sidebar for links to discover, use and contribute to DAOS.


What is DAOS?

The Distributed Asynchronous Object Storage (DAOS) is an open-source, software-defined object store designed from the ground up for massively distributed Non-Volatile Memory (NVM). DAOS takes advantage of next-generation NVM technology such as Storage Class Memory (SCM) and NVM Express (NVMe) while presenting a key-value storage interface. It provides features such as transactional non-blocking I/O, advanced data protection with self-healing on top of commodity hardware, end-to-end data integrity, fine-grained data control and elastic storage to optimize performance and cost.

Why DAOS?

The emergence of data-intensive applications in business, government and academia stretches existing I/O models beyond their limits. Modern I/O workloads feature an increasing proportion of metadata combined with misaligned and fragmented data. Conventional storage stacks deliver poor performance for these workloads because they add significant latency and impose alignment constraints. The advent of affordable, large-capacity persistent memory combined with an integrated fabric offers a unique opportunity to redefine the storage paradigm and support modern I/O workloads efficiently.

This revolution requires a radical rethink of the complete storage stack. To unleash the full potential of this new technology, the new stack must embrace a byte-granular, shared-nothing interface from the ground up and support massively distributed storage in which failure is the norm, while preserving low-latency, high-bandwidth access over the fabric.

DAOS is a complete I/O architecture that aggregates SCM and NVMe storage distributed across the fabric into globally accessible object address spaces, providing consistency, availability and resiliency guarantees without compromising performance.

How to use DAOS?

DAOS is open-source software licensed under the BSD+Patent license. The DAOS source code is publicly available on GitHub.
A community mailing list is hosted on daos.groups.io, and a community chat is available on Slack.

Community Newsletter (September'22)

Please find below the DAOS community newsletter for September 2022.

Past Events

  • Flash Memory Summit’22: 3rd Workshop on Extreme-Scale Storage and Analysis (August 2nd-4th)
    Requirements and Challenges Associated with the World's Fastest Storage Platform
    https://www.flashmemorysummit.com
    Jeff Olivier (Intel)

Upcoming Events

  • IXPUG Annual Conference 2022 (Sep 29)
    The Evolution of Storage and Memory and the DAOS Role in It
    Kevin Harms (ANL)
    Andrey Kudryavtsev (Intel)
  • SuperCheck-SC'22 (Nov 14)
    DAOS: Nextgen Storage Stack for HPC and AI
    Johann Lombardi (Intel)
  • SC'22 BoF (Nov 15-17)
    DAOS Storage Community BoF
    Kevin Harms (ANL)
    Michael Hennecke (Intel)
    Dean Hildebrand (Google)
    Panagiotis Adamidis (DKRZ)
  • SC'22 BoF (Nov 15-17)
    The Storage Tower of Babel? ... Not! Actually, maybe?
    Philippe Deniel (CEA)
    John Bent (Seagate)
    Tiago Quintino (ECMWF)
    Johann Lombardi (Intel)
  • SC'22 Tutorial (Nov 13-14)
    Emerging Storage Interfaces: DAOS and PMDK
    Adrian Jackson (EPCC)
    Mohamad Chaarawi (Intel)
    Johann Lombardi (Intel) 
  • 6th annual DAOS User Group (Nov/Dec'22)

Release

  • Current stable release is 2.0.3. See https://docs.daos.io/v2.0/ and https://packages.daos.io/v2.0/ for more information.
    2.0.3 includes several fixes for ARM64 support, erasure code and pool operations. Please see the release notes for more details.
  • Branches:
    • release/2.0 is the release branch for the stable 2.0 release. Latest bug fix release is 2.0.3 (v2.0.3 tag).
    • release/2.2 is the development branch for the future 2.2 release. The first release candidate has been created (v2.2.0-rc1 tag).
    • master is the development branch for the future 2.4 release. The latest test build is 2.3.100 (v2.3.100-tb tag). A new build including the EC parity rotation feature is imminent.
  • Major recent changes on release/2.0 (bugfix release):
    • Several Coverity fixes
    • Fix an incorrect assertion failure hit during soak testing with the LAMMPS application
    • Bump hadoop-common version to 3.3.3
    • Several documentation fixes
    • Several test fixes.
  • Major recent changes on release/2.2 (future 2.2 release):
    • All patches listed in the 2.0 section above.
    • Update mercury to 2.2.0
    • Update pmdk to 1.12.1
    • Trigger DTX reindex before DTX resync
    • Fix issue with srx_disabled config field
    • Fix mtime setting so that it does not rely on the DAOS HLC
    • Improve DAOS build preprocessing steps
    • Fix Java JAR build instructions
    • Reduce lock contention on hash lock in libdaos to increase multi-thread performance
    • Set UCX_IB_FORK_INIT env var in the engine
    • Add new metrics to track EC full stripe and partial updates
    • Improve dfs_setattr to re-sample mtime on file size changes
    • Add UCX documentation
    • Do not use stable epoch for reclaim
    • Fix dfs_open for directories without O_EXCL
    • Add support for 2.0/2.2 agent interoperability
  • Major recent changes on master (future 2.4 release):
    • All patches listed in the 2.2 section above.
    • Add prefix to notice logging in the control plane
    • Add githook install script
    • Move NLT and unit tests to el8
    • Fix a race in dc_tx_get_epoch
    • Fix name match in daos_oclass_name2id()
    • Add the ability for the engine to manage its own ABT stack via mmap() to proactively detect stack overruns
    • Limit number of outstanding I/Os to NVMe device
    • Remove indirect link for ISA-L
    • Store the target ID of scanned objects during rebuild to avoid excessive iteration when sending the object list
    • Create a single bulk handle per DMA chunk and share that handle across all bulk transfers against the same DMA chunk
    • Retry map_fresh on more errors
    • Refactor daos_server standalone command surface
    • Reject read/write hole in bio
    • Run NLT on ARM64 self-hosted runners
    • Fix gap in EC rotation patch in tx classify
    • Replace SWIM D_CIRCLEQ with a hash table.
    • Fix VMD domain parsing
    • Accept positional args in dfuse command to support mtab entries
    • Set EC cell alignment to 32 bytes
    • Disallow IP address with negative port in the control plane
  • What is coming:
    • 2.2.0 GA
    • 2.4.0 feature freeze
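
The master-branch item above that sets the EC cell alignment to 32 bytes corresponds to the usual round-up-to-alignment arithmetic. The sketch below is only an illustration: `align_up`, `is_aligned` and `EC_CELL_ALIGN` are hypothetical names, not DAOS functions, and the sketch assumes the change means an EC cell size must be a multiple of 32 bytes. It does not show how DAOS itself enforces the rule.

```python
# Hypothetical helpers, not part of DAOS: assumes an EC cell size must be a
# multiple of EC_CELL_ALIGN bytes (32, per the master-branch change above).
EC_CELL_ALIGN = 32


def align_up(size: int, alignment: int = EC_CELL_ALIGN) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    if size < 0:
        raise ValueError("size must be non-negative")
    # Adding (alignment - 1) and then clearing the low bits rounds up.
    return (size + alignment - 1) & ~(alignment - 1)


def is_aligned(size: int, alignment: int = EC_CELL_ALIGN) -> bool:
    """Return True if size is already a multiple of alignment."""
    return size % alignment == 0


print(align_up(1000))      # 1024
print(align_up(1048576))   # 1048576 (1 MiB is already 32-byte aligned)
print(is_aligned(1000))    # False
```

The bit-mask form only works for power-of-two alignments, which is why the helper rejects other values instead of silently returning a wrong result.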

R&D

  • Major features under development:
    • VOS on SPDK blob
    • Multi-user dfuse
    • More aggressive caching in dfuse for AI APPs
      • FUSE version updated on EL8 for readdir caching support; not needed on Leap, which already ships a recent enough FUSE version.
      • FUSE kernel readdir caching is now enabled; dfuse readdir caching is still in progress.
      • PR: https://github.com/daos-stack/daos/pull/6776
      • Target release: 2.4
    • Catastrophic recovery
      • Aka distributed fsck or checker
      • Tests for ddb (low level debugger utility similar to debugfs for ext4) under review
      • Testing for the dmg checker under development
      • Pass 4 for container recovery completed.
      • Branch: feature/cat_recovery
      • Target release: 2.6
    • Multi-homed network support
      • Aka multi-provider support
      • This feature aims to support multiple network providers in the engine
      • The branch is now feature-complete and testing is underway
      • Branch: feature/multiprovider
      • Target release: 2.6
    • Client-side metrics
    • Performance domain
      • Extend placement algorithm to be aware of fabric topology
      • A fix to avoid placing shards in the same domain has landed
      • Branch: feature/perf_dom
      • Target release: 2.8
  • Pathfinding:
    • DAOS Pipeline API for active storage
    • Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
      • Prototype leveraging DSA for VOS aggregation delivered
      • Initial results shared at the IXPUG conference.
    • OPX provider support in collaboration with Cornelis Networks
      • OPX provider merged upstream in libfabric
      • Provider supported in latest mercury version
      • Changes to DAOS to enable OPX as part of the build in progress
    • GPU data path optimizations
  • I/O Middleware / Framework Support

News

  • In addition to building on the ARM platform on Ubuntu 22.04, AlmaLinux 8 and Leap 15, some basic tests (NLT, which stands for Node Local Tests) now run on every PR landing. See this link for more information. Thanks again to Linaro and Croit for their support. The next step is to run unit tests.
  • Congrats to Croit and DenisB for merging the SPDK DAOS bdev upstream!
  • The DAOS community BoF for SC'22 has been accepted!