DAOS Community Home
This site is the main community repository for DAOS information. Consult the sidebar for links to discover, use and contribute to DAOS.
What is DAOS?
The Distributed Asynchronous Object Storage (DAOS) is an open-source, software-defined object store designed from the ground up for massively distributed non-volatile memory (NVM). DAOS takes advantage of next-generation NVM technologies such as Storage Class Memory (SCM) and NVM Express (NVMe) while presenting a key-value storage interface. It provides features such as transactional non-blocking I/O, advanced data protection with self-healing on top of commodity hardware, end-to-end data integrity, fine-grained data control, and elastic storage to optimize performance and cost.
Why DAOS?
The emergence of data-intensive applications in business, government and academia pushes existing I/O models beyond their limits. Modern I/O workloads feature an increasing proportion of metadata combined with misaligned and fragmented data. Conventional storage stacks deliver poor performance for these workloads because they add significant latency and introduce alignment constraints. The advent of affordable, large-capacity persistent memory combined with an integrated fabric offers a unique opportunity to redefine the storage paradigm and support modern I/O workloads efficiently.
This revolution requires a radical rethink of the complete storage stack. To unleash the full potential of this new technology, the new stack must embrace a byte-granular, shared-nothing interface from the ground up and support massively distributed storage in which failure is the norm, while preserving low-latency, high-bandwidth access over the fabric.
DAOS is a complete I/O architecture that aggregates SCM and NVMe storage distributed across the fabric into globally-accessible object address spaces, providing consistency, availability and resiliency guarantees without compromising performance.
How to use DAOS?
DAOS is open-source software licensed under the BSD+Patent license. The DAOS source code is publicly available on GitHub.
A community mailing list is hosted on daos.groups.io and a community chat on Slack.
Community Newsletter (October'22)
Please find below the DAOS community newsletter for October 2022.
Past Events
- Flash Memory Summit’22: 3rd Workshop on Extreme-Scale Storage and Analysis (August 2nd-4th)
Requirements and Challenges Associated with the World's Fastest Storage Platform
https://www.flashmemorysummit.com
Jeff Olivier (Intel)
- IXPUG Annual Conference 2022 (Sep 29)
The Evolution of Storage and Memory and the DAOS Role in It
https://www.ixpug.org/events/ixpug-2022
Kevin Harms (ANL)
Andrey Kudryavtsev (Intel)
- HPC User Forum (Oct 3)
DAOS Technical/Strategy Update
Johann Lombardi (Intel)
https://www.hpcuserforum.com/wp-content/uploads/2022/10/Intel-J.Lombardi.pdf
Upcoming Events
- SuperCheck-SC'22 (Nov 14 at 8:35am CST)
DAOS: Nextgen Storage Stack for HPC and AI
https://supercheck.lbl.gov/schedule
Johann Lombardi (Intel)
- SC'22 BoF (Nov 16 at 5:15pm CST)
DAOS Storage Community BoF
https://sc22.supercomputing.org/presentation/?id=bof147&sess=sess357
Kevin Harms (ANL)
Michael Hennecke (Intel)
Dean Hildebrand (Google)
Panagiotis Adamidis (DKRZ)
- SC'22 BoF (Nov 17 at 12:15pm CST)
The Storage Tower of Babel? ... Not! Actually, maybe?
https://sc22.supercomputing.org/presentation/?id=bof150&sess=sess378
Philippe Deniel (CEA)
John Bent (Seagate)
Tiago Quintino (ECMWF)
Johann Lombardi (Intel)
- SC'22 Tutorial (Nov 14 at 8:30am CST)
Emerging Storage Interfaces: DAOS and PMDK
https://sc22.supercomputing.org/presentation/?id=tut132&sess=sess213
Adrian Jackson (EPCC)
Mohamad Chaarawi (Intel)
Johann Lombardi (Intel)
- 6th annual DAOS User Group (Nov 14 from 9am to 1:30pm CST)
See DUG22 for more information.
Release
- The current stable release is 2.2.0, released on Oct 21. See https://docs.daos.io/v2.2/, https://packages.daos.io/v2.2/ and the release notes for more details.
- With the release of 2.2.0, the 2.0.x releases are declared end-of-life.
- Branches:
- release/2.2 is the release branch for the stable 2.2 release. Latest bug fix release is 2.2.0 (v2.2.0 tag).
- Master is the development branch for the future 2.4 release. The latest test build is 2.3.101 (v2.3.101-tb tag), which includes the EC rotation feature.
- Major recent changes on release/2.2 (future 2.2 release):
- Fix VMD domain parsing
- Fix PS replica leaks
- Fix 2.0/2.2 interoperability issue with pool RF
- Fix assertion failure in dc_cont_free()
- Fix race condition in cart
- Address memory corruption during key_query
- Several fixes for EC migration
- Check and reset NONEXIST in iter_next and probe
- Bump protobuf-java from 3.16.1 to 3.16.3
- Major recent changes on master (future 2.4 release):
- All patches listed in the 2.2 section above.
- Fix a bug in key enumeration associated with ads[0].kd_key_len
- Add support for rf_lvl to the cont create API in pydaos
- Enable EC parity rotation by default
- Add missing void in dfs_init/fini declaration
- Remove RPC post increment restriction preventing extra RPC handles from being posted upon exhaustion
- Re-enable custom RPC timeout in RDB
- Remove ability to build w/o stdatomic.h
- Add bulk and vos latency to metrics
- Skip reclaim job during merge
- Fix some DTX visibility issues
- Allow daos_server network scan to run w/o config
- Update DAOS to use UCX 1.13 and disable UCX multi-rail support
- Don't hold lock for d_hhash_link_get/putref
- Add dmg system exclude
- Fix auto object class selection for RP hints for arrays
- Don't set pool destroy state if service is not up
- Improve PS reconfigurations
- Add IOPS info to daos pool autotest
- Fix swim paranoia
- Reject invalid number of pool create ranks
- Add config option to agent to ignore interfaces
- Several fixes to EC parity rotation
- Add support for pull request template
- Fix a number of python flake issues
- Add ability to run server under valgrind
- Add NUMA affinity to tmpfs mount options
- Add pool svc list to property query
- Bypass checks in pool evict rdb tx update
- Several IV fixes
- Remove CentOS7 leftovers
- Add DFS readdirplus API
- Several checksum scrubbing upgrade fixes
- Rename privileged helper from daos_admin to daos_server_helper
- Rename rf and rf_level properties to rd_fac and rd_lvl
- Add rebuild version to pool query
- Bump garbage collection ULT stack size
What is coming:
- 2.2.1 bug fix release
- 2.4.0 feature freeze
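Several release entries mention EC parity rotation, which is now enabled by default on master. As an illustration only, the general idea of rotating parity (in the style of RAID-5) can be sketched as follows. Instead of always storing parity on the same shards of a group, the parity positions shift with a per-stripe rotation offset, which spreads parity updates across all shards. This is a toy model, not DAOS's placement code. The function names and the use of the stripe index as the rotation offset are hypothetical choices.

```python
def parity_positions(k: int, p: int, rotation: int) -> list[int]:
    """Return the shard indices that hold parity for one stripe.

    Toy RAID-5-style rotation: with k data and p parity shards per group,
    parity sits at positions k..k+p-1 when rotation is 0; a non-zero
    rotation shifts those positions (mod k+p).
    """
    width = k + p
    return [(k + i + rotation) % width for i in range(p)]


def parity_load(k: int, p: int, stripes: int) -> list[int]:
    """Count parity cells per shard over `stripes` stripes, using the
    stripe index as the (hypothetical) rotation offset."""
    load = [0] * (k + p)
    for s in range(stripes):
        for pos in parity_positions(k, p, s):
            load[pos] += 1
    return load


if __name__ == "__main__":
    # 4+2 layout: parity moves from stripe to stripe.
    print(parity_positions(4, 2, 0))  # [4, 5]
    print(parity_positions(4, 2, 1))  # [5, 0]
    # Over 6 stripes every shard carries parity exactly twice.
    print(parity_load(4, 2, 6))  # [2, 2, 2, 2, 2, 2]
```

Without rotation, shards 4 and 5 would absorb every parity update. With rotation, the load evens out once the number of stripes is a multiple of the group width.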
R&D
- Major features under development:
- VOS on SPDK blob
- Detailed design documented in Metadata on SSDs, including the WAL layout (Meta blob and WAL blob layout)
- All development and testing tasks for phase 1 are tracked under DAOS-11040.
- Changes to the YAML file implemented. The WAL infrastructure and metadata blob creation have landed.
- PMDK-based allocator extracted and integrated into DAOS. Early performance evaluation in progress.
- Branch: feature/vos-on-blob
- Target release: 2.4 (phase 1 preview)
- Multi-user dfuse
- Allow dfuse mountpoint to be accessible by multiple users
- Still working on extending test coverage before landing the patch on master.
- PR: https://github.com/daos-stack/daos/pull/3956
- Target release: 2.4
- More aggressive caching in dfuse for AI APPs
- FUSE version updated on EL8 for readdir caching support; this is not needed on Leap, which already ships a recent enough FUSE version.
- FUSE kernel readdir caching is enabled; dfuse readdir support is still in progress.
- PR: https://github.com/daos-stack/daos/pull/6776
- Target release: 2.4
- Catastrophic recovery
- Aka distributed fsck or checker
- Tests for ddb (low level debugger utility similar to debugfs for ext4) landed
- Testing for the dmg checker landed.
- Testing for passes 3 and 4 under development.
- Pass 4 for container recovery completed.
- Branch: feature/cat_recovery
- Target release: 2.6
- Multi-homed network support
- Aka multi-provider support
- This feature aims at supporting multiple network providers in the engine
- Branch is feature complete now and testing is underway
- Branch: feature/multiprovider
- Target release: 2.6
- Client-side metrics
- API to collect libdaos metrics to be eventually integrated with Darshan
- PR: https://github.com/daos-stack/daos/pull/6497
- Target release: 2.6
- Performance domain
- Extend placement algorithm to be aware of fabric topology
- Fix to avoid putting shards on the same domain landed
- Branch: feature/perf_dom
- Target release: 2.8
- Pathfinding:
- DAOS Pipeline API for active storage
- Feature branch updated to the latest master.
- MariaDB DAOS engine with predicate pushdown: https://github.com/daos-stack/mariadb/tree/daos_edu
- Prototype of server-side find using pipeline API
- Branch: feature/pipeline_api
- Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
- Prototype leveraging DSA for VOS aggregation delivered
- Initial results shared at IXPUG conference.
- OPX provider support in collaboration with Cornelis Networks
- OPX provider merged upstream in libfabric
- Provider supported in latest mercury version
- Changes to DAOS to enable OPX as part of the build in progress
- GPU data path optimizations
- I/O Middleware / Framework Support
- TensorFlow-IO plugin for DAOS
- https://github.com/daos-stack/tensorflow-io-daos/blob/devel/docs/daos_tf_docs.md
- https://docs.daos.io/v2.2/user/tensorflow/
- The upstream PR has been discussed with the maintainer and will hopefully be merged soon.
- S3 support via a DAOS backend to Rados Gateway (RGW)
- DAOS backend to RGW submitted and merged upstream by Zuhair/Seagate
- https://github.com/ceph/ceph/pull/47709
- DAOS/DFS integration with NFS Ganesha
- Still under exploration by Croit
- YCSB DAOS backend
- Contributed by Sherin/HPE
- https://github.com/brianfrankcooper/YCSB/pull/1601
- PR waiting for review/merge upstream
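The Performance domain item extends placement so that it is aware of fabric topology and avoids putting an object's shards on the same domain. A minimal sketch of that constraint follows. It assumes each target is tagged with a hypothetical domain label (for example, the switch it hangs off) and that a per-object `seed` chooses the starting domain and the target within each domain. It illustrates the constraint only and is not the DAOS placement algorithm.

```python
from collections import defaultdict


def place_shards(targets: list[tuple[str, str]], n_shards: int,
                 seed: int = 0) -> list[str]:
    """Pick n_shards target IDs so that no two shards share a domain.

    `targets` is a list of (target_id, domain) pairs. Domains are sorted
    for determinism; `seed` (hypothetical, e.g. derived from an object ID)
    picks the starting domain and the member chosen inside each domain.
    Raises ValueError if there are fewer domains than shards.
    """
    by_domain: dict[str, list[str]] = defaultdict(list)
    for target_id, domain in targets:
        by_domain[domain].append(target_id)
    domains = sorted(by_domain)
    if n_shards > len(domains):
        raise ValueError(
            f"need {n_shards} distinct domains, only {len(domains)} available")
    start = seed % len(domains)
    chosen = []
    for i in range(n_shards):
        members = by_domain[domains[(start + i) % len(domains)]]
        chosen.append(members[seed % len(members)])
    return chosen


if __name__ == "__main__":
    # Six targets spread over three hypothetical switch domains.
    targets = [("t0", "sw0"), ("t1", "sw0"), ("t2", "sw1"),
               ("t3", "sw1"), ("t4", "sw2"), ("t5", "sw2")]
    print(place_shards(targets, 3, seed=0))  # ['t0', 't2', 't4']
    print(place_shards(targets, 3, seed=1))  # ['t3', 't5', 't1']
```

Refusing to place more shards than there are domains keeps the rule strict. A real placement map may instead fall back to spreading shards as evenly as possible.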
News
- Congratulations to the whole Seagate team for the integration of the DAOS backend to the Rados Gateway (RGW)!
- An updated DAOS roadmap, including changes for phases 1 and 2 of the md_on_ssd project, will be available soon.