Support for hardlinks in DAOS filesystem.

Stakeholders

Sherin Thyil George, Johann Lombardi, Lance Evans, Mohamad Chaarawi

Introduction

DFS historically stores file metadata in the directory entry (dentry) for that file.

That model works when a file has a single name, but it does not work for hard links because multiple dentries must share one logical inode.

For hard links, the following state must be shared across names:

mode
ownership
timestamps
size
xattrs
link count

Requirements & Use Cases

Several customers have requested POSIX hardlink support in DFS to ensure compatibility with existing applications and workflows that rely on standard filesystem semantics. Key use cases include efficient data backup and snapshots, data migration from other filesystems to DFS, atomic updates and safe file replacement, and data sharing across directories.

Design Overview

Current DFS File and Namespace Model

In the current DFS implementation, a directory is represented as a DAOS object that stores one record per child name. Each child name corresponds to a directory entry within the parent directory object, and that entry contains the child’s metadata, including file type, mode, ownership, timestamps, and the referenced DAOS object identifier (OID). A regular file is represented by two components. The first is the directory entry in its parent directory, which stores the file’s metadata along with the file’s OID. The second is a separate DAOS array object, identified by that OID, which stores the file’s data. As a result, the directory entry both names the file and acts as the authoritative store for its metadata.

This model establishes an effective one‑to‑one relationship between a directory entry and a file object. Each filename in a directory owns its own metadata record, and that record points to exactly one file data object. DFS does not maintain a separate, shared inode structure; instead, metadata is tightly bound to the directory entry itself.

This design prevents support for hardlinks. POSIX hardlinks require multiple directory entries to share a single logical inode, including common ownership, permissions, timestamps, extended attributes, file size, and a shared link count. In the current DFS model, this shared inode state has no common location. Because the metadata lives inside individual directory entries, creating an additional name for the same file would result in multiple directory entries with independent metadata records, with no native mechanism to keep them consistent or to maintain a single authoritative inode state.

Hardlink Semantics in DFS

Hardlink support in DFS is intended to match normal POSIX file semantics for regular files. The correctness model is:

  • Multiple directory entries may refer to the same logical file object, meaning different names in the namespace resolve to the same underlying inode state rather than to separate files. This is the core requirement.

  • All inode metadata must be shared across those names. If one link changes file metadata, the change must be visible through every other link to the same file.

  • The link count, equivalent to POSIX nlink, must reflect the total number of directory entries that currently reference that file. Creating a hardlink increments the count, removing one name decrements it, and stat must report that shared count consistently for all links.

  • File lifetime is tied to the link count, not to any one pathname. Removing a single name must not delete the file while other hardlinks still exist. The file data and shared metadata are only destroyed when the final link is removed and the link count reaches zero.
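The four properties above are the behavior that any local POSIX filesystem already provides, so they can be observed with the standard link(), stat(), chmod() and unlink() calls. The following is a minimal sketch that runs against a local temporary directory, not against DFS; the function name posix_hardlink_demo and the file names are made up for the example, and it assumes /tmp is writable and supports hardlinks.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for mkdtemp() */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if the local filesystem shows the expected hardlink behavior,
 * -1 otherwise. Illustrative only; it does not touch DFS. */
int posix_hardlink_demo(void)
{
    char dir[] = "/tmp/hardlink-demo-XXXXXX";
    char a[64], b[64];
    struct stat sa, sb;
    int fd, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof(a), "%s/a", dir);
    snprintf(b, sizeof(b), "%s/b", dir);

    fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0)
        goto out_dir;
    close(fd);

    if (link(a, b) != 0)        /* second name for the same inode */
        goto out_a;
    if (chmod(a, 0600) != 0)    /* change metadata through one name */
        goto out_b;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        goto out_b;
    /* Same inode, and both names report the shared link count. */
    if (sa.st_ino != sb.st_ino || sa.st_nlink != 2 || sb.st_nlink != 2)
        goto out_b;
    /* The mode change is visible through the other name. */
    if ((sb.st_mode & 0777) != 0600)
        goto out_b;

    if (unlink(a) != 0)         /* remove one name only */
        goto out_b;
    /* The file survives under the remaining name with link count 1. */
    if (stat(b, &sb) == 0 && sb.st_nlink == 1)
        rc = 0;
out_b:
    unlink(b);
out_a:
    unlink(a);                  /* harmless if already removed */
out_dir:
    rmdir(dir);
    return rc;
}
```

A check of this kind, pointed at a dfuse mount instead of /tmp, could serve as a basic conformance test for the feature.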

Inode and Metadata Model Changes

To support hardlinks, DFS extends its metadata model for regular files from a purely directory-entry-based representation to one that supports a shared file identity. Files that have never been hardlinked continue to use the existing model, where the directory entry remains the authoritative location for metadata.

When a file becomes hardlinked, DFS transitions it to a shared inode form that allows multiple directory entries to reference a single logical file. In this model, directory entry identity is decoupled from inode identity: the directory entry represents a name within a parent directory, while the inode represents the underlying file shared across all names. Metadata that must remain consistent across all hardlinks is stored as inode metadata rather than per-directory-entry metadata. This includes ownership, permissions, timestamps, file size, extended attributes, and the link count. The link count is maintained as an inode-level property reflecting the number of directory entries referencing the file.

Global Inode Table (GIT)

This design introduces the Global Inode Table (GIT), a persistent DFS metadata structure used to store authoritative, shared inode metadata for files that participate in the hardlink model.

Role of GIT

GIT maintains one shared metadata record for each hardlinked file. Each GIT record is keyed by the file object identifier (OID), which is used as the dkey for storing and retrieving the record and serves as the stable identity referenced by all directory entries associated with the file. The GIT record contains inode-level metadata that must be shared across all hardlinks, including ownership, permissions, timestamps, file size, extended attributes, and the link count. Directory entries continue to function as namespace mappings from a parent directory and name to a file object identifier. Once a file transitions into the hardlink model, directory entries no longer own the authoritative shared metadata for that file; GIT becomes the authoritative source of inode metadata instead.
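As an illustration of the "one record per file, keyed by OID" rule, the sketch below models GIT as a small in-memory table. It is not the on-disk layout: the type and function names (demo_oid, git_record, git_lookup, git_insert), the field widths, and the fixed-size array are all assumptions made for the example, and extended attributes are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 128-bit DAOS object identifier. */
struct demo_oid {
    uint64_t hi;
    uint64_t lo;
};

/* Hypothetical shape of one GIT record: the inode state shared by all names. */
struct git_record {
    struct demo_oid oid;     /* key; plays the role of the dkey */
    uint32_t        mode;
    uint32_t        uid;
    uint32_t        gid;
    int64_t         mtime;
    int64_t         ctime;
    uint64_t        size;
    uint64_t        ref_cnt; /* link count */
    int             in_use;
};

#define GIT_SLOTS 16

struct git_table {
    struct git_record slot[GIT_SLOTS];
};

static int oid_equal(struct demo_oid a, struct demo_oid b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

/* Return the record for 'oid', or NULL if the file has no GIT entry. */
struct git_record *git_lookup(struct git_table *t, struct demo_oid oid)
{
    for (size_t i = 0; i < GIT_SLOTS; i++)
        if (t->slot[i].in_use && oid_equal(t->slot[i].oid, oid))
            return &t->slot[i];
    return NULL;
}

/* Insert a record; fails (NULL) if the OID is already present or the
 * table is full, so there is never more than one record per file. */
struct git_record *git_insert(struct git_table *t, const struct git_record *rec)
{
    if (git_lookup(t, rec->oid) != NULL)
        return NULL;
    for (size_t i = 0; i < GIT_SLOTS; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i] = *rec;
            t->slot[i].in_use = 1;
            return &t->slot[i];
        }
    }
    return NULL;
}
```

Because every directory entry for a hardlinked file carries the same OID, any name can reach the same record with a single keyed lookup.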

Design Intent

The introduction of GIT enables shared inode semantics for hardlinked files without requiring a fundamental redesign of the existing DFS metadata model. Ordinary files that are not hardlinked continue to use the original directory-entry-based metadata model, while hardlink-specific complexity is isolated to cases that require shared metadata.

Directory Entry Handling

Hardlink support changes the logical interpretation of directory entries in the DFS namespace while preserving their role as namespace bindings.

A directory entry continues to represent a mapping from a parent directory and name to a file. For hardlinked files, however, multiple directory entries may reference the same underlying file identity rather than distinct files. The original directory entry remains part of the namespace after hardlink creation, and each additional hardlink results in a new directory entry that refers to the same underlying file. As a result, different names—whether within the same directory or across different directories in the same DFS namespace—may resolve to the same file.

For hardlinked regular files, a dedicated bit in the directory entry’s mode field indicates that the entry refers to GIT-backed inode metadata. This bit determines whether file metadata is obtained from the local directory entry or from the shared GIT record.
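A minimal sketch of how such an indicator bit could steer metadata resolution is shown below. The bit position is an assumption chosen only so that it does not collide with the POSIX file type and permission bits; the design does not fix a value here, and the names DEMO_MODE_HARDLINK, metadata_source, mark_hardlinked and posix_visible_mode are hypothetical.

```c
#include <stdint.h>

/* Assumption: illustrative bit position only; the design just says
 * "a dedicated bit in the directory entry's mode field". */
#define DEMO_MODE_HARDLINK (UINT32_C(1) << 31)

enum md_source {
    MD_FROM_DENTRY, /* ordinary file: the entry owns its metadata */
    MD_FROM_GIT     /* hardlinked file: metadata lives in the GIT record */
};

/* Decide where authoritative metadata lives for a stored mode value. */
enum md_source metadata_source(uint32_t stored_mode)
{
    return (stored_mode & DEMO_MODE_HARDLINK) ? MD_FROM_GIT : MD_FROM_DENTRY;
}

/* Set the indicator when an entry joins the hardlink model. */
uint32_t mark_hardlinked(uint32_t stored_mode)
{
    return stored_mode | DEMO_MODE_HARDLINK;
}

/* Strip the internal marker before a mode is returned to callers, so that
 * stat reports only POSIX-visible bits. */
uint32_t posix_visible_mode(uint32_t stored_mode)
{
    return stored_mode & ~DEMO_MODE_HARDLINK;
}
```

Keeping the marker inside the mode field means the lookup that already fetches the directory entry also learns which metadata source to use, without an extra read for ordinary files.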

Namespace operations continue to operate on directory entries as names. Creating a hardlink adds a new directory entry, and unlinking removes one directory entry, without affecting the association between the remaining directory entries and the underlying file.

Constraints

  • Hardlinks are supported only for regular files. Directories and symbolic links do not participate in hardlink aliasing.

  • Hardlinks are confined to a single DFS namespace and container.

  • All hardlinked directory entries must refer to the same underlying file object.

Hardlink Lifecycle

The lifecycle of a hardlinked file in DFS is defined by the transition from directory‑entry‑owned metadata to shared GIT‑backed metadata, followed by link‑count‑driven reference management and cleanup.

Initial File Creation

A newly created file begins as a normal DFS file.

  • The file is represented by a single directory entry.

  • The directory entry is the authoritative store for the file’s metadata.

  • No GIT entry exists at this stage.

First Hardlink Creation

When the first hardlink is created for a file that is not already hardlinked, DFS converts the file to the shared‑metadata model.

  • A GIT entry is created for the file, keyed by the file object OID.

  • Existing inode metadata is copied from the original directory entry into the GIT record.

  • Any file extended attributes are copied from the directory entry into GIT and removed from the directory entry so that GIT becomes the sole authoritative location for shared inode metadata.

  • The link count stored in GIT (ref_cnt) is initialized to 2, reflecting the original name and the newly created hardlink.

  • A dedicated hardlink indicator bit is set in both the original directory entry and the newly created directory entry to indicate that metadata must be resolved through GIT.

  • A new directory entry is created in the target parent directory, referring to the same underlying file object.

Subsequent Hardlink Creation

When a hardlink is created for a file that has already entered the hardlink model, no additional metadata migration is required.

  • The existing GIT entry remains the authoritative source of shared metadata.

  • The link count stored in GIT is incremented.

  • A new directory entry is created for the additional name, referring to the same file object OID and carrying the hardlink indicator bit.
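The two link-creation cases can be summarized in a small in-memory model. This is a sketch of the state transitions only, not libdfs code: all type and function names (inode_md, dentry, git_entry, model_link) are hypothetical, the indicator bit is modeled as a boolean, and extended attribute migration is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of inode metadata. */
struct inode_md {
    uint32_t mode;
    uint32_t uid;
    uint32_t gid;
    uint64_t size;
    int64_t  mtime;
};

struct dentry {
    uint64_t        oid;        /* file object the name points at */
    bool            hardlinked; /* models the indicator bit in the mode field */
    struct inode_md md;         /* authoritative only while !hardlinked */
};

struct git_entry {
    bool            present;
    uint64_t        oid;
    struct inode_md md;
    uint64_t        ref_cnt;
};

/* Create a new name 'dst' for the file behind 'src'. 'git' models the
 * single GIT record that belongs to that file. */
void model_link(struct dentry *src, struct git_entry *git, struct dentry *dst)
{
    if (!src->hardlinked) {
        /* First hardlink: migrate metadata ownership into GIT. */
        git->present = true;
        git->oid = src->oid;
        git->md = src->md;
        git->ref_cnt = 2;       /* original name + new name */
        src->hardlinked = true; /* src->md is no longer consulted */
    } else {
        /* Subsequent hardlink: only the shared count changes. */
        git->ref_cnt++;
    }

    /* In both cases the new name points at the same object and carries
     * the indicator, but no authoritative metadata of its own. */
    dst->oid = src->oid;
    dst->hardlinked = true;
    dst->md = (struct inode_md){ 0 };
}
```

In the real implementation these steps touch more than one DAOS object, which is why the consistency section of this design places them under a transaction.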

Unlink and Removal

When a hardlinked name is removed, DFS removes only the corresponding namespace entry unless it is the final remaining reference to the file.

  • The directory entry associated with the name is removed from the namespace.

  • The link count stored in GIT is decremented.

  • The underlying file object is retained while the link count remains greater than zero.

  • When the final link is removed, the file object is deleted and the associated GIT entry is cleaned up.

This lifecycle preserves POSIX hardlink semantics while integrating cleanly with the existing DFS metadata and namespace model.
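The unlink rules can be sketched with the same kind of in-memory model. The names below (file_obj, git_entry, dentry, model_unlink) are hypothetical. The sketch also assumes that a file whose link count drops back to 1 stays in the GIT-backed model; the text above does not state whether it reverts to directory-entry-owned metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct file_obj {
    bool exists;
};

struct git_entry {
    bool     present;
    uint64_t ref_cnt;
};

struct dentry {
    bool present;
    bool hardlinked; /* models the indicator bit */
};

/* Remove one name. Returns true when this call removed the last reference
 * and therefore destroyed the file object. */
bool model_unlink(struct dentry *de, struct git_entry *git, struct file_obj *obj)
{
    de->present = false; /* the name always leaves the namespace */

    if (!de->hardlinked) {
        /* Never-hardlinked file: the only name is gone, so is the file. */
        obj->exists = false;
        return true;
    }

    assert(git->present && git->ref_cnt > 0);
    git->ref_cnt--;
    if (git->ref_cnt > 0)
        return false;    /* other names still reference the file */

    /* Final link removed: delete the object and clean up the GIT entry. */
    obj->exists = false;
    git->present = false;
    return true;
}
```

The return value mirrors the distinction that callers such as dfuse need: whether only a name disappeared or the file itself was deleted.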

Consistency and Concurrency Model

Hardlink support introduces metadata operations that span multiple DFS objects, most notably directory entries and GIT.

Correctness Requirements

The design must satisfy the following correctness properties:

  • Consistency: Multi‑step metadata updates must not expose partially updated or internally inconsistent state to other operations.

  • Atomicity (Balanced Mode): Concurrent metadata operations must appear serialized to ensure correct behavior across multiple clients.

These requirements are met through a combination of DAOS Distributed Transactions (DTX) and targeted re‑fetch logic.

Relaxed Mode

In relaxed mode, no concurrent updates to file metadata are expected. As a result, full cross‑client atomicity between concurrent metadata operations is not required. However, consistency is still required for operations that update more than one metadata object, specifically those that span both the directory object and GIT.

Operations that require coordinated updates across multiple objects, such as:

  • creating the first hardlink, and

  • removing a hardlink,

can be executed under DTX to ensure that related changes become visible atomically and that intermediate states are not visible.
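The all-or-nothing visibility expected from DTX can be pictured with a small in-memory stand-in. The sketch below does not use the DAOS transaction API; it stages the first-hardlink updates on a private copy and publishes them in one assignment, which is only an analogy for a transactional commit. The names ns_state and first_link_atomic, and the fail_at failure-injection parameter, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* State spread over the directory objects and GIT in the real system. */
struct ns_state {
    bool     src_hardlinked; /* indicator bit on the original entry */
    bool     dst_present;    /* new directory entry exists */
    bool     git_present;    /* GIT record exists */
    uint64_t git_ref_cnt;
};

/* Apply the first-hardlink updates to a private copy and publish them in
 * one step. 'fail_at' (1..3) injects a failure before that step; 0 means
 * no failure. Returns true if the updates were published. */
bool first_link_atomic(struct ns_state *shared, int fail_at)
{
    struct ns_state staged = *shared;

    if (fail_at == 1)
        return false;
    staged.git_present = true;   /* step 1: create the GIT record */
    staged.git_ref_cnt = 2;

    if (fail_at == 2)
        return false;
    staged.src_hardlinked = true; /* step 2: mark the original entry */

    if (fail_at == 3)
        return false;
    staged.dst_present = true;    /* step 3: insert the new entry */

    *shared = staged;             /* "commit": all changes or none */
    return true;
}
```

A failure at any step leaves the shared state untouched, so no reader can observe, for example, a marked directory entry without a matching GIT record.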

Balanced Mode

In balanced mode, an additional concurrency challenge arises when a regular file transitions into the hardlink model while other clients are concurrently performing metadata updates on the same file.

A representative race condition is as follows:

  • Client A fetches metadata from a normal directory entry.

  • Client B concurrently creates the first hardlink for the file, transitioning metadata ownership to GIT.

  • Client A proceeds with an update based on stale assumptions about metadata ownership.

Such races can be addressed using two complementary mechanisms:

  • Operations that already perform a fetch‑and‑update sequence execute the entire sequence under DTX in balanced mode.

  • Update‑only operations perform an additional post‑update metadata fetch in balanced mode to detect whether the file transitioned into the hardlink model concurrently. If such a transition is detected, the update is re‑applied against GIT to ensure correctness.

This approach preserves consistency and atomicity guarantees while minimizing overhead for common metadata operations.
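The post-update fetch used by update-only operations can be sketched as follows, using a mode change as the example. This is a single-process model of the decision logic, not libdfs code: the names dentry, git_entry and chmod_update_only are hypothetical, and the concurrent conversion by another client is simulated by changing the structures between calls.

```c
#include <stdbool.h>
#include <stdint.h>

struct dentry {
    bool     hardlinked; /* models the indicator bit */
    uint32_t mode;       /* authoritative only while !hardlinked */
};

struct git_entry {
    uint32_t mode;
    uint64_t ref_cnt;
};

/* Update-only mode change. 'believed_hardlinked' is what the caller knew
 * about the entry before issuing the update. Returns true when a
 * concurrent transition was detected and the update was re-applied to GIT. */
bool chmod_update_only(struct dentry *de, struct git_entry *git,
                       bool believed_hardlinked, uint32_t new_mode)
{
    if (believed_hardlinked) {
        git->mode = new_mode;  /* already known to be GIT-backed */
        return false;
    }

    de->mode = new_mode;       /* blind update against the directory entry */

    /* Post-update fetch: did the file join the hardlink model meanwhile? */
    if (!de->hardlinked)
        return false;          /* no transition; the update is final */

    git->mode = new_mode;      /* transition detected: re-apply to GIT */
    return true;
}
```

The extra fetch costs one additional read per update, but it lets update-only operations avoid a full transaction in the common case where no transition happened.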

Proposed Changes to DFS Library (libdfs)

The hardlink implementation extends DFS Library (libdfs) from a purely directory‑entry‑based metadata model to one that supports shared inode semantics for regular files with multiple names. The design introduces GIT‑backed shared metadata, updates namespace handling to allow multiple names to reference the same file, and extends metadata and remove paths to preserve POSIX hardlink semantics.

Metadata and Namespace Model

DFS Library (libdfs) retains the existing directory‑entry‑owned metadata model for ordinary files. For hardlinked regular files, libdfs transitions to a shared metadata model based on the Global Inode Table (GIT).

The detailed inode, metadata, and directory‑entry semantics are defined in Inode and Metadata Model Changes and Directory Entry Handling. The role of GIT and its relationship to directory entries are defined in Global Inode Table (GIT).

Container and Mount State

The libdfs container layout will be extended to include a reserved GIT object in addition to the existing superblock and root objects. libdfs mount, unmount, and handle‑serialization paths are updated to open, carry, and close the GIT state together with the DFS handle. Reserved object handling and OID allocation logic will be updated accordingly.

Hardlink Lifecycle

libdfs will support the transition from a normal file to a hardlinked file, creation of additional hardlink names, and cleanup when the final link is removed. The lifecycle rules and state transitions are defined in Hardlink Lifecycle.

Lookup, Stat, and Metadata Resolution

Lookup and stat paths in libdfs are updated to recognize hardlinked files and resolve authoritative metadata from the appropriate source. Returned mode and link‑count values reflect POSIX‑visible state rather than internal markers.

Metadata fetch paths also handle races in which a file transitions to the hardlink model while an operation is in progress.

Metadata Update and Remove Paths

Metadata update operations in libdfs will be extended so that hardlinked files use shared GIT‑backed metadata rather than per‑directory‑entry metadata. Remove and unlink behavior distinguishes between removing a single namespace reference and deleting the underlying file object.

Rename and overwrite flows will also be adjusted to preserve correct hardlink semantics.

New Hardlink Operation

libdfs will introduce a dedicated hardlink operation, dfs_link(), to create an additional directory entry for an existing regular file. This operation performs the required namespace and metadata transitions in a consistent manner. A dfs_sys counterpart, dfs_sys_link(), will also be provided.

Consistency and Recovery

Hardlink operations will introduce multi‑object updates involving directory entries, GIT state, and file objects. libdfs transaction handling will be extended to preserve consistency across these updates.

Checker, verification, and repair flows will be updated to validate GIT contents, hardlink indicators, and stored link counts.

Observability

libdfs metrics and inspection paths will be extended to account for hardlink operations and shared link‑count reporting.

Design Outcome

This design preserves existing behavior for ordinary files while introducing a dedicated shared‑metadata path for hardlinked regular files in DFS Library (libdfs). The detailed semantics of inode identity, GIT ownership, namespace behavior, and hardlink lifecycle are defined in Inode and Metadata Model Changes, Global Inode Table (GIT), Directory Entry Handling, and Hardlink Lifecycle, while the remaining changes realize those semantics within libdfs.

Proposed Changes to dfuse Module

The hardlink implementation extends dfuse to preserve stable inode semantics while allowing multiple namespace paths to refer to the same underlying DFS object. These changes primarily affect FUSE operation handling, inode and dentry cache management, invalidation behavior, and unlink and rename processing.

FUSE Operation Support

dfuse adds explicit support for the FUSE link operation. The new link callback invokes the underlying hardlink operation in DFS Library (libdfs).

When a hardlink is created, dfuse returns the same inode number for the new name as for the existing file, reflecting that all hardlink names refer to the same underlying file object and inode identity.

Inode Cache and Dentry Tracking

dfuse extends its inode cache to allow a single cached inode entry to own multiple dentries. Instead of assuming a one‑to‑one relationship between a cached inode and a specific parent/name pair, dfuse tracks all known namespace references associated with the same inode.

This model allows dfuse to represent hardlinked files correctly in the kernel‑facing cache without creating duplicate inode entries for the same underlying file. The cached inode entry represents file identity, while the associated dentries represent the set of namespace paths currently known for that file.

Lookup and Open Path Behavior

During lookup, if dfuse resolves a file whose inode is already cached and whose link count indicates multiple names, the new parent/name pair is attached to the existing inode entry rather than creating a separate cached inode. This ensures that lookups of different hardlink names converge on a single cached inode representation.
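The lookup behavior amounts to attaching a new parent/name pair to an existing cache entry instead of allocating a second one. A minimal sketch under stated assumptions follows: the structures cached_dentry and cached_inode, the fixed-size name array, and the function inode_attach_name are hypothetical and do not reflect the actual dfuse data structures or locking.

```c
#include <stdint.h>
#include <string.h>

#define MAX_NAMES 8
#define NAME_LEN  64

/* One known namespace reference to a cached inode. */
struct cached_dentry {
    uint64_t parent_ino;
    char     name[NAME_LEN];
};

/* One cached inode that may own several dentries. */
struct cached_inode {
    uint64_t             ino;
    int                  nr_names;
    struct cached_dentry names[MAX_NAMES];
};

/* Attach (parent, name) to an already cached inode. Returns 0 on success
 * or if the pair is already tracked, -1 if the name is too long or the
 * entry cannot hold more names. */
int inode_attach_name(struct cached_inode *ie, uint64_t parent_ino,
                      const char *name)
{
    if (strlen(name) >= NAME_LEN)
        return -1;

    for (int i = 0; i < ie->nr_names; i++)
        if (ie->names[i].parent_ino == parent_ino &&
            strcmp(ie->names[i].name, name) == 0)
            return 0; /* repeated lookup of a known name */

    if (ie->nr_names == MAX_NAMES)
        return -1;

    ie->names[ie->nr_names].parent_ino = parent_ino;
    strcpy(ie->names[ie->nr_names].name, name);
    ie->nr_names++;
    return 0;
}
```

Looking up two hardlink names of one file then yields a single cache entry with two tracked dentries, so the kernel sees the same inode number for both.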

Open and release paths are updated so that dentry invalidation and reference tracking operate on the full set of dentries associated with an inode, rather than assuming a single cached name.

Dentry Invalidation and Timeout Handling

Timeout‑based dentry invalidation is extended from a single‑name model to an inode‑wide model. When an inode becomes eligible for invalidation, dfuse invalidates all dentries associated with that inode rather than a single parent/name pair.

To support this safely, dfuse introduces explicit synchronization for dentry tracking, as invalidation may occur concurrently with lookup, rename, unlink, and release operations. This ensures coherence between kernel cache state and the multi‑dentry inode model required for hardlink support.

Unlink Semantics

dfuse distinguishes between removing a single namespace entry and deleting the underlying file. When one hardlink name is removed but other links remain, dfuse removes only the corresponding dentry from its tracking structures, and the cached inode entry remains valid.

Only when the final hardlink is removed does dfuse invalidate the inode itself and remove all associated dentries, reflecting deletion of the underlying file.
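The two unlink outcomes can be sketched against a simple multi-dentry cache entry. The structures and the function inode_unlink_name are hypothetical, and the nlink_after parameter assumes that the underlying unlink reports the remaining link count to dfuse; the design does not specify how that information is conveyed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAMES 8
#define NAME_LEN  64

struct cached_dentry {
    uint64_t parent_ino;
    char     name[NAME_LEN];
};

struct cached_inode {
    uint64_t             ino;
    bool                 valid;
    int                  nr_names;
    struct cached_dentry names[MAX_NAMES];
};

/* Process removal of (parent, name). 'nlink_after' is the link count that
 * remains after the unlink. Returns true when the inode itself must be
 * invalidated because the file was fully deleted. */
bool inode_unlink_name(struct cached_inode *ie, uint64_t parent_ino,
                       const char *name, uint64_t nlink_after)
{
    if (nlink_after == 0) {
        /* Final link removed: drop every tracked name and the inode. */
        ie->nr_names = 0;
        ie->valid = false;
        return true;
    }

    /* Other links remain: forget only the removed name. */
    for (int i = 0; i < ie->nr_names; i++) {
        if (ie->names[i].parent_ino == parent_ino &&
            strcmp(ie->names[i].name, name) == 0) {
            ie->names[i] = ie->names[ie->nr_names - 1]; /* swap-remove */
            ie->nr_names--;
            break;
        }
    }
    return false;
}
```

The decision is driven by the link count rather than by the number of cached names, because the cache may know only a subset of the names that exist for a file.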

Rename Semantics

Rename handling is updated to operate on sets of dentries rather than a single cached name. When a hardlinked file is renamed, dfuse updates or replaces the affected dentry while preserving the shared inode identity.

If a rename operation overwrites an existing file, dfuse distinguishes between cases where the overwritten file is fully deleted and cases where only one hardlink name is removed. Any stale dentries released during rename processing are invalidated to ensure kernel‑visible state remains consistent with namespace changes.

Kernel Notification Behavior

Kernel notification handling in dfuse is extended so that invalidation and delete notifications can be issued for multiple dentries associated with the same inode. This includes support for invalidating all cached names of an inode and, when required, deleting all remaining dentries associated with a file that has been fully removed.

These updates ensure that kernel‑visible namespace state remains correct for hardlinked files.

Observability and Accounting

dfuse operation accounting is extended to track hardlink operations. Metrics and tracing are updated so that hardlink creation and hardlink‑aware namespace updates are visible through existing dfuse observability mechanisms.

Design Outcome

This design allows dfuse to preserve stable inode identity for hardlinked files while correctly tracking multiple namespace names for the same underlying object. Conceptually, dfuse transitions from a single‑dentry‑per‑inode cache model to a multi‑dentry‑per‑inode model, while preserving correct lookup, invalidation, rename, and unlink behavior required for POSIX hardlink semantics.

Performance Impact

Relaxed Mode

In relaxed mode, no performance impact is expected for existing operations on files that are not hardlinked. The current fast path for regular files remains unchanged, as no concurrent metadata updates are assumed.

Balanced Mode

In balanced mode, additional overhead is introduced to guarantee atomicity and correctness across concurrent clients for both regular and hardlinked files.

For metadata operations, balanced mode must account for the possibility that a regular file may be converted into the hardlink model concurrently with an in‑progress operation. As a result, metadata fetch and update sequences may require DTX protection to prevent metadata ownership from changing between steps.

For update‑only operations, an additional metadata fetch is performed after the update to detect whether the file transitioned to the hardlink model without the updating client being notified. If such a transition is detected, the update is re‑validated or re‑applied against GIT as required. This applies to both regular files (which may be converted concurrently) and files that are already hardlinked.

User Interface

How is the user/admin expected to interact with the new feature? Describe the API/tool.

New APIs, dfs_link() and dfs_sys_link(), will be provided to users to create hardlinks.


What are all the tunables provided to the user/admin?

None.


Any extra statistics that should be reported to the user/admin?

Existing accounting counters in libdfs and dfuse will be extended to report hardlink operations as well.


Explain how errors will be handled.

The solution will use the same error-reporting framework that libdfs and dfuse already use. Hardlink correctness depends on strict consistency between dentries and GIT; if any inconsistency between them is detected, an error is reported.

Impacts

Any performance impact?

Refer to Performance Impact section above.


Any API changes? If so, internal or external API? Any changes required to middleware? Any interop requirements?

No existing external-facing API is modified. New APIs, dfs_link() and dfs_sys_link(), will be added to support creating hardlinks.


Any VOS/config layout changes? How will migration be supported?

The container layout version has to be changed so that older versions of libdfs, or tools built on it such as dfuse, do not end up corrupting the metadata.


Any extra parameters required in the config file?

None


Any wire protocol change? How will interop be supported?

None.


Any impact on the rebuild protocol?

None.


Any impact on aggregation?

None.


Any impact on security?

None.

Quality

How will the feature be tested? Unit tests, functional tests and system tests need to be covered.
Describe the extra soak/performance tests that should be added.

Project Milestones

Description of the different milestones delivering incremental functionality.
Describe what will work/not work and what will be validated.
Targeted date for each milestone.

References (optional)

External papers, web page (if any)

Future Work (optional)

Known issues and future work that were considered out of scope.



Approvals

Reviewed & Approved by    Names      Date
Feature Developers        Name(s)
Tech Lead/Architect       Name(s)
Test Engineers            Name(s)
Engineering Managers      Name(s)



Feature Test plan

  • Provide link to test plan for feature under development




