Fixed
Details
Assignee
Yawei Niu
Reporter
Michael Hennecke
Priority
Affects versions
Fix versions
Labels
Components
Story Points
8
Bug Exposure
3-Medium
Bug Source
Product Bug
Number of Occurrences
1
Created October 31, 2021 at 1:56 PM
Updated February 9, 2023 at 6:19 PM
Resolved February 9, 2023 at 6:19 PM
The DAOS 2.2 requirement is to sustain multiple iterations of IOR-easy without running out of space (ENOSPC), where each iteration writes more than 50% of the total pool capacity (e.g., 80%).
Original issue on 1.3 (partially addressed in 2.0):
On a DAOS server with ~25TB of NVMe space, an IOR-easy job that writes ~6.3TB per iteration fails in the 4th iteration, with
This should not happen: the files from previous iterations have been deleted, so their space should be reclaimed as the pool fills up. Even worse, after the job aborts, the "daos cont destroy" that runs in the Slurm epilog is so slow that the epilog itself times out.
This is on daos-1.3.106-3.7265.g74926ea0.el7.x86_64 (clients daos-1.3.106-3.7265.g74926ea0.el8.x86_64)
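The numbers reported above are consistent with essentially no space being reclaimed between iterations: three iterations of ~6.3TB consume ~18.9TB, and a fourth would need ~25.2TB, just over the ~25TB available. A minimal sketch of that arithmetic follows. It is a simplified model, not DAOS code: the function name and the `reclaim_fraction` parameter are hypothetical, the ~25TB is assumed to be fully usable, and metadata overhead and background aggregation are ignored.

```python
def first_failing_iteration(capacity_tb, write_tb, reclaim_fraction, max_iters=100):
    """Return the 1-based iteration whose write no longer fits, or None.

    reclaim_fraction is a hypothetical knob: the share of an iteration's
    data whose space is actually freed before the next iteration starts.
    """
    used_tb = 0.0
    for iteration in range(1, max_iters + 1):
        # The iteration fails if its write does not fit in the remaining space.
        if used_tb + write_tb > capacity_tb:
            return iteration
        used_tb += write_tb
        # Files are deleted afterwards; only part of the space may come back.
        used_tb -= write_tb * reclaim_fraction
    return None


# Values from the report: ~25TB of NVMe, ~6.3TB written per iteration.
print(first_failing_iteration(25.0, 6.3, reclaim_fraction=0.0))
print(first_failing_iteration(25.0, 6.3, reclaim_fraction=1.0))
```

With no reclaim the model fails in iteration 4, matching the report. With full reclaim it never fails, which is the behavior the 2.2 requirement asks for.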