PMem Q&A
“an ADR failure was detected, the pool might be corrupted”
The message “an ADR failure was detected, the pool might be corrupted” is reported when two conditions are met at the same time:
An Unsafe Shutdown took place AND
PMEMOBJ detected the pool was NOT closed properly.
If you see the “ADR failure…” error, both things have happened.
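The rule above can be written down as a tiny predicate. This is only a sketch of the logic as described in this Q&A, not PMDK's actual implementation; the function name and both parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the rule described in the text, not PMDK code.
 * The error is reported only when BOTH conditions hold:
 *   - an Unsafe Shutdown took place, and
 *   - the pool was NOT closed properly.
 */
bool adr_failure_reported(bool unsafe_shutdown, bool pool_closed_properly)
{
    return unsafe_shutdown && !pool_closed_properly;
}
```

Under this model, an Unsafe Shutdown alone or an unclean close alone does not produce the error; only the combination does.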
1. Unsafe Shutdown
An Unsafe Shutdown is a situation in which the hardware detected that it was not able to flush all buffers to persistence. It is a hardware failure that requires further investigation before the hardware returns to normal use. Please ask the hardware provider for instructions.
2. PMEMOBJ pool is NOT closed properly
The pool is NOT closed properly whenever pmemobj_close() is not called, for example when the engine’s process is terminated via SIGKILL, which is currently the default behaviour.
Ref: https://github.com/daos-stack/daos/pull/15811
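SIGKILL cannot be caught, so a process stopped that way never gets a chance to call pmemobj_close(). A clean close is only possible when the process is stopped with a catchable signal such as SIGTERM. The following is a minimal, generic sketch of that pattern and is not taken from the DAOS engine; the names install_sigterm_handler, stop_was_requested and do_work are hypothetical, and the pmemobj_close() call is shown only in a comment so that the block builds without libpmemobj.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the signal handler, polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigterm(int signo)
{
    (void)signo;
    stop_requested = 1; /* only async-signal-safe work is done here */
}

/* Installs the SIGTERM handler; returns 0 on success, -1 on failure. */
int install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Returns non-zero once SIGTERM has been received. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}

/*
 * Intended use (assumption; not compiled here, needs <libpmemobj.h>):
 *
 *     install_sigterm_handler();
 *     while (!stop_was_requested())
 *         do_work(pop);        // hypothetical unit of work on the pool
 *     pmemobj_close(pop);      // the pool is now closed properly
 */
```

The handler only sets a flag; the close itself happens in the normal control flow, because pmemobj_close() should not be assumed to be safe to call from inside a signal handler.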
Unsafe Shutdown - in detail
This type of hardware failure is not transient; the faulty component is present and will likely cause issues in the future.
While PMem DIMMs are the easiest to replace, they may not necessarily be the faulty part.
As a result, routinely replacing DIMMs may not resolve the problem.
I do not know what tests you can conduct to identify the exact component at fault.
For a list of the hardware components involved, please see ADR domain.
The following are logical diagrams; they may not list all hardware components fully or precisely.