PMem Q&A
“an ADR failure was detected, the pool might be corrupted”
The message "an ADR failure was detected, the pool might be corrupted" is reported when two conditions are met simultaneously:
An Unsafe Shutdown took place AND
PMEMOBJ detected the pool was NOT closed properly.
If you see the “ADR failure…” error, both conditions have occurred.
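The rule above can be written as a small predicate. This is only a sketch of the logic as described here, not PMDK's actual implementation; the function name adr_failure_reported and both parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the rule described above (not PMDK code):
 * the error is reported only when an unsafe shutdown took place
 * AND the pool was not closed properly.
 */
static bool adr_failure_reported(bool unsafe_shutdown_detected,
                                 bool pool_closed_properly)
{
    return unsafe_shutdown_detected && !pool_closed_properly;
}
```

Under this model, an unsafe shutdown on a pool that was closed properly does not produce the error, and neither does an improperly closed pool when no unsafe shutdown took place.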
1. Unsafe Shutdown
An unsafe shutdown is a situation in which the hardware detected that it was not able to flush all buffers to persistent media. It is a hardware failure that requires further investigation before the hardware is returned to normal use. Please ask your hardware provider for instructions.
2. PMEMOBJ pool is NOT closed properly
Whenever pmemobj_close() is not called, the pool is NOT closed properly. This happens, for example, when the engine’s process is terminated via SIGKILL, which is currently the default behaviour.
Ref: https://github.com/daos-stack/daos/pull/15811
Unsafe Shutdown - in detail
This type of hardware failure is not transient; the faulty component is present and will likely cause issues in the future.
While PMem DIMMs are the easiest to replace, they may not necessarily be the faulty part.
As a result, routinely replacing DIMMs may not resolve the problem.
I do not know what tests you can conduct to identify the exact component at fault.
For a list of the hardware components involved, please see the ADR/eADR domain section.
ADR/eADR domain
The following are logical diagrams; they may not fully or precisely list all hardware components.