PMem Q&A

“an ADR failure was detected, the pool might be corrupted”

The error “an ADR failure was detected, the pool might be corrupted” is reported only when two conditions are met simultaneously:

  1. An Unsafe Shutdown took place AND

  2. PMEMOBJ detected the pool was NOT closed properly.

If you see the “ADR failure…” error, both things have happened.
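The AND relationship between the two conditions can be sketched in C as below. This is a simplified model, not PMEMOBJ's actual code: the struct, its fields, and the function name are hypothetical, and the sketch assumes that an unsafe shutdown shows up as a change in a hardware counter while the pool remembers whether it was closed cleanly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the state remembered with the pool. */
struct saved_shutdown_state {
    uint64_t usc; /* unsafe shutdown count observed when the pool was opened */
    bool dirty;   /* set on open, cleared only by a clean close */
};

/* Returns true only when BOTH conditions hold at the next open. */
static bool adr_failure_detected(const struct saved_shutdown_state *saved,
                                 uint64_t current_usc)
{
    assert(saved != NULL);

    /* Condition 1: the counter moved, so an unsafe shutdown took place. */
    bool unsafe_shutdown = (current_usc != saved->usc);

    /* Condition 2: the pool was never closed properly. */
    bool not_closed_properly = saved->dirty;

    return unsafe_shutdown && not_closed_properly;
}
```

In this model, an unsafe shutdown on a cleanly closed pool, or an unclean close without an unsafe shutdown, does not raise the error on its own.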

1. Unsafe Shutdown

An Unsafe Shutdown is a situation in which the hardware detected that it was not able to flush all buffers to persistent media. It is a hardware failure that requires further investigation before the hardware returns to normal use. Please ask the hardware provider for instructions.

2. PMEMOBJ pool is NOT closed properly

The pool is NOT closed properly whenever pmemobj_close() is not called, e.g. when the engine’s process is terminated via SIGKILL, which is currently the default behaviour.

Ref: https://github.com/daos-stack/daos/pull/15811
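SIGKILL cannot be caught, so a process terminated that way never gets the chance to call pmemobj_close(). A catchable signal such as SIGTERM does leave room for a clean close. The following is a minimal sketch of that pattern, not the engine's actual shutdown path: close_pool() is a hypothetical stand-in for the real pmemobj_close(pop) call, and pool_open, stop_requested, and serve_until_stopped() are illustrative names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Set from the signal handler, polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Hypothetical flag tracking whether the pool is still open. */
static bool pool_open = true;

/* Only sets a flag; the actual close happens outside the handler. */
static void on_sigterm(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Installs the SIGTERM handler; returns 0 on success, -1 on failure. */
static int install_sigterm_handler(void)
{
    struct sigaction sa = {0};

    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical stand-in for pmemobj_close(pop). */
static void close_pool(void)
{
    pool_open = false;
}

/* Runs until SIGTERM is seen, then closes the pool before returning. */
static void serve_until_stopped(void)
{
    while (!stop_requested) {
        /* one unit of real work would go here */
    }
    close_pool();
}
```

The handler only sets a flag because very little is safe to call from signal context; the close itself runs on the normal code path once the loop notices the flag.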

Unsafe Shutdown - in detail

This type of hardware failure is not transient; the faulty component is present and will likely cause issues in the future.

While PMem DIMMs are the easiest to replace, they may not necessarily be the faulty part.

As a result, routinely replacing DIMMs may not resolve the problem.

I do not know what tests you can conduct to identify the exact component at fault.

For a list of the hardware components involved, please see the ADR/eADR domain section.

ADR/eADR domain

The following are logical diagrams; they may not list all hardware components fully or precisely.

image (1).png
image (2).png

Ref: https://doi.org/10.14778/3436905.343692