Option to change redundancy level from engine to server
Activity
Xuezhao Liu August 12, 2022 at 1:43 AM
Patch landed, closing the ticket.
Thanks for the testing; if you find a bug later, please feel free to report it.
Makito Kano July 26, 2022 at 1:28 PM
Thanks for the info. (I recall the rebuild result doesn’t show the redundancy capability.) I tested the following on 3 nodes with 2 ranks/node.
rf_lvl:2,rf:1 - used RP_2G1
- Bring down two consecutive ranks, 4 and 5, which are in the same node (wolf-121): Container Health remains HEALTHY.
- Bring down two ranks in two different nodes, 3 (wolf-120) and 5 (wolf-121): Container Health becomes UNCLEAN.
rf_lvl:1,rf:1 - used RP_2GX
- Bring down two consecutive ranks, 4 and 5, which are in the same node (wolf-121): Container Health becomes UNCLEAN.
It looks like the feature is working as expected.
I’ll try RP_2GX with your patch later. Thanks.
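For reference, a minimal sketch of this test sequence in shell (the pool/container labels "tank"/"cont1" are placeholders, and exact daos/dmg flag syntax may differ slightly by DAOS version):

    # Container with server-level redundancy (rf_lvl:2) and rf:1.
    daos cont create tank cont1 --type POSIX --properties rf_lvl:2,rf:1

    # Write data with a 2-way replicated object class via IOR's DFS driver.
    ior -a DFS --dfs.pool tank --dfs.cont cont1 --dfs.oclass RP_2G1 -w

    # Case 1: stop two ranks on the same node, then check container health.
    dmg system stop --ranks 4,5
    daos cont query tank cont1    # expected: Health remains HEALTHY with rf_lvl:2

    # Case 2 (after restoring the system): stop ranks on two different nodes.
    dmg system stop --ranks 3,5
    daos cont query tank cont1    # expected: Health becomes UNCLEAN (rf:1 exceeded)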
Xuezhao Liu July 25, 2022 at 1:38 PM (edited)
For the problem of “I get an error if I try to use RP_2GX with rf_lvl:2”: could you please test whether the problem still occurs with this WIP patch?
“I stopped ranks from two different nodes, which exceeds the rf:1 limit based on your explanation”
Yes, that should break RF:1. In this case you should be able to see the RF-broken error message in the log, and the opened container should not be able to write/read after that, right?
But the rebuild of the RP_2G1 object does not necessarily have to fail: as long as one replica remains and the rebuild can find an alive spare target, the rebuild can work (rebuild I/O is allowed even when RF is broken).
Just a note again: until that is resolved, the current pool map records ranks 0-1 as belonging to NODE0 and ranks 2-3 as belonging to NODE1, so current testing should also be based on this assumption.
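A hedged way to observe the RF-broken condition described above (assuming the container status is visible via the standard property/query commands; labels are placeholders):

    # One rank on each of two different nodes: breaks rf:1 at the server level.
    dmg system stop --ranks 3,5

    # The container status should now report the unhealthy state;
    # the exact wording varies by DAOS version.
    daos cont get-prop tank cont1

    # Application reads/writes on the open container are expected to fail
    # (with an RF-related error such as -DER_RF), while rebuild I/O is still
    # allowed and can complete if an alive spare target exists.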
Makito Kano July 25, 2022 at 12:59 PM
Thanks for the explanation. I’m doing some quick testing now and have some questions.
I’m using 3 nodes with 2 engines/node. I create a container with rf_lvl:2,rf:1 across the 6 ranks, then run IOR with the RP_2G1 oclass. At this point, if I stop three ranks, two of which are in the same node, with dmg system stop, the subsequent rebuild succeeds (reaches the done state). I stopped ranks from two different nodes, which exceeds the rf:1 limit based on your explanation. Is this expected?
Also, I get an error if I try to use RP_2GX with rf_lvl:2. I see this with some other X oclasses too.
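Roughly, the steps above in shell (labels and rank numbers illustrative):

    # 6 ranks total (3 nodes x 2 engines), server-level redundancy.
    daos cont create tank cont1 --type POSIX --properties rf_lvl:2,rf:1
    ior -a DFS --dfs.pool tank --dfs.cont cont1 --dfs.oclass RP_2G1 -w

    # Stop three ranks, two of them (4 and 5) on the same node.
    dmg system stop --ranks 2,4,5

    # Watch rebuild progress in the pool query output until it reaches done.
    dmg pool query tank

    # The failing case: a GX-style object class combined with rf_lvl:2.
    ior -a DFS --dfs.pool tank --dfs.cont cont1 --dfs.oclass RP_2GX -w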
Xuezhao Liu July 25, 2022 at 1:27 AM (edited)
If the container is created with rf_lvl:2 (NODE), then within one NODE, the failure of any engine/vos_target (whether one or more) is treated as one NODE failure.
In the case you mentioned, stopping one rank from wolf-1 and one rank from wolf-2 causes both NODEs to be considered failed. So a container created with RF:1 will report RF broken, since #node_failures > rf, while a container created with RF:2 will not have RF broken.
But a related problem is that the topology of which engine belongs to which NODE/server is currently not parsed/passed correctly by the control plane; that needs to be addressed separately. Until it is, the pool map assumes engines 0-1 belong to NODE/server_0, engines 2-3 belong to NODE/server_1, and so on, which may not match the real configuration. So with current master's behavior, stopping rank 2 and rank 3 is treated as ONE NODE failure (if the container is created with rf_lvl:2), even though in the real configuration ranks 2 and 3 may be on different NODEs.
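In other words, until that control-plane fix lands, a rank's fault domain is effectively derived from its position rather than from the real topology. A sketch of the assumed mapping with 2 engines per server (the formula illustrates the placeholder behavior, it is not code from DAOS):

    # Placeholder mapping on current master: NODE = rank / engines_per_node.
    for rank in 0 1 2 3 4 5; do
        echo "rank $rank -> NODE $((rank / 2))"
    done
    # rank 0 -> NODE 0, rank 1 -> NODE 0, rank 2 -> NODE 1, ..., rank 5 -> NODE 2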
When selecting the storage engines to participate in a redundancy group, the default in DAOS 2.0 is to treat all engines as independent. This is reflected by a redundancy level (rf_lvl) property value of "rank (1)" at the container level.
For DAOS servers with multiple engines per server, it should be possible to select the server level as the redundancy level: the server is a single point of failure, and engines that reside on the same server should not be part of the same redundancy group (except for testing purposes, for which the engine level is still very useful).
With DAOS 2.0.2, rf_lvl is not a settable property. The daos cont create command should be extended to support setting rf_lvl to either "1 (engine)" or "2 (server)", and the placement algorithm should interpret the "2 (server)" setting appropriately.
There should also be tests that validate the server-level redundancy: we should validate that with EC 4+2P or 8+2P (or 3-way replication), two servers can be powered off without losing access to the data while an I/O-intensive application like IOR-easy is running.