Server Side Change:
The server-side change will be implemented in the DAOS engine sched layer, because each xstream may carry a different workload and its RPC processing speed may also differ depending on space pressure.
Two limits can keep an RPC from being processed immediately:
Number of ULTs in flight.
Number of RPCs in the per-pool waiting queue, and the global number of RPCs across all waiting queues.
When an RPC arrives at the server, it may be put on the waiting queue if the number of in-flight ULTs exceeds the limit. It may be rejected if the waiting queue is full, or if it cannot be handled in time given the current RPC processing speed and the number of RPCs already in the waiting queue.
A rejected RPC is answered with a new error, DER_BUSY, and a hint is returned to the client along with it.
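The admission decision described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the sched-layer implementation: `struct xs_limits`, `admit_rpc`, the `ADMIT_*` values and the millisecond hint are all hypothetical names and units chosen for the example, and `ADMIT_BUSY` stands for the case where DER_BUSY would be returned.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; none are taken from the DAOS tree. */
enum admit_result { ADMIT_RUN, ADMIT_QUEUED, ADMIT_BUSY };

struct xs_limits {
	uint32_t inflight;        /* ULTs currently running on this xstream */
	uint32_t inflight_max;    /* limit on in-flight ULTs */
	uint32_t pool_queued;     /* RPCs waiting for this pool */
	uint32_t pool_queue_max;  /* per-pool waiting queue limit */
	uint32_t total_queued;    /* RPCs waiting across all pools */
	uint32_t total_queue_max; /* global waiting queue limit */
};

/*
 * Decide what to do with a newly arrived RPC.
 * On ADMIT_BUSY, *hint_ms is set to the caller-supplied estimate of how
 * long the queue needs to drain, to be sent back to the client.
 */
static enum admit_result
admit_rpc(struct xs_limits *xs, uint64_t drain_estimate_ms, uint64_t *hint_ms)
{
	assert(xs != 0 && hint_ms != 0);

	/* Run right away only if nothing is already waiting ahead of us. */
	if (xs->total_queued == 0 && xs->inflight < xs->inflight_max) {
		xs->inflight++;
		return ADMIT_RUN;
	}

	/* Either queue being full means the RPC is rejected. */
	if (xs->pool_queued >= xs->pool_queue_max ||
	    xs->total_queued >= xs->total_queue_max) {
		*hint_ms = drain_estimate_ms;
		return ADMIT_BUSY;
	}

	xs->pool_queued++;
	xs->total_queued++;
	return ADMIT_QUEUED;
}
```

The sketch keeps the counters per xstream, in line with the point that each xstream may end up with different limits.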
The limit on the number of queued RPCs will take the following factors into account:
Reserved memory that DAOS might use; this can differ between setups (MD-on-SSD or PMDK).
Current xstream I/O processing speed; it may change dynamically with space pressure, so each xstream may have a different limit.
An RPC may also be rejected even when the waiting queue is not full, because its own timeout is too small to survive the expected wait.
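One way to picture the timeout-based rejection is to compare the RPC's timeout with an estimated queueing delay. The sketch below assumes the estimate is simply "RPCs ahead divided by the recent completion rate"; the function name, parameters and units are hypothetical, and the real estimate may well use other inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if an RPC should be rejected up front because it would
 * likely time out while waiting. Hypothetical helper, for illustration.
 *
 * queued       RPCs already waiting ahead of this one on the xstream
 * rate_per_sec recently observed completion rate of the xstream
 * timeout_ms   timeout carried by this RPC
 * est_wait_ms  out: estimated wait, usable as the hint for the client
 */
static bool
reject_for_timeout(uint32_t queued, uint32_t rate_per_sec,
		   uint64_t timeout_ms, uint64_t *est_wait_ms)
{
	assert(est_wait_ms != 0);

	if (queued == 0) {
		/* Nothing ahead of us, so there is no queueing delay. */
		*est_wait_ms = 0;
		return false;
	}
	if (rate_per_sec == 0) {
		/* RPCs are waiting but none complete: treat as stalled. */
		*est_wait_ms = UINT64_MAX;
		return true;
	}

	*est_wait_ms = ((uint64_t)queued * 1000) / rate_per_sec;
	return *est_wait_ms >= timeout_ms;
}
```

With 100 RPCs queued and 50 completions per second the estimated wait is 2000 ms, so an RPC with a 1000 ms timeout would be rejected even though the queue has room.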
To avoid tail latency, a separate heap is introduced for retried RPCs. Whenever an RPC arrives at the server it is given an ordered ID; a retried RPC keeps the same ID, and the server always picks the smallest ID from the waiting queue.
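The ordering rule can be illustrated with a small binary min-heap keyed by RPC ID. This is only a sketch: it uses a single fixed-size heap rather than a waiting queue plus a separate retry heap, it assumes IDs are handed out from an increasing counter, and it assumes the client presents its earlier ID on retry (`prev_id`). `struct waitq`, `waitq_push` and `waitq_pop` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAITQ_CAP 64 /* illustrative fixed capacity */

struct waitq {
	uint64_t ids[WAITQ_CAP]; /* binary min-heap of RPC IDs */
	size_t   len;
	uint64_t next_id;        /* last ID handed out; 0 means "no ID yet" */
};

static void
swap_ids(uint64_t *a, uint64_t *b)
{
	uint64_t t = *a;

	*a = *b;
	*b = t;
}

/* Queue an RPC. prev_id is 0 on a first attempt, else the ID it got before. */
static uint64_t
waitq_push(struct waitq *q, uint64_t prev_id)
{
	uint64_t id = prev_id != 0 ? prev_id : ++q->next_id;
	size_t   i;

	assert(q->len < WAITQ_CAP);
	i = q->len++;
	q->ids[i] = id;
	/* Sift up until the parent is not larger than the new entry. */
	while (i > 0 && q->ids[(i - 1) / 2] > q->ids[i]) {
		swap_ids(&q->ids[(i - 1) / 2], &q->ids[i]);
		i = (i - 1) / 2;
	}
	return id;
}

/* Remove and return the smallest (oldest) ID. */
static uint64_t
waitq_pop(struct waitq *q)
{
	uint64_t top;
	size_t   i = 0;

	assert(q->len > 0);
	top = q->ids[0];
	q->ids[0] = q->ids[--q->len];
	/* Sift down until both children are not smaller. */
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, m = i;

		if (l < q->len && q->ids[l] < q->ids[m])
			m = l;
		if (r < q->len && q->ids[r] < q->ids[m])
			m = r;
		if (m == i)
			break;
		swap_ids(&q->ids[m], &q->ids[i]);
		i = m;
	}
	return top;
}
```

Because a retried RPC re-enters with its original, smaller ID, it is dispatched ahead of RPCs that first arrived after it, which is what keeps repeatedly rejected requests from starving.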
Client changes:
RPC retry will be handled on the DAOS client side. DER_BUSY is a retryable error: the client shall reschedule the RPC using the hint (a random delay between 0 and the timeout) and then resend it.
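The randomized resend delay might look like the sketch below. It assumes the hint is an upper bound in microseconds and uses the C library `rand()` purely for illustration; `retry_delay_us` is a hypothetical helper, and the client's actual random source and time unit are not specified here.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Pick a resend delay between 0 and hint_us (the value returned with
 * DER_BUSY). Spreading retries over the interval keeps rejected clients
 * from resending all at once. Hypothetical helper, for illustration.
 */
static uint64_t
retry_delay_us(uint64_t hint_us)
{
	/* Fraction in [0, 1); the +1.0 keeps it strictly below 1. */
	double   frac = (double)rand() / ((double)RAND_MAX + 1.0);
	uint64_t delay = (uint64_t)(frac * (double)hint_us);

	/* Guard against floating-point rounding for very large hints. */
	return delay > hint_us ? hint_us : delay;
}
```

A hint of 0 always yields a delay of 0, meaning the RPC can be resent immediately.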
Protocol Change:
To support NRS, we might extend DAOS RPCs so that requests and replies carry enough information:
struct daos_req_comm_in {
	uuid_t		req_in_pool_id;
	uuid_t		req_in_cont_id;
	uint32_t	req_in_uid;
	uint32_t	req_in_gid;
	uint32_t	req_in_projid;
	uint64_t	req_in_hint;	/* for RPC reject */
	uint64_t	req_in_paddings[4];
	crt_phy_addr_t	req_in_addr;
	d_string_t	req_in_jobid;
};

struct daos_req_comm_out {
	uint64_t	req_out_hint;
	uint64_t	req_out_paddings[4];
};
This will introduce interoperability issues for the RPCs involved. To simplify this a bit, we might consider only the object and dtx modules.
v2 input/output structs will be introduced for the RPC formats of these modules. The modules will negotiate between client and server, then register the proper RPC format handler.
On the server side, ->dms_get_req_attr will be used to extract the attributes NRS requires from the different modules.