...
Number of in-flight ULTs (calculated as the per-target memory limit / 16K).
Number of RPCs in the per-pool waiting queue, and the global number of RPCs across all waiting queues.
When an RPC arrives at the server, it may be put on a waiting queue if the number of in-flight RPCs exceeds the limit. It may be rejected if the waiting queue is full, or if it cannot be handled in time given the current RPC processing rate and the number of RPCs already in the waiting queue.
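The admission decision described above can be sketched as follows. This is a minimal illustration, not the DAOS implementation: the names (`struct flow_ctl`, `rpc_admit`, `RPC_RUN`/`RPC_QUEUE`/`RPC_REJECT`) are hypothetical, "16k" is assumed to mean 16 KiB of memory per ULT, and the processing rate and timeout are assumed to be measured elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical admission outcomes (illustrative names, not DAOS API). */
enum admit_result {
	RPC_RUN,    /* start a ULT for the RPC now */
	RPC_QUEUE,  /* park the RPC on the waiting queue */
	RPC_REJECT, /* reply DER_BUSY to the client */
};

struct flow_ctl {
	uint64_t inflight;     /* ULTs currently handling RPCs */
	uint64_t inflight_max; /* from inflight_limit() */
	uint64_t waiting;      /* RPCs on the waiting queue */
	uint64_t waiting_max;  /* waiting queue capacity */
	double   rate;         /* observed RPC completions per second */
	double   timeout;      /* seconds an RPC may wait and still be useful */
};

/* Assumes each in-flight ULT costs 16 KiB of the per-target memory budget. */
static uint64_t
inflight_limit(uint64_t target_mem_bytes)
{
	return target_mem_bytes / (16 * 1024);
}

static enum admit_result
rpc_admit(struct flow_ctl *fc)
{
	double est_wait;

	if (fc->inflight < fc->inflight_max) {
		fc->inflight++;
		return RPC_RUN;
	}
	if (fc->waiting >= fc->waiting_max)
		return RPC_REJECT;

	/* Estimated time to drain the RPCs already queued ahead of this one. */
	assert(fc->rate > 0);
	est_wait = (double)fc->waiting / fc->rate;
	if (est_wait > fc->timeout)
		return RPC_REJECT;

	fc->waiting++;
	return RPC_QUEUE;
}
```

Rejecting early when the estimated wait already exceeds the timeout avoids spending a queue slot on an RPC the client would give up on anyway.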
...
To avoid tail latency, a separate heap is introduced for retried RPCs. Whenever an RPC arrives at the server, it is assigned a sortable ID; a retried RPC reuses the same ID as its original attempt, and the server always picks the RPC with the smallest ID from the waiting queue.
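The retry-priority scheme can be sketched with a FIFO for first-attempt RPCs (whose IDs arrive in increasing order) and a binary min-heap for retried RPCs, with the scheduler taking whichever head has the smaller ID. This is an illustration under assumptions: the queues store bare `uint64_t` IDs instead of RPC handles, IDs start at 1 so that 0 can mean "empty", and the fixed capacity and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RQ_CAP 64 /* illustrative fixed capacity */

/* Binary min-heap of retried RPC IDs; a retry keeps its original ID. */
struct retry_heap {
	uint64_t ids[RQ_CAP];
	size_t   n;
};

/* FIFO of first-attempt RPC IDs, which arrive in increasing order. */
struct new_fifo {
	uint64_t ids[RQ_CAP];
	size_t   head;
	size_t   tail;
};

static void
fifo_push(struct new_fifo *f, uint64_t id)
{
	assert(f->tail < RQ_CAP); /* no wrap-around in this sketch */
	f->ids[f->tail++] = id;
}

static void
heap_push(struct retry_heap *h, uint64_t id)
{
	size_t i = h->n++;

	assert(h->n <= RQ_CAP);
	/* Sift up: move larger parents down until id's slot is found. */
	while (i > 0 && h->ids[(i - 1) / 2] > id) {
		h->ids[i] = h->ids[(i - 1) / 2];
		i = (i - 1) / 2;
	}
	h->ids[i] = id;
}

static uint64_t
heap_pop(struct retry_heap *h)
{
	uint64_t top, last;
	size_t   i = 0;

	assert(h->n > 0);
	top  = h->ids[0];
	last = h->ids[--h->n];
	/* Sift down: move the smaller child up until last fits. */
	for (;;) {
		size_t c = 2 * i + 1;

		if (c >= h->n)
			break;
		if (c + 1 < h->n && h->ids[c + 1] < h->ids[c])
			c++;
		if (h->ids[c] >= last)
			break;
		h->ids[i] = h->ids[c];
		i = c;
	}
	h->ids[i] = last;
	return top;
}

/* Dequeue the smallest ID from either queue; 0 means both are empty. */
static uint64_t
pick_next(struct retry_heap *h, struct new_fifo *f)
{
	int have_new = f->head < f->tail;

	if (h->n > 0 && (!have_new || h->ids[0] < f->ids[f->head]))
		return heap_pop(h);
	if (have_new)
		return f->ids[f->head++];
	return 0;
}
```

Because a retried RPC keeps its original, older ID, it usually sorts ahead of RPCs that arrived after it, which bounds the extra latency a rejection adds.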
...
RPC retry is handled on the DAOS client side. DER_BUSY is a retryable error: on receiving it, the client reschedules the RPC using the hint returned by the server and resends it after a random timeout between 0 and the hint.
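A minimal sketch of the client-side backoff, assuming the hint is a non-negative integer delay bound (its unit is not specified above; milliseconds is an assumption). `busy_retry_delay` is a hypothetical helper, not a DAOS function, and `rand()` is used only for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Choose how long to wait before resending an RPC that failed with
 * DER_BUSY: a random value in [0, hint]. The unit of hint (milliseconds
 * is assumed here) is not fixed by the design.
 */
static uint32_t
busy_retry_delay(uint32_t hint)
{
	/* 64-bit arithmetic so hint == UINT32_MAX does not overflow hint + 1.
	 * The modulo adds a small bias, and RAND_MAX may be as low as 32767,
	 * so a production version would use a better random source. */
	uint64_t d = (uint64_t)rand() % ((uint64_t)hint + 1);

	assert(d <= hint);
	return (uint32_t)d;
}
```

Randomizing within [0, hint] spreads the resends of many rejected clients over the window instead of having them all hit the server at the same instant.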
CaRT Change
The high 16 bits of cch_dst_tag will be used to carry the reply hint. A new flag, CRT_RPC_FLAG_REJECT, will be introduced for interoperability: the server only returns a hint to the client if the REJECT flag is set on the RPC.
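A sketch of how the hint could be packed into and read back from the tag, assuming `cch_dst_tag` is a 32-bit field whose real target tag fits in the low 16 bits. `RPC_FLAG_REJECT_SKETCH` and its bit value are placeholders standing in for `CRT_RPC_FLAG_REJECT`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for CRT_RPC_FLAG_REJECT; the real bit value is not given here. */
#define RPC_FLAG_REJECT_SKETCH (1U << 7)

#define TAG_HINT_SHIFT 16
#define TAG_LOW_MASK   0xFFFFU

/* Server side: store the hint in the high 16 bits of the tag, but only if
 * the client set the REJECT flag (i.e. it knows how to read the hint). */
static uint32_t
tag_set_hint(uint32_t dst_tag, uint32_t rpc_flags, uint16_t hint)
{
	if (!(rpc_flags & RPC_FLAG_REJECT_SKETCH))
		return dst_tag; /* old client: leave the field untouched */

	/* The real tag must not use the bits reserved for the hint. */
	assert((dst_tag >> TAG_HINT_SHIFT) == 0);
	return (dst_tag & TAG_LOW_MASK) | ((uint32_t)hint << TAG_HINT_SHIFT);
}

/* Client side: split the reply tag back into hint and target tag. */
static uint16_t
tag_get_hint(uint32_t dst_tag)
{
	return (uint16_t)(dst_tag >> TAG_HINT_SHIFT);
}

static uint16_t
tag_get_target(uint32_t dst_tag)
{
	return (uint16_t)(dst_tag & TAG_LOW_MASK);
}
```

Gating on the flag means an old client, which would treat the high bits as part of the tag, never sees a modified value.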
...
Protocol Change:
To support NRS, we may extend DAOS RPCs so that requests and replies carry enough information:
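#include <stdint.h>
#include <uuid/uuid.h>   /* uuid_t */
#include <cart/types.h>  /* crt_phy_addr_t, d_string_t */

/* Common input fields carried by v2 RPCs (proposal). */
struct daos_req_comm_in {
	uuid_t         req_in_pool_id;     /* pool the request targets */
	uuid_t         req_in_cont_id;     /* container the request targets */
	uint32_t       req_in_uid;         /* identity NRS may schedule on */
	uint32_t       req_in_gid;
	uint32_t       req_in_projid;
	uint64_t       req_in_hint;        /* for RPC reject */
	uint64_t       req_in_paddings[4]; /* reserved for future fields */
	crt_phy_addr_t req_in_addr;        /* requester address */
	d_string_t     req_in_jobid;       /* job identifier */
};

/* Common output fields carried by v2 RPCs (proposal). */
struct daos_req_comm_out {
	uint64_t req_out_hint;        /* retry hint returned to the client */
	uint64_t req_out_paddings[4]; /* reserved for future fields */
};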
This introduces interoperability issues for the affected RPCs. To keep things simple, we may limit the change to the object and DTX modules.
Version 2 input/output structs will be introduced for the RPC formats of these modules. Client and server will negotiate the protocol version, and each module then registers the handler for the matching RPC format.
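The negotiate-then-register step could look roughly like this. Everything here is hypothetical: the version numbers, `proto_negotiate`, `proto_select`, and the two stand-in object handlers only illustrate picking the highest version both sides support and dispatching to the matching format handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*rpc_handler_t)(void *rpc);

/* Hypothetical stand-in handlers; real ones would decode v1 or v2 in/out structs. */
static int obj_rw_handler_v1(void *rpc) { (void)rpc; return 1; }
static int obj_rw_handler_v2(void *rpc) { (void)rpc; return 2; }

struct proto_format {
	uint32_t      version; /* 1: legacy format, 2: with daos_req_comm_in/out */
	rpc_handler_t handler;
};

static const struct proto_format obj_formats[] = {
	{ 1, obj_rw_handler_v1 },
	{ 2, obj_rw_handler_v2 },
};

/* Highest version present in both lists, or 0 if they share none. */
static uint32_t
proto_negotiate(const uint32_t *cli, size_t ncli, const uint32_t *srv, size_t nsrv)
{
	uint32_t best = 0;

	for (size_t i = 0; i < ncli; i++)
		for (size_t j = 0; j < nsrv; j++)
			if (cli[i] == srv[j] && cli[i] > best)
				best = cli[i];
	return best;
}

/* Handler registered for the negotiated version, or NULL if none. */
static rpc_handler_t
proto_select(const struct proto_format *fmts, size_t n, uint32_t version)
{
	assert(fmts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (fmts[i].version == version)
			return fmts[i].handler;
	return NULL;
}
```

Keeping the v1 handler registered lets an upgraded server continue to serve clients that only speak the old format.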
On the server side, the ->dms_get_req_attr callback will be used to extract the attributes NRS requires from each module's RPCs.
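A module callback in the spirit of `->dms_get_req_attr` might copy the scheduling-relevant fields out of the common request header. The structs below are simplified, hypothetical stand-ins (raw 16-byte arrays replace `uuid_t`, and only a subset of `daos_req_comm_in` is kept); the real callback's signature is not specified here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct daos_req_comm_in (uuid_t as 16 raw bytes). */
struct req_comm_in_sketch {
	uint8_t  pool_id[16];
	uint8_t  cont_id[16];
	uint32_t uid;
	uint32_t gid;
	uint32_t projid;
	uint64_t hint;
};

/* Hypothetical attribute set an NRS policy could schedule on. */
struct nrs_attr_sketch {
	uint8_t  pool_id[16];
	uint8_t  cont_id[16];
	uint32_t uid;
	uint32_t gid;
	uint32_t projid;
	uint64_t hint;
};

/* Module callback sketch: extract NRS attributes from a v2 request header.
 * Returns 0 on success, -1 if either pointer is NULL. */
static int
obj_get_req_attr_sketch(const struct req_comm_in_sketch *in,
			struct nrs_attr_sketch *attr)
{
	if (in == NULL || attr == NULL)
		return -1;

	memcpy(attr->pool_id, in->pool_id, sizeof(attr->pool_id));
	memcpy(attr->cont_id, in->cont_id, sizeof(attr->cont_id));
	attr->uid    = in->uid;
	attr->gid    = in->gid;
	attr->projid = in->projid;
	attr->hint   = in->hint;
	return 0;
}
```

Routing extraction through a per-module callback keeps the NRS core independent of each module's RPC layout.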