Server-side changes:
The server-side change will be implemented in the DAOS engine scheduler (sched) layer, because each xstream may carry a different workload and its RPC processing speed may also vary with space pressure.
Two limits can keep an RPC from being processed immediately:
The number of ULTs in flight.
The number of RPCs in the per-pool waiting queue, and the global number of RPCs across all waiting queues.
When an RPC arrives at the server, it is put on the waiting queue if the number of in-flight ULTs has reached the limit. It is rejected if the waiting queue is full, or if it cannot be handled in time given the current RPC processing speed and the number of RPCs already in the waiting queue.
A rejected RPC returns a new error, DER_BUSY, to the client, together with a retry hint.
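The admission rule described above can be sketched as follows. This is only an illustration under assumed names: admit_rpc, struct xs_limits, struct xs_state and the ADMIT_* results are hypothetical and are not part of the DAOS sched API. The sketch covers only the counting limits, not the time-based rejection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they only illustrate the rule. */
enum admit_result { ADMIT_RUN, ADMIT_QUEUE, ADMIT_REJECT };

struct xs_limits {
	uint32_t max_inflight;     /* ULTs allowed in flight on this xstream */
	uint32_t max_pool_queued;  /* capacity of the per-pool waiting queue */
	uint32_t max_total_queued; /* global cap on queued RPCs */
};

struct xs_state {
	uint32_t inflight;     /* ULTs currently in flight */
	uint32_t pool_queued;  /* RPCs waiting for this pool */
	uint32_t total_queued; /* RPCs waiting across all pools */
};

/* Decide what to do with a newly arrived RPC and update the counters. */
enum admit_result
admit_rpc(const struct xs_limits *lim, struct xs_state *st)
{
	assert(lim != NULL && st != NULL);

	/* Below the in-flight limit: start a ULT right away. */
	if (st->inflight < lim->max_inflight) {
		st->inflight++;
		return ADMIT_RUN;
	}

	/* Either queue is full: the caller replies DER_BUSY plus a hint. */
	if (st->pool_queued >= lim->max_pool_queued ||
	    st->total_queued >= lim->max_total_queued)
		return ADMIT_REJECT;

	/* Otherwise park the RPC on the waiting queue. */
	st->pool_queued++;
	st->total_queued++;
	return ADMIT_QUEUE;
}
```

Keeping the limits in a per-xstream structure matches the idea that each xstream may end up with different limits.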
The limit on the number of queued RPCs takes the following factors into account:
Reserved memory that DAOS might use; this can differ between setups (MD-on-SSD or PMDK).
Current xstream I/O processing speed; it can change dynamically with space pressure, so each xstream may have a different limit.
An RPC may also be rejected even when the waiting queue is not full, if its own timeout is too short for it to be served in time.
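One way to combine these factors is sketched below. The formulas are assumptions made for illustration, not the design's actual computation: queue_limit, est_wait_ms, should_reject and all parameters (memory budget, bytes per RPC, RPCs per second, maximum wait) are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical queue limit for one xstream: the smaller of what the
 * reserved memory can hold and what the xstream can drain within
 * max_wait_sec at its current processing speed.
 */
uint32_t
queue_limit(uint64_t mem_budget_bytes, uint64_t bytes_per_rpc,
	    uint32_t rpcs_per_sec, uint32_t max_wait_sec)
{
	uint64_t by_mem = bytes_per_rpc ? mem_budget_bytes / bytes_per_rpc : 0;
	uint64_t by_speed = (uint64_t)rpcs_per_sec * max_wait_sec;
	uint64_t lim = by_mem < by_speed ? by_mem : by_speed;

	return lim > UINT32_MAX ? UINT32_MAX : (uint32_t)lim;
}

/* Estimated time in milliseconds before a newly queued RPC is served. */
uint64_t
est_wait_ms(uint32_t queued, uint32_t rpcs_per_sec)
{
	if (rpcs_per_sec == 0)
		return UINT64_MAX; /* no progress: wait is unbounded */
	return (uint64_t)queued * 1000 / rpcs_per_sec;
}

/*
 * Reject when the queue is full, or when the estimated wait already
 * meets or exceeds the timeout of this individual RPC.
 */
bool
should_reject(uint32_t queued, uint32_t limit, uint32_t rpcs_per_sec,
	      uint32_t timeout_ms)
{
	if (queued >= limit)
		return true;
	return est_wait_ms(queued, rpcs_per_sec) >= timeout_ms;
}
```

For example, with 500 RPCs queued and a rate of 1000 RPCs per second, the estimated wait is 500 ms, so an RPC with a 300 ms timeout would be rejected even though the queue is not full.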
To avoid tail latency, retried RPCs are inserted into a separate heap. Each RPC is assigned an ordered ID when it first arrives at the server, and a retried RPC keeps the same ID. The server always picks the RPC with the smallest ID from the waiting queue.
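The heap for retried RPCs can be illustrated with a small binary min-heap keyed by RPC ID. This is a sketch under assumptions: struct retry_heap, heap_push, heap_pop and the fixed capacity HEAP_CAP are hypothetical, and a real implementation would store RPC handles rather than bare IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_CAP 64 /* hypothetical fixed capacity for this sketch */

struct retry_heap {
	uint64_t ids[HEAP_CAP]; /* binary min-heap of RPC IDs */
	size_t   n;             /* number of IDs currently stored */
};

static void
swap_u64(uint64_t *a, uint64_t *b)
{
	uint64_t t = *a;

	*a = *b;
	*b = t;
}

/* Insert the ID of a retried RPC. Returns 0, or -1 if the heap is full. */
int
heap_push(struct retry_heap *h, uint64_t id)
{
	size_t i;

	assert(h != NULL);
	if (h->n == HEAP_CAP)
		return -1;
	i = h->n++;
	h->ids[i] = id;
	/* Sift up until the parent is not larger. */
	while (i > 0) {
		size_t p = (i - 1) / 2;

		if (h->ids[p] <= h->ids[i])
			break;
		swap_u64(&h->ids[p], &h->ids[i]);
		i = p;
	}
	return 0;
}

/* Remove the smallest (oldest) ID. Returns 0, or -1 if the heap is empty. */
int
heap_pop(struct retry_heap *h, uint64_t *out)
{
	size_t i = 0;

	assert(h != NULL && out != NULL);
	if (h->n == 0)
		return -1;
	*out = h->ids[0];
	h->ids[0] = h->ids[--h->n];
	/* Sift down until both children are not smaller. */
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, m = i;

		if (l < h->n && h->ids[l] < h->ids[m])
			m = l;
		if (r < h->n && h->ids[r] < h->ids[m])
			m = r;
		if (m == i)
			break;
		swap_u64(&h->ids[i], &h->ids[m]);
		i = m;
	}
	return 0;
}
```

Because a retried RPC keeps its original ID, it sorts ahead of RPCs that arrived later, so a request that was rejected once is not pushed to the back of the line again.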
Client changes:
RPC retry will be handled on the DAOS client side. DER_BUSY is a retryable error: the client reschedules the RPC and resends it after a random delay between 0 and the hint.
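The client-side retry can be sketched as below. Everything here is an assumption for illustration: retry_delay, send_with_retry, the callback types, the max_tries bound and the placeholder code RC_BUSY are hypothetical and do not come from the DAOS client API. The unit of the hint is whatever the server defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RC_BUSY (-1) /* placeholder standing in for DER_BUSY */

/* Hypothetical callbacks: send one RPC, and wait for a given delay. */
typedef int (*send_fn)(void *arg, uint16_t *hint_out);
typedef void (*wait_fn)(uint32_t delay);

/*
 * Pick a random delay in [0, hint]. The modulo introduces a slight
 * bias, which is acceptable for spreading out retries.
 */
uint32_t
retry_delay(uint16_t hint)
{
	return (uint32_t)rand() % ((uint32_t)hint + 1);
}

/*
 * Send an RPC and resend it while the server answers busy, waiting a
 * random delay bounded by the returned hint between attempts.
 */
int
send_with_retry(send_fn send, wait_fn wait, void *arg, int max_tries)
{
	int rc = RC_BUSY;
	int i;

	assert(send != NULL && wait != NULL);
	for (i = 0; i < max_tries; i++) {
		uint16_t hint = 0;

		rc = send(arg, &hint);
		if (rc != RC_BUSY)
			return rc; /* success or a non-retryable error */
		if (i + 1 < max_tries)
			wait(retry_delay(hint));
	}
	return rc;
}
```

Randomizing the delay keeps clients that were rejected at the same moment from all resending at the same moment.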
CaRT changes:
The high 16 bits of cch_dst_tag will carry the reply hint. A new flag, CRT_RPC_FLAG_REJECT, will be introduced for interoperability: the server returns a hint to the client only if the REJECT flag is set on the RPC.
Otherwise DER_TIMEOUT is returned to the client, so that old clients can still work with newer servers (they retry without hints).
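The hint encoding and the interoperability rule can be sketched as follows, assuming a 32-bit tag field. The helper names (tag_set_hint, tag_get_hint, tag_get_dst, reject_reply) and the bit chosen for SKETCH_FLAG_REJECT are hypothetical; the actual value of CRT_RPC_FLAG_REJECT is not specified here.

```c
#include <stdint.h>

#define TAG_HINT_SHIFT     16
#define TAG_LOW_MASK       0xFFFFu
#define SKETCH_FLAG_REJECT (1u << 0) /* placeholder bit, not the real flag value */

/* Store a 16-bit hint in the high half of the tag, keeping the low half. */
uint32_t
tag_set_hint(uint32_t tag, uint16_t hint)
{
	return (tag & TAG_LOW_MASK) | ((uint32_t)hint << TAG_HINT_SHIFT);
}

/* Extract the hint from the high 16 bits. */
uint16_t
tag_get_hint(uint32_t tag)
{
	return (uint16_t)(tag >> TAG_HINT_SHIFT);
}

/* Extract the original destination tag from the low 16 bits. */
uint32_t
tag_get_dst(uint32_t tag)
{
	return tag & TAG_LOW_MASK;
}

enum reply_kind { REPLY_BUSY_WITH_HINT, REPLY_TIMEOUT };

/*
 * Choose the reply for a rejected RPC: clients that set the REJECT flag
 * get DER_BUSY with a hint, older clients get DER_TIMEOUT.
 */
enum reply_kind
reject_reply(uint32_t rpc_flags)
{
	return (rpc_flags & SKETCH_FLAG_REJECT) ? REPLY_BUSY_WITH_HINT
						: REPLY_TIMEOUT;
}
```

Reusing the high bits of an existing header field avoids changing the header layout, and the flag lets the server tell which clients understand the new reply.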