Multi-operation VOS transactions

RDB operations are rarer than I/O operations, so with PMDK the overhead of running each RDB update in its own PMDK transaction was considered acceptable. With MD on SSD, however, this results in many tiny transactions, each updating the WAL independently. Supporting multiple operations in a single VOS transaction is therefore preferable.

Presently, VOS supports multi-operation transactions via compound-RPC DTX. Using DTX is preferable because it already handles any yield on commit appropriately. In this scheme, the dtx handle is initialized with the known modification count, each update increments the sequence number, and the final commit happens when the sequence number matches the operation count. This approach has pros and cons:


Pros:

  1. Since we know ahead of time how many operations are incoming, we can allocate enough space up front to store any reservations and deferred operations so we can cancel/publish/execute them. With a transaction API, we would instead need to guess and reallocate as needed.


Cons:

  1. We need to know ahead of time how many punch or update operations the transaction will perform, and we need to ensure we increment the sequence number between operations.

  2. Triggering commit based on this count makes the code more complex.

  3. As currently implemented, the dtx handle stores information about the reservations and deferrals, which really should not be exported to the dtx layer at all.

  4. Presently, the protocol serializes data commit for all but the last operation for MD on SSD, though this could be fixed without changing the fundamental method.

If RDB can know ahead of time how many update/punch operations it will place in a single transaction, we should probably add multi-operation RDB transactions using a phased approach.

Phase 1 - Use DTX as is to support multi-operation RDB transactions

With little to no change in VOS, we should be able to execute such transactions using the existing DTX interfaces while also setting the SOLO and DROP_CMT flags to avoid unnecessary aggregation requirements that are really reserved for phase-2 commit or distributed transactions. @Fan Yong can help with the usage of the DTX APIs.

Phase 2 - Add a native VOS API for transaction begin/end

There are a few obstacles that need some rework for this to happen:

  1. The storage for deferred operations and reserved extents needs to be moved into VOS, probably into a generic VOS transaction handle. The dtx handle should probably be associated with this VOS handle. Some of this can be shared between the punch and update APIs.

  2. Optionally, but importantly, we should support committing more than just the final data blob update in parallel with the VOS transaction. @Yawei Niu can help with the design of this part. Presently, a single biod is passed to tx_end; we would probably need to pass a list or an array of biods instead.

  3. I believe VOS aggregation also has some usage that needs to be considered in any API we create.

Additionally, we will need to decide when the API is used. There are two options:

  1. Require this transaction API for any update to VOS. This is a much bigger change and will require changing various parts of DAOS as well as all of the VOS unit tests.

  2. Use these APIs as a wrapper for existing VOS operations and keep track of whether we are in such a local transaction.

We may take a phased approach here and implement the 2nd option first while considering a minor redesign of VOS transaction handling.

One rather large issue with the existing DTX support in VOS is that we use TLS to save the DTX handle. This was primarily done to avoid passing the DTX handle to various routines and callbacks. However, it is error-prone in the event of a ULT yield and should be reworked. I would suggest we consider reworking these internal APIs as part of Phase 2.

Tickets associated with these tasks