Feature requirements

DAOS should support rolling upgrade to ensure continuous availability and a seamless user experience, by upgrading storage engines incrementally rather than all at once, which would require downtime of the entire storage cluster. Rolling upgrade also aligns with the distributed architecture's fault-tolerant design: if an issue arises during the upgrade, only a subset of nodes is affected, and the remaining storage engines can still provide I/O service without disruption.

Scope statements

These scenarios should be supported by DAOS:

Definitions

Engine version

The hard-coded version of the engine software, which can differ from the runtime version of DAOS. A DAOS engine should support the protocols of both its engine version and the runtime version (system version), which must be consecutive in major version number.

System version

The runtime version of the DAOS storage system, which is persistently stored in a database of the management service. When a DAOS engine joins the DAOS system, it gets the system version from the management service and runs with protocols matching this version. It should be noted that this database is only created by the management service after a rolling upgrade is initiated, and is deleted when the upgrade process either completes or is aborted.
During a rolling upgrade there are two system version numbers: one is the current runtime version (the legacy version), the other is the next version for the rolling upgrade, which is inactive until the upgrade completes.
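The engine/system version relationship above can be sketched as a small compatibility check. This is illustrative only, not DAOS source: the function name and the (major, minor) tuple representation are assumptions.

```python
# Illustrative sketch: an engine supports the protocols of its own
# (hard-coded) engine version and of the runtime system version,
# provided the two are consecutive or equal in major version number.
def versions_compatible(engine_version, system_version):
    e_major = engine_version[0]  # major number of the engine software
    s_major = system_version[0]  # major number stored by the mgmt service
    return abs(e_major - s_major) <= 1
```

Under this rule an engine one major version ahead of the system can still join during a rolling upgrade, while a two-major jump is rejected.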

Pool version

It is the durable format version of a DAOS pool, which may or may not change between system versions. Durable format upgrade is not part of rolling upgrade; the administrator can decide whether to upgrade the durable format of each pool after the rolling upgrade completes. Durable format upgrade is already supported by DAOS and is not covered by this design.

Administrator Interfaces

The administrator must explicitly start a rolling upgrade by executing a new DMG command, "upgrade", which notifies the management service that the upgrade process has begun; otherwise, any engine whose version differs from the system version will fail to join the storage cluster. The administrator must also explicitly complete or abort the rolling upgrade, to avoid ambiguous status and compatibility risks.

After the rolling upgrade has started, the administrator can shut down a certain number of DAOS engines, upgrade the RPMs on those engines, and then bring them back. These engines are allowed to join the DAOS cluster and run with the current system version, instead of their engine version.

Dmg system “upgrade” command

Upgrades are non-concurrent operations: once initiated by an administrator, no additional rolling upgrade can be started until the current one either completes or is explicitly aborted. The following sub-commands of "upgrade" might be introduced:

Prepare

The administrator initiates the upgrade process using the sub-command "prepare", which notifies the MGMT service to maintain the global upgrade status and manage the version of each engine. This sub-command has a few parameters:

 

Enable

After initiating a rolling upgrade with "dmg upgrade prepare", the administrator can specify a set of engines for upgrade by executing "dmg upgrade enable --ranks=…", where "ranks" accepts expressions identifying engine IDs. The designated engines must then be shut down for RPM updates. Importantly, engines not explicitly selected for upgrade will be unable to rejoin the storage cluster if they are updated accidentally. This explicit mechanism prevents operational errors that could compromise data safety.
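A rank-set expression of the kind "--ranks" accepts could be parsed as below. The grammar shown (comma-separated ranks and inclusive ranges) is an assumption for illustration; the real dmg rank-list syntax may differ.

```python
# Hypothetical parser for a rank expression like "0-3,5": comma-separated
# single ranks and inclusive "lo-hi" ranges, returned as a sorted list.
def parse_ranks(expr):
    ranks = set()
    for part in expr.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ranks.update(range(int(lo), int(hi) + 1))  # inclusive range
        else:
            ranks.add(int(part))
    return sorted(ranks)
```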

Commit

During a rolling upgrade, after all the data engines have been upgraded to the target version, the administrator runs the command “dmg system upgrade commit” to finalize the rolling upgrade. If all the engines in the cluster have indeed been upgraded to the initially specified version, the cluster will complete the rolling upgrade and switch to the new version protocol for future I/O services.
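The commit-time check described above amounts to verifying version homogeneity across all engines. A minimal sketch, assuming engine versions are tracked per rank (names are illustrative, not the mgmt service API):

```python
# The commit is allowed only if every engine in the cluster reports the
# initially specified target version; otherwise the rolling upgrade
# cannot be finalized and the legacy version stays active.
def can_commit(engine_versions, target_version):
    return all(v == target_version for v in engine_versions.values())
```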

Abort

If a failure occurs during the rolling upgrade, the administrator can abort the upgrading process by running “dmg system upgrade abort”. Before executing this command, the administrator should downgrade the RPMs of the upgraded data engines while ensuring service continuity. Finally, the abort command can be executed to clear the rolling upgrade metadata maintained by the mgmt service.

Query

During the rolling upgrade, the administrator can use the query command to monitor the upgrade status, which includes the number of engines on the new/old versions and their version details.
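Taken together, the sub-commands form a simple lifecycle: prepare, then enable, then commit or abort, with query available throughout. The sketch below models that lifecycle; all class, method, and field names are assumptions for illustration, not the real dmg or mgmt service implementation.

```python
class RollingUpgrade:
    """Hypothetical model of the upgrade lifecycle enforced by the
    management service for the dmg "upgrade" sub-commands."""

    def __init__(self):
        self.active = False
        self.next_version = None
        self.enabled_ranks = set()

    def prepare(self, next_version):
        # Upgrades are non-concurrent: only one may be in flight.
        if self.active:
            raise RuntimeError("an upgrade is already in progress")
        self.active = True
        self.next_version = next_version

    def enable(self, ranks):
        # Only explicitly enabled ranks may rejoin after RPM update.
        if not self.active:
            raise RuntimeError("no upgrade prepared")
        self.enabled_ranks.update(ranks)

    def query(self, engine_versions):
        # Report counts of engines on the new vs. old version.
        new = sum(1 for v in engine_versions.values() if v == self.next_version)
        return {"new": new, "old": len(engine_versions) - new}

    def commit(self, engine_versions):
        # Finalize only if every engine runs the target version.
        if any(v != self.next_version for v in engine_versions.values()):
            raise RuntimeError("not all engines upgraded")
        self.active = False  # upgrade metadata is deleted on completion

    def abort(self):
        # Clear the rolling-upgrade metadata after RPMs are downgraded.
        self.active = False
        self.enabled_ranks.clear()
```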

RPM update

RPM updates can only be performed on engines that have been explicitly designated by the administrator via DMG commands. Otherwise, the updated engines will be unable to join the storage cluster.

Design details

The rolling upgrade requires modifications in several aspects:

Metadata store for rolling upgrade

The management service establishes a new RAFT-based metadata store to maintain cluster-wide rolling-upgrade state, enabling upgrade continuity during node failures while ensuring data persistence and system reliability. This metadata store includes the following information:

As previously mentioned, the legacy version remains the functionally current version during the rolling upgrade; version switching only occurs after the full cluster has been upgraded to the next version. The diagram below illustrates the architecture described above.
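One plausible shape for the cluster-wide record held in the RAFT-backed store is sketched below. The field names are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class UpgradeMetadata:
    """Hypothetical cluster-wide rolling-upgrade record: the legacy
    version stays active until commit, while the next version and the
    per-rank progress are tracked alongside it."""
    legacy_version: tuple                      # current (active) system version
    next_version: tuple                        # target version, inactive until commit
    enabled_ranks: set = field(default_factory=set)   # ranks allowed to rejoin upgraded
    upgraded_ranks: set = field(default_factory=set)  # ranks already on next_version
```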

Rolling_upgrade.png

Additionally, each engine also manages metadata including:

Upon successful upgrade of all engines and administrator confirmation via 'dmg system upgrade commit', the mgmt service changes the current system version from the legacy version to the next version, and thereafter enforces strict version homogeneity by prohibiting mixed-version engines from joining the storage cluster. During the commit, system version updates are atomic and idempotent. Post-commit validation ensures all ranks use the new version protocols by default.
If an engine attempts RPC communication using an outdated version because it is out of sync, such requests will be rejected. The affected engine must then complete its version switch before retrying operations.
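The post-commit rejection rule can be sketched as a trivial server-side check; the function name and return values are illustrative only.

```python
# After commit, an RPC carrying the outdated (legacy) version is
# rejected; the out-of-sync sender must finish its version switch
# and then retry the operation.
def handle_rpc(rpc_version, system_version):
    if rpc_version != system_version:
        return "reject"  # sender is out of sync, must switch and retry
    return "accept"
```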

RPC protocol version

The selection of RPC protocol versions varies across different upgrade scenarios.

In the first scenario, no protocol negotiation changes are required. The implementation only needs to validate the network stack's capability to re-establish connections across all supported network types after server restarts/upgrades, and then ensure that communication resumes with automatic resend of pending RPCs.

In the second scenario, the DAOS client needs to register both the legacy and the new RPC formats. During the upgrade process, the client continues using the legacy version as the system version, which is cached in the connection handle, for server communication. After the servers complete their rolling upgrade, they can loosely notify clients of the version change by setting the version in the common header of RPC replies. When the client detects the version change in a response, and the new system version matches what the client supports, it initiates refresh RPCs to request additional metadata about the new version for its pool and container connections. Finally, it switches to the new I/O protocol to ensure compatibility and unlock all new features. The whole process happens automatically: the client keeps working with the old version until everything is ready, then seamlessly transitions to the new one without any service interruption.
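The client-side switch described above can be summarized in a short sketch: keep using the cached legacy version until a reply header advertises a new system version the client supports, then switch. All names here are illustrative, not the DAOS client API; the refresh RPCs for pool/container metadata are reduced to a comment.

```python
class Client:
    """Sketch of client-side version switching driven by the version
    field in the common header of RPC replies."""

    def __init__(self, cached_version, supported_versions):
        self.version = cached_version          # legacy version from connection handle
        self.supported = set(supported_versions)

    def on_reply(self, reply_header_version):
        if (reply_header_version != self.version
                and reply_header_version in self.supported):
            # In the real flow the client would first issue refresh RPCs
            # for pool/container metadata before switching protocols.
            self.version = reply_header_version
        return self.version
```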

DAOS Agent Upgrade