CI Task Force Projects

This page tracks ideas for improving the stability and throughput of the DAOS CI system. It supplements Jira for communication and progress reporting.

All the individual improvement ideas are tracked in Jira epic DAOS-3929.

Ticket(s)	Task	Status	Result
DAOS-3914 CORCI-840	Increase throughput on HW clusters by running multiple servers per HW node and splitting one of the HW clusters.	blocked	Waiting for new HW.
CORCI-841	Build new HW clusters and get them into CI as soon as HW shows up	blocked	HW ships 1/22
~~DAOS-3915~~	~~Analyze HW tests to see if they can be moved to weekly and to understand if different cluster combos would help~~	Done	Splitting clusters from 8 to 4 nodes will not make an immediate substantial impact. It should help in the long term, though, particularly when the new hardware is added.
CORCI-717	Node failures are often the cause of intermittent CI issues, but console logs are needed to debug them	todo
CORCI-842	PDT presentation on commit message pragmas, use of quickbuild	todo
DAOS-3868	Fix quickbuild issues to allow short-circuiting of time-consuming build operations when appropriate	todo
CORCI-711	HW nodes are provisioned from a snapshot; fix out-of-space and other cruft issues that cause intermittent failures	in-progress
DAOS-3607 DAOS-2759	Increase flexibility to run different groupings of tests by running from RPMs	in-progress
CORCI-843	If a change impacts files in the "doc" directory then skip unnecessary build/test cycles.	todo
DAOS-3921	Reduce wait time in daos_test rebuild subtests	in-progress
~~DAOS-3919~~	~~Reduce IorSmall runtimes by eliminating uninteresting mux combinations and unnecessary formatting~~	Done	~50 minute reduction?
DAOS-3930	IorSmall runtimes are all over the map	todo
~~CORCI-831~~	~~Old PRs can use wrong packages~~	Done	Reduced intermittent failures
~~DAOS-3840~~	~~Use OPA adapters, not Ethernet, in CI~~	Done	Modest improvement, but not what was expected.
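
For CORCI-843, the check for a documentation-only change could look something like the sketch below. It assumes the intent is to skip build/test cycles only when every changed file is under "doc", and that the list of changed paths is already available (for example from `git diff --name-only`). The function name `is_doc_only_change` and the constant `DOC_DIR` are hypothetical and not taken from the DAOS CI code.

```python
from pathlib import PurePosixPath

# Assumption: documentation lives in a top-level "doc" directory.
DOC_DIR = "doc"


def is_doc_only_change(changed_files):
    """Return True when every changed path is under DOC_DIR.

    An empty change list returns False so that a missing or failed
    diff never causes build/test stages to be skipped by accident.
    """
    paths = [PurePosixPath(name) for name in changed_files]
    if not paths:
        return False
    # Compare the first path component so "docs/" or "src/doc/" do not match.
    return all(p.parts and p.parts[0] == DOC_DIR for p in paths)
```

Comparing the first path component, rather than using a string prefix test, avoids treating a sibling directory such as "docs" as documentation.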

The primary metric for tracking improvements is the total number of hours master jobs take to complete. Total time includes waiting time plus run time. The chart below includes randomly selected successful runs against master.
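
As a rough illustration of how this metric breaks down, the sketch below computes waiting, run, and total hours for a single job from three timestamps. The `MasterRun` class and its field names are hypothetical; they are not tied to any Jenkins or DAOS CI API, and the timestamps would have to come from wherever the job history is recorded.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MasterRun:
    """Timestamps for one successful master job (hypothetical record)."""
    queued: datetime    # when the job entered the queue
    started: datetime   # when it began executing
    finished: datetime  # when it completed

    @property
    def wait_hours(self):
        return (self.started - self.queued).total_seconds() / 3600.0

    @property
    def run_hours(self):
        return (self.finished - self.started).total_seconds() / 3600.0

    @property
    def total_hours(self):
        # Total time = waiting time + run time.
        return self.wait_hours + self.run_hours


def mean_total_hours(runs):
    """Average total hours over a non-empty sample of runs."""
    return sum(r.total_hours for r in runs) / len(runs)
```

Keeping waiting time and run time separate makes it possible to tell whether a change in the total came from queue contention or from the jobs themselves.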