Various test runs (e.g. the weekly test run, or master test runs after a merge) are not directly associated with user activity and therefore have no clear responsible party when something goes wrong. Without a
clear owner, problems that occur only on these runs (and not, for example, on a PR run with a clear owner) can go undetected and uninvestigated for too long. To combat this problem, a member
of the test team is assigned as owner on a rotating basis. Responsibilities are as follows:
- Review test run output daily. Any failed run requires follow-up.
- Investigate test failures. If a test failure has an existing ticket, make sure it is properly identified as an "Intermittent Test Issue". If there is no existing ticket, create one. Use the template below when creating the ticket.
- Review CI test issue tickets using the pre-defined Jira query "Intermittent Test Issues".
- Bring failures that do not receive prompt action to the attention of management and/or the DAOS triage team.
- Tickets that have been open for 30 days or more and have "Number of Occurrences" set to 1 are candidates for closure. Follow up with the assignee and/or the triage team to reevaluate.
- Present a short, recurring segment on CI triage status at the Monday test meeting.
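The closure rule in the list above can be expressed as a small check. This is a minimal sketch, not an existing tool: the `Ticket` class and its field names are hypothetical, and it assumes that "open for 30 days" is measured from the ticket's creation date and that the ticket is still open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Ticket:
    key: str          # hypothetical ticket key, for illustration only
    created: date     # assumption: age is counted from the creation date
    occurrences: int  # value of the "Number of Occurrences" field


def is_closure_candidate(ticket: Ticket, today: date) -> bool:
    """True if the ticket is at least 30 days old and was seen only once."""
    age_days = (today - ticket.created).days
    return age_days >= 30 and ticket.occurrences == 1
```

A ticket flagged this way is only a candidate: the assignee or triage team still decides whether to close it.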
Non-PR Testing Rotating Owners
Rotation begins WW30'20. The rotation term is 2 weeks. Owners are responsible for finding or exchanging with an alternate if their rotation falls during a vacation or other OOO situation.
Owner |
---|
Sylvia |
Saurabh |
Ding |
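As a rough illustration of the schedule, the sketch below maps a 2020 work week to an owner. It assumes the rotation follows the order of the table above and wraps around after the last name, which the page does not state explicitly. It also ignores any swaps arranged for vacation or OOO coverage, and the helper name is hypothetical.

```python
OWNERS = ["Sylvia", "Saurabh", "Ding"]  # assumed rotation order: as listed in the table
START_WW = 30    # rotation begins WW30'20
TERM_WEEKS = 2   # each rotation term lasts 2 weeks


def owner_for_work_week(ww: int) -> str:
    """Return the scheduled owner for a work week in 2020 (hypothetical helper)."""
    if ww < START_WW:
        raise ValueError("the rotation had not started before WW30")
    term_index = (ww - START_WW) // TERM_WEEKS
    # Assumption: after the last owner the rotation starts over.
    return OWNERS[term_index % len(OWNERS)]
```

Under these assumptions WW30 and WW31 belong to the first owner, WW32 and WW33 to the second, and so on.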
Use the template below for creating a new ticket for tests failing in CI.
Project: <CaRT | "CORAL - CI" | DAOS>
Issue type: Bug
Labels: Intermittent Test Issues
Fill in Bug Exposure, Bug Quality, Bug Type accordingly.
Summary: Provide a summary of the issue
Description: Please include the following in the description.
Failed Stage: <Build | Unit Test | Test>
Failed Build/Test:
For a build failure, please list the build stage that failed, e.g. "Build RPM on CentOS 7", "Build RPM on Leap 15", etc.
For a test failure, please list the test folder, source (and variant when possible), e.g. daos_test/daos_core_test.py - DAOS degraded-mode tests
Branch: <master | name_of_branch>
Commit: <commit hash>
Include the stack trace and error message.
Attach debug logs.
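To keep descriptions consistent, the Description part of the template can be filled in programmatically. The following is a minimal sketch; the function name and parameters are hypothetical, and it only builds the text. It does not talk to Jira.

```python
ALLOWED_STAGES = ("Build", "Unit Test", "Test")  # the stages listed in the template


def render_description(stage: str, failed_item: str, branch: str,
                       commit: str, error_text: str) -> str:
    """Build the ticket description text from the template fields."""
    if stage not in ALLOWED_STAGES:
        raise ValueError(f"stage must be one of {ALLOWED_STAGES}")
    lines = [
        f"Failed Stage: {stage}",
        f"Failed Build/Test: {failed_item}",
        f"Branch: {branch}",
        f"Commit: {commit}",
        "",
        error_text,  # stack trace and error message, pasted verbatim
    ]
    return "\n".join(lines)
```

For example, `render_description("Test", "daos_test/daos_core_test.py - DAOS degraded-mode tests", "master", "<commit hash>", "<stack trace>")` returns text that can be pasted into the Description field. Debug logs still need to be attached separately.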
Intermittent CI failures
Here is a quick summary of recent CI failures.
For the latest ticket status, the full list of intermittent failures, and details on each failure, refer to the tickets in JIRA using the filter "Intermittent Test Issues".
Test | JIRA ticket | Error |
---|---|---|
Daos Core Test degraded mode | | Test died without reporting the status. Runner error occurred: Timeout reached |
Daos Core Test Rebuild 25 | | Test died without reporting the status. Runner error occurred: Timeout reached |
SCMConfigTest.test_scm_in_use_basic | | Test died without reporting the status. Runner error occurred: Timeout reached |
mdtest small | | Test was expected to pass but it failed. |
OSA online reintegration | | Test died without reporting the status. Runner error occurred: Timeout reached Stderr: ior ERROR (aiori-DAOS.c:441): 1: -1007: daos_array_write() failed (-1007). application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1 |
DAOS degraded-mode tests.DEGRADED2 | | Stacktrace: 0xfffffffffffffc04 != 0 src/tests/suite/daos_obj.c:1233: error: Failure! |