Various test runs (e.g. the weekly test run, or master test runs after a merge) are not directly associated with user activity and therefore have no clear responsible party when something goes wrong. With no
clear owner, problems that occur only on these test runs (and not, for example, on a PR run with a clear owner) can go undetected and uninvestigated for too long. To combat this problem, a member
of the test team is assigned as owner on a rotating basis. Responsibilities are as follows:
- Review test run output daily. Every failed run requires follow-up.
- Investigate test failures. If a test failure has an existing ticket, make sure it is properly identified as an "Intermittent Test Issue". If there is no existing ticket, create one. Use the template below when creating the ticket.
- Review CI test issue tickets using the pre-defined Jira query "Intermittent Test Issues".
- Bring failures that do not receive prompt action to the attention of management and/or the DAOS triage team.
- Tickets that have been open for 30 days or more and have "Number of Occurrences" set to 1 are candidates for closure. Follow up with the assignee and/or the triage team to reevaluate them.
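The closure rule in the last bullet can be sketched as a small filter. This is only an illustration: the `Ticket` class, its field names, and the `EXAMPLE-n` keys are hypothetical and do not reflect the actual Jira schema or any Jira API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Ticket:
    # Hypothetical stand-in for a Jira ticket; not the real Jira schema.
    key: str
    opened: date
    occurrences: int  # value of the "Number of Occurrences" field


def closure_candidates(tickets, today, min_age_days=30):
    """Return tickets open for min_age_days or more with exactly one occurrence."""
    cutoff = today - timedelta(days=min_age_days)
    return [t for t in tickets if t.opened <= cutoff and t.occurrences == 1]


# Made-up sample data for illustration only.
tickets = [
    Ticket("EXAMPLE-1", date(2020, 6, 1), 1),   # old, seen once: candidate
    Ticket("EXAMPLE-2", date(2020, 7, 10), 1),  # too recent
    Ticket("EXAMPLE-3", date(2020, 5, 1), 4),   # old, but seen repeatedly
]

for ticket in closure_candidates(tickets, today=date(2020, 7, 20)):
    print(ticket.key)
```

A ticket returned by this filter is only a candidate; the follow-up with the assignee or triage team still decides whether it is closed.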
Non-PR Testing Rotating Owners
Rotation begins WW30'20. The rotation term is 2 weeks. Owners are responsible for finding or exchanging with an alternate if their rotation falls during a vacation or other OOO situation.
Owner |
---|
Sylvia |
Saurabh |
Ding |
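To find out whose term a given date falls in, the schedule can be computed from the start week and the 2-week term. The sketch below rests on assumptions not stated on this page: that WW30'20 corresponds to ISO week 30 of 2020 (starting Monday 2020-07-20), that owners rotate in the order listed in the table, and that the list repeats after the last owner. It also ignores any swaps arranged for vacations or other OOO situations.

```python
from datetime import date

# Assumption: rotation order follows the table and cycles.
OWNERS = ["Sylvia", "Saurabh", "Ding"]

# Assumption: WW30'20 is ISO week 30 of 2020, beginning on a Monday.
ROTATION_START = date.fromisocalendar(2020, 30, 1)

TERM_DAYS = 14  # the rotation term is 2 weeks


def owner_on(day):
    """Return the scheduled owner for the given date."""
    if day < ROTATION_START:
        raise ValueError("rotation had not started on that date")
    term_index = (day - ROTATION_START).days // TERM_DAYS
    return OWNERS[term_index % len(OWNERS)]


print(owner_on(date(2020, 8, 5)))
```

If the work-week numbering used here differs from ISO weeks, only `ROTATION_START` needs to change.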
Use the template below for creating a new ticket for tests failing in CI.
Project: <CaRT | "CORAL - CI" | DAOS>
Issue type: Bug
Labels: flaky
Fill in Bug Exposure, Bug Quality, and Bug Type accordingly.
Summary: Provide a summary of the issue
Description: Please include the following in the description.
Failed Stage: <Build | Unit Test | Test>
Failed Build/Test:
For a build failure, list the build stage that failed, e.g. "Build RPM on CentOS 7", "Build RPM on Leap 15", etc.
For a test failure, list the test folder and source (and variant when possible), e.g. daos_test/daos_core_test.py - DAOS degraded-mode tests
Branch: <master | name_of_branch>
Commit: <commit hash>
Include the stack trace and error message.
Attach debug logs.
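For anyone filing these tickets often, the description fields of the template can be filled in programmatically. The following is a minimal sketch; the function name `ticket_description`, its parameters, and the sample commit hash are hypothetical, and the output is plain text to paste into the ticket rather than a call to any Jira API.

```python
ALLOWED_STAGES = ("Build", "Unit Test", "Test")


def ticket_description(failed_stage, failed_item, branch, commit, error_text):
    """Render the description fields of the template as plain text."""
    if failed_stage not in ALLOWED_STAGES:
        raise ValueError(f"failed_stage must be one of {ALLOWED_STAGES}")
    return "\n".join([
        f"Failed Stage: {failed_stage}",
        f"Failed Build/Test: {failed_item}",
        f"Branch: {branch}",
        f"Commit: {commit}",
        "",
        error_text,  # stack trace and error message
    ])


# Illustrative values only; the commit hash is made up.
print(ticket_description(
    failed_stage="Test",
    failed_item="daos_test/daos_core_test.py - DAOS degraded-mode tests",
    branch="master",
    commit="0123abc",
    error_text="Test died without reporting the status.",
))
```

Debug logs still need to be attached to the ticket by hand.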
Intermittent CI failures
Below is a summary of intermittent CI failures.
For the latest ticket status and details on each failure, refer to the tickets in JIRA using the filter "Intermittent Test Issues".
Test | JIRA ticket | Error | |
---|---|---|---|
Daos Core Test Rebuild 26 | | Test died without reporting the status. Runner error occurred: Timeout reached #7 0x00007fbdd409a0e2 in HG_Progress (context=context@entry=0x7fbd300480f0, timeout=<optimized out>) at /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/build/external/dev/mercury/src/mercury.c:1979 ret = HG_SUCCESS __func__ = "HG_Progress" #8 0x00007fbdd65c28a1 in crt_hg_progress (hg_ctx=hg_ctx@entry=0x7fbd30042df8, timeout=timeout@entry=0) at src/cart/src/cart/crt_hg.c:1350 hg_ret = HG_SUCCESS rc = 0 count = 0 hg_context = 0x7fbd300480f0 hg_timeout = <optimized out> total = 256 __func__ = "crt_hg_progress" #9 0x00007fbdd65836c5 in crt_progress (crt_ctx=0x7fbd30042de0, timeout=timeout@entry=0) at src/cart/src/cart/crt_context.c:1330 ctx = 0x7fbd30042de0 rc = 0 __func__ = "crt_progress" #10 0x000000000041bb5f in dss_srv_handler (arg=0x25d0890) at src/iosrv/srv.c:562 dx = 0x25d0890 dtc = 0x7fbd30000910 rc = -1011 signal_caller = false __func__ = "dss_srv_handler" __PRETTY_FUNCTION__ = "dss_srv_handler" #11 0x00007fbdd557417b in ABTD_thread_func_wrapper_thread () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0 No symbol table info available. #12 0x00007fbdd5574851 in make_fcontext () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0 No symbol table info available. #13 0x0000000000000000 in ?? () No symbol table info available. | |
Daos Core Test Rebuild 26 | | Test died without reporting the status. Runner error occurred: Timeout reached #12 0x00007f6f5cede401 in crt_hg_progress (hg_ctx=hg_ctx@entry=0x7f6f08042df8, timeout=timeout@entry=0) at src/cart/src/cart/crt_hg.c:1302 hg_ret = HG_SUCCESS rc = 0 count = 0 hg_context = 0x7f6f080480f0 hg_timeout = <optimized out> total = 256 __func__ = "crt_hg_progress" #13 0x00007f6f5ce9f695 in crt_progress (crt_ctx=0x7f6f08042de0, timeout=timeout@entry=0) at src/cart/src/cart/crt_context.c:1330 ctx = 0x7f6f08042de0 rc = 0 __func__ = "crt_progress" #14 0x000000000041bb5f in dss_srv_handler (arg=0x28f3e50) at src/iosrv/srv.c:562 dx = 0x28f3e50 dtc = 0x7f6f08000910 rc = -1011 signal_caller = false __func__ = "dss_srv_handler" __PRETTY_FUNCTION__ = "dss_srv_handler" #15 0x00007f6f5be9017b in ABTD_thread_func_wrapper_thread () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0 No symbol table info available. #16 0x00007f6f5be90851 in make_fcontext () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0 No symbol table info available. #17 0x0000000000000000 in ?? () No symbol table info available. | |