Updated: CI_Weekly_Master_failures.xlsx
Some test runs (e.g. the weekly test run, or master test runs after a merge) are not directly tied to any user's activity and therefore have no clear responsible party when something goes wrong. With no
...
- Review test run output daily; any failed run requires follow-up.
- Investigate test failures. If a test failure already has a ticket, make sure it is properly identified as an "Intermittent Test Issue". If there is no existing ticket, create one using the template below.
- Review CI test issue tickets using the pre-defined Jira query "Intermittent Test Issues".
- Bring failures that do not receive prompt action to the attention of management and/or the DAOS triage team.
- Tickets that have been open for 30 days or more and have "Number of Occurrences" set to 1 are candidates for closure. Follow up with the assignee and/or the triage team to reevaluate them.
- Present a short, recurring segment on CI triage status at the Monday test meeting.
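The closure rule in the list above can be expressed as a simple filter. This is a minimal sketch, assuming a hypothetical `Ticket` record whose creation date, resolution state, and "Number of Occurrences" value have already been pulled from JIRA; it does not query JIRA itself, and the field names and ticket keys are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Ticket:
    """Hypothetical, minimal view of a JIRA ticket (field names are illustrative)."""
    key: str
    created: date
    number_of_occurrences: int
    resolved: bool = False


def closure_candidates(tickets, today, min_age_days=30):
    """Open tickets created at least `min_age_days` ago with exactly one occurrence."""
    cutoff = today - timedelta(days=min_age_days)
    return [
        t for t in tickets
        if not t.resolved
        and t.created <= cutoff
        and t.number_of_occurrences == 1
    ]


if __name__ == "__main__":
    tickets = [
        Ticket("TICKET-1", date(2020, 6, 1), 1),   # old, single occurrence
        Ticket("TICKET-2", date(2020, 6, 1), 3),   # old, but recurring
        Ticket("TICKET-3", date(2020, 7, 20), 1),  # too recent
    ]
    print([t.key for t in closure_candidates(tickets, today=date(2020, 7, 27))])
    # ['TICKET-1']
```

A ticket returned here is only a candidate; the follow-up with the assignee and/or triage team still decides whether it is closed.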
Non-PR Testing Rotating Owners
Rotation begins WW30'20, and each rotation term is 2 weeks. If their rotation falls during a vacation or other out-of-office (OOO) period, owners are responsible for finding an alternate or swapping terms with another owner.
| Owner |
|---|
| Sylvia |
| Saurabh |
| Ding |
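To check who is scheduled for a given day, the rotation arithmetic can be sketched as below. This assumes that WW30'20 means ISO week 30 of 2020 (starting Monday 2020-07-20), that owners hand off in the order listed in the table, and that terms run back to back; it ignores any swaps made for OOO periods. If the team's work-week calendar differs from ISO numbering, adjust `ROTATION_START`.

```python
from datetime import date

# Rotation order as listed in the owner table (assumed to be the hand-off order).
OWNERS = ["Sylvia", "Saurabh", "Ding"]
TERM_WEEKS = 2
# Assumption: WW30'20 is ISO week 30 of 2020, whose Monday is 2020-07-20.
ROTATION_START = date.fromisocalendar(2020, 30, 1)


def rotation_owner(day: date) -> str:
    """Return the scheduled (not swapped) owner for the given day."""
    if day < ROTATION_START:
        raise ValueError("the rotation had not started yet")
    weeks_elapsed = (day - ROTATION_START).days // 7
    term_index = weeks_elapsed // TERM_WEEKS
    return OWNERS[term_index % len(OWNERS)]


if __name__ == "__main__":
    for d in (date(2020, 7, 20), date(2020, 8, 3), date(2020, 8, 17), date(2020, 8, 31)):
        print(d, rotation_owner(d))
```

With three owners and 2-week terms, the schedule repeats every 6 weeks, so the fourth date above lands back on the first owner.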
For the latest ticket status, the full list of intermittent failures, and failure details, refer to the tickets in JIRA returned by the filter "Intermittent Test Issues".
Use the template below for creating a new ticket for tests failing in CI.
Project: <CaRT | "CORAL - CI" | DAOS>
Issue type: Bug
Labels: flaky Intermittent Test Issues (for PR/master failures) or weekly_failures (for failures on weekly-testing branch)
Fill in Bug Exposure, Bug Quality, and Bug Type as appropriate.
Summary: Provide a summary of the issue. For an issue on the weekly run, prefix it with "Weekly Test - <summary>".
Description: Please include the following in the description.
...
Include the stack trace and error message.
Attach debug logs.
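The template fields above can be collected with a small helper before filing the ticket. This is a sketch only: `build_ticket_fields` is a hypothetical helper that returns a plain dict, not a JIRA REST API payload. It assumes that `flaky` is the label for PR/master failures (the label line of the template is ambiguous on this point), and it leaves Bug Exposure, Bug Quality, Bug Type, and the debug-log attachments to the reporter.

```python
WEEKLY_PREFIX = "Weekly Test - "
PROJECTS = ("CaRT", "CORAL - CI", "DAOS")


def build_ticket_fields(summary, error_message, stack_trace, weekly, project="DAOS"):
    """Collect the template fields for a new CI-failure ticket (hypothetical helper)."""
    if project not in PROJECTS:
        raise ValueError(f"unexpected project: {project!r}")
    # Weekly-run failures get the "Weekly Test - " summary prefix, added only once.
    if weekly and not summary.startswith(WEEKLY_PREFIX):
        summary = WEEKLY_PREFIX + summary
    return {
        "project": project,
        "issue_type": "Bug",
        # Assumption: "flaky" is the label used for PR/master failures.
        "labels": ["weekly_failures"] if weekly else ["flaky"],
        "summary": summary,
        # Error message first, then the stack trace, as the template asks.
        "description": f"{error_message}\n\n{stack_trace}",
    }


if __name__ == "__main__":
    fields = build_ticket_fields(
        "Timeout in hypothetical_test",
        "Runner error occurred: Timeout reached",
        "#0 ...",
        weekly=True,
    )
    print(fields["summary"])  # Weekly Test - Timeout in hypothetical_test
```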
Intermittent CI failures
Here is a summary of CI failures.
For the latest ticket status and failure details, refer to the tickets in JIRA returned by the filter "Intermittent Test Issues".
...
...
Test died without reporting the status. Runner error occurred: Timeout reached
Test should have produced a coredump
#7 0x00007fbdd409a0e2 in HG_Progress (context=context@entry=0x7fbd300480f0, timeout=<optimized out>) at /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/build/external/dev/mercury/src/mercury.c:1979
ret = HG_SUCCESS
__func__ = "HG_Progress"
#8 0x00007fbdd65c28a1 in crt_hg_progress (hg_ctx=hg_ctx@entry=0x7fbd30042df8, timeout=timeout@entry=0) at src/cart/src/cart/crt_hg.c:1350
hg_ret = HG_SUCCESS
rc = 0
count = 0
hg_context = 0x7fbd300480f0
hg_timeout = <optimized out>
total = 256
__func__ = "crt_hg_progress"
#9 0x00007fbdd65836c5 in crt_progress (crt_ctx=0x7fbd30042de0, timeout=timeout@entry=0) at src/cart/src/cart/crt_context.c:1330
ctx = 0x7fbd30042de0
rc = 0
__func__ = "crt_progress"
#10 0x000000000041bb5f in dss_srv_handler (arg=0x25d0890) at src/iosrv/srv.c:562
dx = 0x25d0890
dtc = 0x7fbd30000910
rc = -1011
signal_caller = false
__func__ = "dss_srv_handler"
__PRETTY_FUNCTION__ = "dss_srv_handler"
#11 0x00007fbdd557417b in ABTD_thread_func_wrapper_thread () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#12 0x00007fbdd5574851 in make_fcontext () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.
...
Test died without reporting the status. Runner error occurred: Timeout reached
Test should have produced a coredump
...