Updated: CI_Weekly_Master_failures.xlsx

CI Status_WW39.pptx


Various test runs (e.g. the weekly test run, or master test runs after a merge) are not directly associated with user activity and therefore have no clear responsible party when something goes wrong.  With no

...

  • Review test run output daily.  Any failed run requires follow-up.
  • Investigate test failures.  If a test failure has an existing ticket, make sure it is properly identified as a "CI Intermittent Test Issue".  If there is no existing ticket, create one.  Use the template below when creating the ticket.
  • Review CI test issue tickets using the pre-defined Jira query "CI Intermittent Test Issues".
    • Bring failures that do not receive prompt action to the attention of management and/or the DAOS triage team.
    • Tickets that have been open for 30 days or more and have "Number of Occurrences" set to 1 are candidates for closure.  Follow up with the assignee and/or the triage team to reevaluate.
    • Present a short, recurring segment on CI triage status at the Monday test meeting.
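
The closure rule above (open for 30 days or more, with "Number of Occurrences" set to 1) can be sketched as a small filter.  This is a minimal illustration only: the Ticket record, its field names, and the closure_candidates helper are hypothetical and do not correspond to any Jira API, and the ticket data would have to come from an export of the "CI Intermittent Test Issues" query.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative, not Jira's."""
    key: str          # ticket key
    opened: date      # date the ticket was opened
    occurrences: int  # value of the "Number of Occurrences" field


def closure_candidates(tickets, today, min_age_days=30):
    """Return tickets open for min_age_days or more with exactly one occurrence."""
    cutoff = today - timedelta(days=min_age_days)
    return [t for t in tickets if t.opened <= cutoff and t.occurrences == 1]
```

The result is only a list of candidates.  Each one still needs follow-up with the assignee or the triage team before it is closed.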

Non-PR Testing Rotating Owners

Rotation begins WW30'20.  The rotation term is 2 weeks.  Owners are responsible for finding or exchanging with an alternate if their rotation falls during a vacation or other OOO situation.

Owner
Sylvia
Saurabh
Ding
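
The rotation schedule can be sketched as a lookup from work week to owner.  This sketch rests on assumptions the page does not state: that the owners rotate in the order listed, that the first listed owner takes the term starting WW30'20, and that the list cycles once everyone has had a term.  It only handles 2020 work weeks and ignores any swaps arranged for vacation or other OOO coverage.

```python
OWNERS = ["Sylvia", "Saurabh", "Ding"]  # order as listed; ASSUMED to be the rotation order
START_WW = 30    # rotation begins WW30'20
TERM_WEEKS = 2   # the rotation term is 2 weeks


def owner_for_work_week(ww):
    """Return the assumed owner for a 2020 work week number (ww >= 30)."""
    if ww < START_WW:
        raise ValueError("rotation had not started before WW30'20")
    # Number of full terms elapsed since the rotation began.
    term_index = (ww - START_WW) // TERM_WEEKS
    # ASSUMPTION: the owner list repeats after the last owner's term.
    return OWNERS[term_index % len(OWNERS)]
```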


For the latest ticket status, the full list of intermittent failures, and details on each failure, refer to the tickets in Jira using the filter "Intermittent Test Issues".


Use the template below for creating a new ticket for tests failing in CI.

Project: <CaRT | "CORAL - CI" | DAOS>

Issue type: Bug

Labels: flaky Intermittent Test Issues (for PR/master failures) or weekly_failures (for failures on weekly-testing branch)

Fill in Bug Exposure, Bug Quality, Bug Type accordingly.

Summary: Provide a summary of the issue (if it is an issue on weekly, precede with "Weekly Test - <summary>").

Description: Please include the following in the description.

...

Include the stack trace and error message.

Attach debug logs.
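
The template can be sketched as a helper that assembles the ticket fields before they are entered in Jira.  This is an illustration only: build_ticket_fields and the dictionary keys are hypothetical names, not a Jira API, and the labels are passed in by the caller rather than derived.  The Bug Exposure, Bug Quality, and Bug Type fields are left out because their allowed values are not listed here.

```python
ALLOWED_PROJECTS = {"CaRT", "CORAL - CI", "DAOS"}  # projects named in the template
WEEKLY_PREFIX = "Weekly Test - "


def build_ticket_fields(project, summary, description, labels, weekly=False):
    """Assemble template fields as a plain dict (hypothetical helper, not a Jira call)."""
    if project not in ALLOWED_PROJECTS:
        raise ValueError(f"unknown project: {project!r}")
    # Failures on the weekly-testing branch get the "Weekly Test - " prefix.
    if weekly and not summary.startswith(WEEKLY_PREFIX):
        summary = WEEKLY_PREFIX + summary
    return {
        "project": project,
        "issue_type": "Bug",  # the template always uses issue type Bug
        "labels": list(labels),
        "summary": summary,
        "description": description,
    }
```

For a failure on the weekly-testing branch, a caller would pass weekly=True and labels=["weekly_failures"].  The stack trace and error message go in the description, and debug logs are attached separately.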

Flaky CI failures

Here is a summary of CI failures.

For the latest ticket status and details on each failure, refer to the tickets in Jira using the filter "CI issues".

...

DAOS-5296 (HPDD Community Jira)

...

Test died without reporting the status. Runner error occurred: Timeout reached
Test should have produced a coredump
#7  0x00007fbdd409a0e2 in HG_Progress (context=context@entry=0x7fbd300480f0, timeout=<optimized out>) at /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/build/external/dev/mercury/src/mercury.c:1979
        ret = HG_SUCCESS
        __func__ = "HG_Progress"
#8  0x00007fbdd65c28a1 in crt_hg_progress (hg_ctx=hg_ctx@entry=0x7fbd30042df8, timeout=timeout@entry=0) at src/cart/src/cart/crt_hg.c:1350
        hg_ret = HG_SUCCESS
        rc = 0
        count = 0
        hg_context = 0x7fbd300480f0
        hg_timeout = <optimized out>
        total = 256
        __func__ = "crt_hg_progress"
#9  0x00007fbdd65836c5 in crt_progress (crt_ctx=0x7fbd30042de0, timeout=timeout@entry=0) at src/cart/src/cart/crt_context.c:1330
        ctx = 0x7fbd30042de0
        rc = 0
        __func__ = "crt_progress"
#10 0x000000000041bb5f in dss_srv_handler (arg=0x25d0890) at src/iosrv/srv.c:562
        dx = 0x25d0890
        dtc = 0x7fbd30000910
        rc = -1011
        signal_caller = false
        __func__ = "dss_srv_handler"
        __PRETTY_FUNCTION__ = "dss_srv_handler"
#11 0x00007fbdd557417b in ABTD_thread_func_wrapper_thread () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#12 0x00007fbdd5574851 in make_fcontext () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.

...

DAOS-5295 (HPDD Community Jira)

Test died without reporting the status. Runner error occurred: Timeout reached
Test should have produced a coredump

...
