
Various test runs (e.g. the weekly test run, or master test runs after a merge) are not directly associated with user activity and therefore have no clear responsible party when something goes wrong. Without a clear owner, problems that occur on these runs but not elsewhere (e.g. on a PR run, which has a clear owner) can go undetected and uninvestigated for too long. To combat this, a member of the test team is assigned as owner on a rotating basis. Responsibilities are as follows:

  • Review test run output daily; any failed run requires follow-up.
  • Investigate test failures. If a failure already has a ticket, make sure it is properly identified as a "CI Issue". If there is no existing ticket, create one using the template below.
  • Review CI test issue tickets using the pre-defined Jira query "CI Issues".
    • Bring failures that do not receive prompt action to the attention of management and/or the DAOS triage team.
    • Tickets that have been open for 30 days or more and have "Number of Occurrences" set to 1 are candidates for closure. Follow up with the assignee and/or the triage team to reevaluate them.
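The closure rule in the last bullet can be expressed as a simple filter. This is an illustrative sketch only: it assumes the results of the "CI Issues" query have already been exported into plain Python objects, and the `CITicket` class, its field names, and the `EXAMPLE-n` keys are hypothetical, not part of any Jira API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CITicket:
    """Hypothetical, minimal view of one ticket returned by the "CI Issues" query."""
    key: str
    created: date
    occurrences: int       # value of the "Number of Occurrences" field
    resolved: bool = False


def closure_candidates(tickets, today, min_age_days=30):
    """Return keys of open tickets that are at least min_age_days old
    and have exactly one recorded occurrence."""
    cutoff = today - timedelta(days=min_age_days)
    return [
        t.key
        for t in tickets
        if not t.resolved and t.created <= cutoff and t.occurrences == 1
    ]


if __name__ == "__main__":
    sample = [
        CITicket("EXAMPLE-1", date(2020, 6, 1), 1),   # old, single occurrence: candidate
        CITicket("EXAMPLE-2", date(2020, 6, 1), 4),   # old but recurring: keep open
        CITicket("EXAMPLE-3", date(2020, 7, 20), 1),  # less than 30 days old: keep open
    ]
    print(closure_candidates(sample, today=date(2020, 7, 27)))  # ['EXAMPLE-1']
```

A ticket flagged this way is only a candidate; the follow-up with the assignee or triage team still decides whether it is closed.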

Non-PR Testing Rotating Owners

Rotation begins WW30'20. The rotation term is 2 weeks. Owners are responsible for finding an alternate, or swapping terms with another owner, if their rotation falls during a vacation or other out-of-office (OOO) period.

Owner
Sylvia
Saurabh
Ding
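To work out who is on duty for a given date, the schedule can be computed from the start week, the 2-week term, and the owner list. This is a sketch under two assumptions the page does not state: "WW30'20" is read as ISO week 30 of 2020 (starting Monday, 20 July 2020), and owners take turns in the order listed. Vacation swaps are not modeled, and `owner_on` is a hypothetical helper.

```python
from datetime import date

# Assumptions, not stated on this page: "WW30'20" means ISO week 30 of 2020,
# and owners rotate in the order listed in the table.
OWNERS = ["Sylvia", "Saurabh", "Ding"]
ROTATION_START = date.fromisocalendar(2020, 30, 1)  # Monday of ISO week 30, 2020
TERM_WEEKS = 2


def owner_on(day: date) -> str:
    """Return the scheduled owner for the given day (ignores swaps)."""
    if day < ROTATION_START:
        raise ValueError("the rotation had not started yet")
    weeks_elapsed = (day - ROTATION_START).days // 7
    term_index = weeks_elapsed // TERM_WEEKS
    return OWNERS[term_index % len(OWNERS)]


if __name__ == "__main__":
    for d in (date(2020, 7, 20), date(2020, 8, 3), date(2020, 8, 17), date(2020, 8, 31)):
        print(d, owner_on(d))
```

Requires Python 3.8 or later for `date.fromisocalendar`.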


Use the template below for creating a new ticket for tests failing in CI.

Project: <CaRT | "CORAL - CI" | DAOS>

Issue type: Bug

Labels: flaky

Fill in the Bug Exposure, Bug Quality, and Bug Type fields accordingly.

Summary: Provide a summary of the issue

Description: Please include the following in the description.

Failed Stage: <Build | Unit Test | Test>

Failed Build/Test:

For a build failure, list the build stage that failed, e.g. "Build RPM on CentOS 7", "Build RPM on Leap 15", etc.

For a test failure, list the test folder, source (and variant when possible), e.g. daos_test/daos_core_test.py - DAOS degraded-mode tests

Branch: <master | name_of_branch>

Commit: <commit hash>

Include the stack trace and error message.

Attach debug logs.
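If tickets are filed from a script rather than by hand, the description can be assembled in the same order as the template. The sketch below only builds the text: the `CIFailureReport` class is hypothetical, it does not call any Jira API, and debug logs would still need to be attached separately.

```python
from dataclasses import dataclass, field

# Allowed values copied from the template; the class itself is illustrative.
PROJECTS = ("CaRT", "CORAL - CI", "DAOS")
STAGES = ("Build", "Unit Test", "Test")


@dataclass
class CIFailureReport:
    project: str
    summary: str
    failed_stage: str
    failed_item: str       # build stage name, or test folder, source and variant
    branch: str
    commit: str
    error_excerpt: str     # stack trace and/or error message
    issue_type: str = "Bug"
    labels: list = field(default_factory=lambda: ["flaky"])

    def description(self) -> str:
        """Render the description body in the order the template asks for."""
        if self.project not in PROJECTS:
            raise ValueError(f"unknown project: {self.project}")
        if self.failed_stage not in STAGES:
            raise ValueError(f"unknown stage: {self.failed_stage}")
        return "\n".join([
            f"Failed Stage: {self.failed_stage}",
            f"Failed Build/Test: {self.failed_item}",
            f"Branch: {self.branch}",
            f"Commit: {self.commit}",
            "",
            self.error_excerpt,
        ])


if __name__ == "__main__":
    report = CIFailureReport(
        project="DAOS",
        summary="daos_core_test.py rebuild test times out",  # hypothetical summary
        failed_stage="Test",
        failed_item="daos_test/daos_core_test.py - DAOS degraded-mode tests",
        branch="master",
        commit="<commit hash>",
        error_excerpt="Runner error occurred: Timeout reached",
    )
    print(report.description())
```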


Flaky CI failures

Here is a summary of CI failures.

For the latest ticket status and failure details, refer to the tickets in JIRA using the "CI Issues" filter.

Test | JIRA ticket | Error
Daos Core Test Rebuild 26


Test died without reporting the status. Runner error occurred: Timeout reached
Test should have produced a coredump
#7  0x00007fbdd409a0e2 in HG_Progress (context=context@entry=0x7fbd300480f0, timeout=<optimized out>) at /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/build/external/dev/mercury/src/mercury.c:1979
        ret = HG_SUCCESS
        __func__ = "HG_Progress"
#8  0x00007fbdd65c28a1 in crt_hg_progress (hg_ctx=hg_ctx@entry=0x7fbd30042df8, timeout=timeout@entry=0) at src/cart/src/cart/crt_hg.c:1350
        hg_ret = HG_SUCCESS
        rc = 0
        count = 0
        hg_context = 0x7fbd300480f0
        hg_timeout = <optimized out>
        total = 256
        __func__ = "crt_hg_progress"
#9  0x00007fbdd65836c5 in crt_progress (crt_ctx=0x7fbd30042de0, timeout=timeout@entry=0) at src/cart/src/cart/crt_context.c:1330
        ctx = 0x7fbd30042de0
        rc = 0
        __func__ = "crt_progress"
#10 0x000000000041bb5f in dss_srv_handler (arg=0x25d0890) at src/iosrv/srv.c:562
        dx = 0x25d0890
        dtc = 0x7fbd30000910
        rc = -1011
        signal_caller = false
        __func__ = "dss_srv_handler"
        __PRETTY_FUNCTION__ = "dss_srv_handler"
#11 0x00007fbdd557417b in ABTD_thread_func_wrapper_thread () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#12 0x00007fbdd5574851 in make_fcontext () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.

Daos Core Test Rebuild 26


Test died without reporting the status. Runner error occurred: Timeout reached
Test should have produced a coredump

#12 0x00007f6f5cede401 in crt_hg_progress (hg_ctx=hg_ctx@entry=0x7f6f08042df8, timeout=timeout@entry=0) at src/cart/src/cart/crt_hg.c:1302
        hg_ret = HG_SUCCESS
        rc = 0
        count = 0
        hg_context = 0x7f6f080480f0
        hg_timeout = <optimized out>
        total = 256
        __func__ = "crt_hg_progress"
#13 0x00007f6f5ce9f695 in crt_progress (crt_ctx=0x7f6f08042de0, timeout=timeout@entry=0) at src/cart/src/cart/crt_context.c:1330
        ctx = 0x7f6f08042de0
        rc = 0
        __func__ = "crt_progress"
#14 0x000000000041bb5f in dss_srv_handler (arg=0x28f3e50) at src/iosrv/srv.c:562
        dx = 0x28f3e50
        dtc = 0x7f6f08000910
        rc = -1011
        signal_caller = false
        __func__ = "dss_srv_handler"
        __PRETTY_FUNCTION__ = "dss_srv_handler"
#15 0x00007f6f5be9017b in ABTD_thread_func_wrapper_thread () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#16 0x00007f6f5be90851 in make_fcontext () from /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_master@4/install/bin/../prereq/dev/argobots/lib/libabt.so.0
No symbol table info available.
#17 0x0000000000000000 in ?? ()
No symbol table info available.




