DAOS-6820 - 1.2 Verbs Automated Testing Status

Test Folder | Count | Result | Defect | Logs | Notes
object (22)
object/obj_fetch_bad_param.py:                     PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T00.58-bd87a06-object-obj_fetch_bad_param
object/array_obj_test.py:                          PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T01.12-78980dd-object-array_obj_test
object/obj_open_bad_param.py:                      PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T01.08-5a3eb97-object-obj_open_bad_param
object/obj_update_bad_param.py:                    PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T01.04-86e5ac7-object-obj_update_bad_param
object/create_many_dkeys.py:                       PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T01.28-ed955fe-object-create_many_dkeys
Test issue: the SCM size needed to be increased to avoid the space issue. Increased the size and the test passed.
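
A minimal sketch of that kind of yaml change, following the usual ftest pool scm_size parameter (the key layout is the ftest convention; the size shown is illustrative, not the exact value used here):

  pool:
    scm_size: 2G    # raised from a smaller value so the writes no longer run out of SCM
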
object/object_integrity.py:                        PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T00.39-fb0482e-object-object_integrity
object/same_key_different_value.py:                PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T01.52-06b6b04-object-same_key_different_value/
Test issue: the SCM size needed to be increased to avoid the space issue. Increased the size and the test passed.
object/punch_test.py:                              PASS
/scratch/samirrav/verbs_Logs/object/job-2021-02-25T01.00-338838a-object-punch_test
daos_racer (1)
daos_racer/daos_racer.py:                          PASS
/scratch/samirrav/verbs_Logs/daos_racer/job-2021-03-13T03.34-58e3fd1-daos_racer-daos_racer/
nvme (18)
nvme/nvme_io_stats.py:                             PASS
/scratch/samirrav/verbs_Logs/nvme/job-2021-03-02T00.00-9c46097-nvme-nvme_io_stats
nvme/nvme_pool_exclude.py:                         Skipped


nvme/nvme_fragmentation.py:                        PASS
/scratch/samirrav/verbs_Logs/nvme/job-2021-03-02T00.03-436ba66-nvme-nvme_fragmentation
nvme/nvme_pool_extend.py:                          Skipped


nvme/nvme_object.py:                               FAIL
/scratch/samirrav/verbs_Logs/nvme/job-2021-03-04T23.57-6cdf0a0-nvme-nvme_object

A single pool works fine, but multiple pools fail with a NO SPACE issue. Looks like a test issue:

to container 42B8B01A-DBCF-4328-96B1-4F0275885CE3: Object update returned non-zero. RC: -1007

nvme/nvme_health.py:                               PASS

Used PR-4832 to fix some regressions locally.
nvme/enospace.py:                                  FAIL

Test issue; it is skipped in the tests for PR-4832, but nothing looks wrong from the verbs side.
nvme/nvme_io.py:                                   FAIL  DAOS-7021
The issue is with the S1 object type, so opened a new defect.
nvme/nvme_pool_capacity.py:                        PASS


nvme/nvme_fault.py:                                PASS
/scratch/samirrav/verbs_Logs/nvme/job-2021-03-16T20.30-5ccc5f3-nvme-nvme_fault/
With the PR fixes.
server (6)
server/daos_server_restart.py:                     PASS
/scratch/samirrav/verbs_Logs/server/job-2021-03-04T22.55-63ef643-server-daos_server_restart
server/cpu_usage.py:                               PASS  DAOS-6944

/scratch/samirrav/verbs_Logs/server/job-2021-03-11T19.11-718b938-server-cpu_usage

/scratch/samirrav/verbs_Logs/server/job-2021-03-04T21.07-c8285fd-server-cpu_usage/ 

Tried again multiple times and the test passed.

Looks like a DAOS-side issue; created a new ticket (DAOS-6944).

2021-03-04 21:09:29,185 cpu_usage        L0062 INFO | CPU Usage = 140.0
2021-03-04 21:09:31,187 stacktrace       L0039 ERROR|
2021-03-04 21:09:31,187 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2021-03-04 21:09:31,212 stacktrace       L0045 ERROR| Traceback (most recent call last):
2021-03-04 21:09:31,212 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/server/cpu_usage.py", line 70, in test_cpu_usage
2021-03-04 21:09:31,212 stacktrace       L0045 ERROR|     float(usage) < 100, "CPU usage is above 100%: {}%".format(usage))
2021-03-04 21:09:31,212 stacktrace       L0045 ERROR|   File "/usr/lib64/python2.7/unittest/case.py", line 462, in assertTrue
2021-03-04 21:09:31,212 stacktrace       L0045 ERROR|     raise self.failureException(msg)
2021-03-04 21:09:31,213 stacktrace       L0045 ERROR| AssertionError: CPU usage is above 100%: 140.0%
server/metadata.py:                                FAIL
/scratch/samirrav/verbs_Logs/server/job-2021-03-04T22.08-1c8c468-server-metadata

Looks like a test issue where IOR failed to write:

[stderr] [1,0]<stderr>:Failed to lookup parent dir
[stderr] [1,0]<stderr>:application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[stderr] [1,0]<stderr>:[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=-1
[stderr] [1,0]<stderr>::
[stderr] [1,0]<stderr>:system msg for write_line failure : Bad file descriptor

server/dynamic_start_stop.py:                      FAIL
/scratch/samirrav/verbs_Logs/server/job-2021-03-04T22.45-95c1519-server-dynamic_start_stop

Looks like a test issue:

Command '/usr/bin/dmg -o /etc/daos/daos_control.yml -j system query --verbose' finished with 0 after 0.219604969025s
system_query data: {}
<SERVER> Verifying server states: group=daos_server, hosts=wolf-[118-119]
  Unable to obtain current server state.  Undefined expected server states due to a failure starting the servers.
Writing yaml configuration file /var/tmp/daos_testing/test_daos_server_dmg.yaml
Copying /var/tmp/daos_testing/test_daos_server_dmg.yaml yaml configuration file to /etc/daos/daos_control.yml on ['wolf-79']
Command environment vars:
  None
Running '/usr/bin/dmg -o /etc/daos/daos_control.yml -j system query --verbose'
[stdout] {
[stderr] ERROR: dmg: unable to contact the DAOS Management Service
[stdout]   "response": {
[stdout]     "members": null
[stdout]   },
[stdout]   "error": "unable to contact the DAOS Management Service",
[stdout]   "status": -1025
[stdout] }
Command '/usr/bin/dmg -o /etc/daos/daos_control.yml -j system query --verbose' finished with 1 after 0.1741938591s
Error occurred running '/usr/bin/dmg -o /etc/daos/daos_control.yml -j system query --verbose': Command '/usr/bin/dmg -o /etc/daos/daos_control.yml -j system query --verbose' failed (rc=1)

server/daos_server_config.py:                      PASS
/scratch/samirrav/verbs_Logs/server/job-2021-03-04T21.36-a059137-server-daos_server_config/
datamover (8)
datamover/dm_posix_subsets.py:                     PASS
/scratch/samirrav/verbs_Logs/datamover/job-2021-02-26T21.58-79d47c3-datamover-dm_posix_subsets/
datamover/copy_procs.py:                           PASS
/scratch/samirrav/verbs_Logs/datamover/job-2021-02-26T22.01-2ec4685-datamover-copy_procs/
datamover/copy_negative.py:                        FAIL  DAOS-6871
/home/samirrav/avocado/job-results/job-2021-02-26T22.07-ae336da-datamover-copy_negative
Looks like an existing failure; Dalton is aware of it.
datamover/dm_large_dir.py:                         PASS
/scratch/samirrav/verbs_Logs/datamover/job-2021-02-26T22.42-3c32f26-datamover-dm_large_dir
datamover/dm_large_file.py:                        PASS
/scratch/samirrav/verbs_Logs/datamover/job-2021-02-26T22.28-c6c4896-datamover-dm_large_file/
container (24)
container/container_async.py:                      PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T22.24-d0197f8-container-container_async
container/simple_create_delete_test.py:            PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T22.27-dcc6dbe-container-simple_create_delete_test
container/container_check.py:                      PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T21.55-14a152d-container-container_check/
container/create.py:                               PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T22.34-46f77e9-container-create
container/basic_snapshot.py:        PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T21.58-28968fb-container-basic_snapshot/
container/basic_tx_test.py:                        PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T23.19-cc64c8c-container-basic_tx_test
container/open_close.py:                           PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T23.17-82728b7-container-open_close
container/delete.py:                               PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T22.37-3858acc-container-delete
container/open.py:                                 PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T22.48-8462ad4-container-open
container/global_handle.py:                        PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T22.33-779d30a-container-global_handle
container/full_pool_container_create.py:           Skipped


container/root_container_test.py:                  PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T22.51-33ded5c-container-root_container_test
container/snapshot.py:                             FAIL  DAOS-6893
/scratch/samirrav/verbs_Logs/container/job-2021-03-11T19.27-add1508-container-snapshot/

Existing open issue:

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
Traceback (most recent call last):
  File "/usr/lib/daos/TESTING/ftest/container/snapshot.py", line 453, in test_snapshots
    "the original data written.", ss_number)
Exception: ('##(5.2)Snapshot #%s, test data Mis-matchesthe original data written.', 1)
container/multiple_container_delete.py:            FAIL
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T23.41-a1f4393-container-multiple_container_delete/

Test issue

Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado
    raise test_exception

TypeError: create_cont() takes exactly 2 arguments (1 given)
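
For context, this is the standard Python 2 arity message for bound methods (self counts as the first argument); a hypothetical illustration of the mismatch, not the actual test code:

  class ContainerTest(object):
      def create_cont(self, pool):    # requires one argument besides self
          pass

  # Called bare, Python 2 reports: create_cont() takes exactly 2 arguments (1 given)
  ContainerTest().create_cont()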

container/list_containers.py:                      PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T23.45-16812a3-container-list_containers/
container/query_attribute.py:                      PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T23.43-5c50f5f-container-query_attribute/
container/attribute.py:                            PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T23.03-7e13913-container-attribute/
container/snapshot_aggregation.py:                 PASS
/scratch/samirrav/verbs_Logs/container/job-2021-03-01T23.31-ce3d95a-container-snapshot_aggregation
aggregation (7)
aggregation/aggregation_io_small.py:               PASS
/scratch/samirrav/verbs_Logs/aggregation/job-2021-02-26T23.21-42236ee-aggregation-aggregation_io_small/
aggregation/aggregation_basic.py:                  PASS
/scratch/samirrav/verbs_Logs/aggregation/job-2021-02-26T23.45-fd92726-aggregation-aggregation_basic
aggregation/aggregation_punching.py:               PASS
/scratch/samirrav/verbs_Logs/aggregation/job-2021-02-26T23.27-21a327a-aggregation-aggregation_punching/
aggregation/aggregation_checksum.py:               PASS
/scratch/samirrav/verbs_Logs/aggregation/job-2021-02-26T23.30-c5770ba-aggregation-aggregation_checksum/
aggregation/dfuse_space_check.py:                  FAIL
/scratch/samirrav/verbs_Logs/aggregation/job-2021-02-26T23.30-c5770ba-aggregation-aggregation_checksum/

Test issue:

TypeError: create_cont() takes exactly 2 arguments (1 given)

aggregation/aggregation_throttling.py:             FAIL  DAOS-6897
/scratch/samirrav/Defect_logs/DAOS-6879/job-2021-02-27T00.00-aa0e5d7-aggregation-aggregation_throttling/

Looks like a test issue, but it could be a DAOS issue too, so opened a new defect.

2021-02-26 23:44:36,524 aggregation_thro L0126 INFO | Max perf diff 52.0413757813 < 30.0
2021-02-26 23:44:36,524 stacktrace       L0039 ERROR|
2021-02-26 23:44:36,524 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2021-02-26 23:44:36,545 stacktrace       L0045 ERROR| Traceback (most recent call last):
2021-02-26 23:44:36,545 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/aggregation/aggregation_throttling.py", line 84, in test_aggregation_throttling
2021-02-26 23:44:36,546 stacktrace       L0045 ERROR|     expected_perf_diff)
2021-02-26 23:44:36,546 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/aggregation/aggregation_throttling.py", line 127, in verify_performance
2021-02-26 23:44:36,546 stacktrace       L0045 ERROR|     self.assertTrue(max_perf_diff < expected_perf_diff)
2021-02-26 23:44:36,546 stacktrace       L0045 ERROR|   File "/usr/lib64/python2.7/unittest/case.py", line 462, in assertTrue
2021-02-26 23:44:36,546 stacktrace       L0045 ERROR|     raise self.failureException(msg)
2021-02-26 23:44:36,546 stacktrace       L0045 ERROR| AssertionError: False is not true
control (22)
control/dmg_storage_query.py:                      PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.47-5d9010a-control-dmg_storage_query
control/dmg_system_reformat.py:                    Skipped


control/daos_agent_config.py:                      PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.36-c2e0b52-control-daos_agent_config
control/dmg_storage_scan_scm.py:                   PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.55-4a5c8ca-control-dmg_storage_scan_scm
control/super_block_versioning.py:                 PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.34-8c21687-control-super_block_versioning/
control/daos_control_config.py:                    PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.42-2ada17a-control-daos_control_config
control/dmg_network_scan.py:                       FAIL
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.32-a4dbc32-control-dmg_network_scan

Test issue: the server is trying to start on a client node.

"Failed to start servers before format: Error occurred running 'sudo -n systemctl enable daos_server.service' on wolf-79",

control/daos_snapshot.py:                          Skipped

TestSkipError: Skipping until DAOS-4691 is fixed

control/daos_scm_config.py:                        PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T01.01-36b3553-control-daos_scm_config
control/dmg_pool_evict.py:                         FAIL  DAOS-5104
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T01.02-9f87243-control-dmg_pool_evict/

Looks like the test needs to be updated:

2021-02-27 01:04:09,010 stacktrace       L0039 ERROR|
2021-02-27 01:04:09,010 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2021-02-27 01:04:09,011 stacktrace       L0045 ERROR| Traceback (most recent call last):
2021-02-27 01:04:09,011 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/control/dmg_pool_evict.py", line 64, in test_dmg_pool_evict
2021-02-27 01:04:09,011 stacktrace       L0045 ERROR|     "evict!")
2021-02-27 01:04:09,011 stacktrace       L0045 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 762, in fail
2021-02-27 01:04:09,011 stacktrace       L0045 ERROR|     raise exceptions.TestFail(message)
2021-02-27 01:04:09,011 stacktrace       L0045 ERROR| TestFail: daos pool list-cont with second pool succeeded after pool evict!
control/dmg_nvme_scan_test.py:                     PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.50-5c0342d-control-dmg_nvme_scan_test/
control/dmg_system_leader_query.py:                PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.45-fed9830-control-dmg_system_leader_query/
control/daos_admin_privileged.py:                  FAIL
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.33-8cb3015-control-daos_admin_privileged/

Looks like a test issue:

2021-02-27 00:33:16,800 stacktrace       L0039 ERROR|
2021-02-27 00:33:16,800 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2021-02-27 00:33:16,800 stacktrace       L0045 ERROR| Traceback (most recent call last):
2021-02-27 00:33:16,800 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/control/daos_admin_privileged.py", line 38, in test_daos_admin_format
2021-02-27 00:33:16,801 stacktrace       L0045 ERROR|     file_stats = os.stat("/usr/bin/daos_admin")
2021-02-27 00:33:16,801 stacktrace       L0045 ERROR| OSError: [Errno 2] No such file or directory: '/usr/bin/daos_admin'
2021-02-27 00:33:16,801 stacktrace       L0046 ERROR|

control/ssd_socket.py:                             PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.56-878db9c-control-ssd_socket/
control/dmg_pool_query_test.py:                    PASS
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T01.04-b55ddd1-control-dmg_pool_query_test/
control/daos_object_query.py:                      FAIL
/scratch/samirrav/verbs_Logs/control/job-2021-02-27T00.33-7812483-control-daos_object_query

Looks like a test issue

2021-02-27 00:34:36,062 test_utils_conta L0618 INFO | Writing 1 object(s), with 1 record(s) of 100 bytes(s) each, in container CA1EE7F4-E7D1-4971-9574-06EF16FD12AB with object class 221
2021-02-27 00:34:36,063 stacktrace       L0039 ERROR|
2021-02-27 00:34:36,063 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2021-02-27 00:34:36,064 stacktrace       L0045 ERROR| Traceback (most recent call last):
2021-02-27 00:34:36,064 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/control/daos_object_query.py", line 67, in test_object_query
2021-02-27 00:34:36,064 stacktrace       L0045 ERROR|     self.container.write_objects(obj_class=obj_class)
2021-02-27 00:34:36,064 stacktrace       L0045 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/decorators.py", line 51, in wrap
2021-02-27 00:34:36,064 stacktrace       L0045 ERROR|     raise core_exceptions.TestFail(str(details))
2021-02-27 00:34:36,064 stacktrace       L0045 ERROR| TestFail: Error writing data (dkey=C99Y, akey=J3TN, data=N3QZOOYTIFRC8Q83KQG39Z2ZCSX8PJKPT8HJY5PLJ9K9C8ME7CNF6QC39SIZEBDF29JOKIJX2BZKIZTTKJRMEX02JX7LCLOQZBQL) to container CA1EE7F4-E7D1-4971-9574-06EF16FD12AB: Unknown DAOS object enumeration class for 221 (<type 'int'>)
pool (48)
pool/bad_connect.py:                               PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T00.44-e26eda2-pool-bad_connect/
pool/connect_test.py:                              FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T21.04-8707d15-pool-connect_test

Looks like a test reporting issue:

ERROR Test reported status but did not finish -> TestAbortedError: 1-./pool/connect_test.py:ConnectTest.test_connect;hosts-pool-server_config-validsetname-cc29

pool/pool_svc.py:                                  FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T21.16-1a3a941-pool-pool_svc

Test timeout issue, so tried increasing the timeout.

The test then passed, but this error appeared, so it looks like the test needs to be fixed:

Command '/usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --nsvc=0 --scm-size=134217728 --sys=daos_server --user=samirrav' finished with 0 after 9.0510468483s
## TestFail exception is caught at pool create!
Test was expected to fail, but it passed.

Traceback (most recent call last):
  File "/usr/lib/daos/TESTING/ftest/pool/pool_svc.py", line 43, in test_poolsvc
    self.fail("Test was expected to fail, but it passed.\n")
  File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 762, in fail
    raise exceptions.TestFail(message)
TestFail: Test was expected to fail, but it passed.

pool/simple_create_delete_test.py:                 PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T21.23-7248946-pool-simple_create_delete_test
pool/create_capacity_test.py:                      FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T21.57-4b3c362-pool-create_capacity_test

Test issue

Running '/usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --nsvc=1 --nvme-size=4.0GiB --scm-size=0.0B --sys=daos_server --user=samirrav'
[stdout] {
[stderr] ERROR: dmg: failed to generate PoolCreate request: can't create pool with 0 SCM
[stdout]   "response": null,
[stdout]   "error": "failed to generate PoolCreate request: can't create pool with 0 SCM",
[stdout]   "status": -1025
[stdout] }

pool/create_test.py:                               FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.05-25f9c68-pool-create_test

Test issue

Running '/usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --nsvc=1 --nvme-size=8.0GiB --ranks=0 --scm-size=0.0B --sys=daos_server --user=samirrav'
[stdout] {
[stderr] ERROR: dmg: failed to generate PoolCreate request: can't create pool with 0 SCM
[stdout]   "response": null,
[stdout]   "error": "failed to generate PoolCreate request: can't create pool with 0 SCM",
[stdout]   "status": -1025
[stdout] }
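
For comparison, the same request should go through once the SCM size is nonzero; a sketch (the 1G value is illustrative, not taken from this run):

  /usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --nsvc=1 --nvme-size=8.0GiB --ranks=0 --scm-size=1G --sys=daos_server --user=samirrav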

pool/global_handle.py:                             PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T21.33-fec3127-pool-global_handle/
pool/rebuild_tests.py:                             Skipped


pool/multi_server_create_delete_test.py:           PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T21.36-94a74cb-pool-multi_server_create_delete_test
pool/destroy_tests.py:                             FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.15-ccf86b4-pool-destroy_tests/

Test timeout issue:

2021-03-05 22:18:15,813 stacktrace       L0045 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler
2021-03-05 22:18:15,813 stacktrace       L0045 ERROR|     raise RuntimeError("Test interrupted by SIGTERM")
2021-03-05 22:18:15,813 stacktrace       L0045 ERROR| RuntimeError: Test interrupted by SIGTERM

pool/bad_create.py:                                PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.35-ff97d36-pool-bad_create/
pool/dynamic_server_pool.py:                       FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.39-32dea13-pool-dynamic_server_pool

Looks like a test issue:

2021-03-05 22:41:57,606 process          L0389 INFO | Running '/usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --ranks=2 --scm-size=1G --user=samirrav'
2021-03-05 22:41:57,832 process          L0479 DEBUG| [stdout] {
2021-03-05 22:41:57,832 process          L0479 DEBUG| [stderr] ERROR: dmg: pool create failed: server: code = 644 description = "pool request contains invalid ranks: 2"
2021-03-05 22:41:57,833 process          L0479 DEBUG| [stdout]   "response": null,
2021-03-05 22:41:57,833 process          L0479 DEBUG| [stdout]   "error": "pool create failed: server: code = 644 description = \"pool request contains invalid ranks: 2\"",
2021-03-05 22:41:57,833 process          L0479 DEBUG| [stdout]   "status": -1025
2021-03-05 22:41:57,833 process          L0479 DEBUG| [stderr] ERROR: dmg: server: code = 644 resolution = "retry the request with a valid set of ranks"
2021-03-05 22:41:57,834 process          L0479 DEBUG| [stdout] }
2021-03-05 22:41:57,836 process          L0499 INFO | Command '/usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --ranks=2 --scm-size=1G --user=samirrav' finished with 1 after 0.226467132568s
2021-03-05 22:41:57,837 output           L0655 DEBUG| Error occurred running '/usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --ranks=2 --scm-size=1G --user=samirrav': Command '/usr/bin/dmg -o /etc/daos/daos_control.yml -j pool create --group=samirrav --ranks=2 --scm-size=1G --user=samirrav' failed (rc=1)

pool/rebuild_no_cap.py:                            Skipped


pool/bad_evict.py:                                 FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.44-0b52041-pool-bad_evict/

Test issue

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
Traceback (most recent call last):
  File "/usr/lib/daos/TESTING/ftest/pool/bad_evict.py", line 74, in test_evict
    pool.pool.evict()
AttributeError: 'DaosPool' object has no attribute 'evict'

pool/query_attribute.py:                           PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.48-8ae95fe-pool-query_attribute
pool/bad_query.py:                                 PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.50-24457b3-pool-bad_query/
pool/attribute.py:                                 PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T22.53-4964b83-pool-attribute/
pool/destroy_rebuild.py:                           Skipped


pool/list_pools.py:                                PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T23.10-f7ed181-pool-list_pools/
pool/storage_ratio.py:                             FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T23.28-272d080-pool-storage_ratio/

Looks like a change in behavior, so the test code needs to be updated:

2021-03-05 23:31:02,976 stacktrace       L0039 ERROR|
2021-03-05 23:31:02,977 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2021-03-05 23:31:02,977 stacktrace       L0045 ERROR| Traceback (most recent call last):
2021-03-05 23:31:02,977 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/pool/storage_ratio.py", line 68, in test_storage_ratio
2021-03-05 23:31:02,977 stacktrace       L0045 ERROR|     .format(tests[key], tests[key][2]))
2021-03-05 23:31:02,977 stacktrace       L0045 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 762, in fail
2021-03-05 23:31:02,977 stacktrace       L0045 ERROR|     raise exceptions.TestFail(message)
2021-03-05 23:31:02,977 stacktrace       L0045 ERROR| TestFail: Pool Creation ['1G', '200G', 'WARNING'] Suppose to WARNING

pool/rebuild_with_ior.py:                          Skipped


pool/evict_test.py:                                PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T23.37-1375e2f-pool-evict_test/
pool/rebuild_with_io.py:                           Skipped


pool/multiple_creates_test.py:                     PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T23.57-0a1b4df-pool-multiple_creates_test
pool/info_tests.py:                                PASS
/scratch/samirrav/verbs_Logs/pool/job-2021-03-05T23.49-abf9ec6-pool-info_tests/
checksum (2)
checksum/basic_checksum.py:                        PASS
/scratch/samirrav/verbs_Logs/checksum/basic_checksum
checksum/csum_error_logging.py:                    PASS
/scratch/samirrav/verbs_Logs/checksum/csum_error_logging
dbench (1)
dbench/dbench.py:                                  PASS
/scratch/samirrav/verbs_Logs/dbench/dbench
ior (10)
ior/ior_small.py:                                  PASS
/scratch/samirrav/verbs_Logs/ior/ior_small
Test timeout issue; had to increase the timeout to 300.
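
The bump is a one-line yaml change; a sketch assuming the avocado-style timeout key used in the ftest yamls (the key name is an assumption; only the value comes from the note above):

  timeout: 300
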
ior/ior_large.py: FAIL
/scratch/samirrav/verbs_Logs/ior/job-2021-03-13T18.24-5686994-ior-large

There look to be multiple test issues (a config sketch follows the list):

  1.    servers: !mux is not going to work now
        1_server:
          test_servers:
            - server-A
        4_servers:
          test_servers:
            - server-A
            - server-B
            - server-C
            - server-D
  2. "RP_2GX" will not be able to run with a single server
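
A sketch of one way to flatten the block in item 1 while !mux handling is broken, keeping only the four-server variant (hypothetical; the actual fix may differ):

  servers:
    test_servers:
      - server-A
      - server-B
      - server-C
      - server-D
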
ior/ior_intercept_basic.py:                        Skipped


ior/ior_intercept_dfuse_mix.py:                    FAIL
/scratch/samirrav/verbs_Logs/ior/iorinterceptmix/
IOR failed during read: [stdout] ERROR: read(24, 0x7fbe4bf07000, 1048576) returned EOF prematurely, (aiori-POSIX.c:577)
ior/ior_intercept_verify_data_integrity.py:        FAIL
/scratch/samirrav/verbs_Logs/ior/iorinterceptverifydata
TestFail: Error: control method API not supported for create()
ior/ior_hdf5.py (test_ior_hdf5):                   PASS
/scratch/samirrav/verbs_Logs/ior/ior_hdf5
ior/ior_hdf5.py (test_ior_hdf5_vol):               PASS
/scratch/samirrav/verbs_Logs/ior/ior_hdf5_vol
ior/crash_ior.py:                                  Skipped


ior/ior_intercept_multi_client.py:                 FAIL
/scratch/samirrav/verbs_Logs/ior/iorinterceptmulticlient

Looks like a test issue:

-> mean_mib <type 'int'>: 3
 -> write_x <type 'int'>: 3
*** TEARDOWN called after test completion: elapsed time: 340.311693907 seconds ***

erasurecode (5)
erasurecode/ec_rebuild_disabled.py:                Skipped


erasurecode/ec_offline_rebuild.py:                 Skipped


erasurecode/ec_mdtest_smoke.py:                    PASS
/scratch/samirrav/verbs_Logs/erasurecode/ec_mdtest
erasurecode/ec_online_rebuild.py:                  Skipped


erasurecode/ec_ior_smoke.py:                       PASS
/scratch/samirrav/verbs_Logs/erasurecode/ec_ior
Test timeout issue; had to increase the timeout to 400.
mdtest (1)
mdtest/mdtest_large.py:                            FAIL
/scratch/samirrav/verbs_Logs/mdtest/job-2021-03-13T05.45-1dc3f70-mdtest-large/

Test timeout issue

2021-03-13 08:25:22,805 stacktrace       L0045 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler
2021-03-13 08:25:22,805 stacktrace       L0045 ERROR|     raise RuntimeError("Test interrupted by SIGTERM")
2021-03-13 08:25:22,805 stacktrace       L0045 ERROR| RuntimeError: Test interrupted by SIGTERM
2021-03-13 08:25:22,805 stacktrace       L0046 ERROR|
2021-03-13 08:25:22,805 test             L0601 DEBUG| Local variables:

mdtest/mdtest_small.py:                            PASS

Added a timeout between pool queries; for logs, check DAOS-6580.

  try:
      if display_space:
          time.sleep(5)    # added delay before displaying pool space
          pool.display_pool_daos_space()

daos_vol (4)
daos_vol/daos_vol.py:                              PASS
/scratch/samirrav/verbs_Logs/daos_vol/job-2021-02-25T00.12-aed73fe-daos_vol-daos_vol/
Ran with mpich only; with openmpi there was an issue with Avocado and the environment, but running openmpi manually (a single test) worked.
io (22)
io/fio_small.py:                                   PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-06T00.47-c8231aa-io-fio_small/
io/unaligned_io.py:                                PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-06T00.52-935eb5f-io-unaligned_io/
io/daos_vol_bigio.py:                              Skipped


io/macsio_test.py:                                 Some skipped, some FAIL
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T19.40-e414801-io-macsio_test

Local environment issue:

[stderr] [wolf-81.wolf.hpdd.intel.com:38374] PMIX ERROR: BAD-PARAM in file event/pmix_event_notification.c at line 848

io/parallel_io.py:                                 PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T19.45-5e97c9e-io-parallel_io/

For the first test the fio command failed, but the test still passed.


Command:
  fio --name=global --bs=1M --direct=1 --directory=/tmp/daos_dfuse//907E96B2-18B8-47F2-9D66-5519B049CA04 --group_reporting=1 --iodepth=16 --ioengine=libaio --rw=rw --size=10M --thread=1 --verify=crc64 --name=test --numjobs=1
Command return codes:
  wolf-79: rc=1
Command output:
  wolf-79:
    test: (g=0): rw=rw, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=16
    fio-3.7
    Starting 1 thread
    test: Laying out IO file (1 file / 10MiB)
    fio: pid=19622, err=22/file:ioengines.c:422, func=io commit, error=Invalid argument

    test: (groupid=0, jobs=1): err=22 (file:ioengines.c:422, func=io commit, error=Invalid argument): pid=19622: Mon Mar  8 19:47:58 2021
      cpu          : usr=0.00%, sys=0.00%, ctx=7, majf=0, minf=4
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=1,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=16

    Run status group 0 (all jobs):
Exception in thread Thread-66:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 765, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/daos/TESTING/ftest/util/fio_test_base.py", line 68, in execute_fio
    self.fio_cmd.run()
  File "/usr/lib/daos/TESTING/ftest/util/command_utils.py", line 106, in run
    return self._run_process()
  File "/usr/lib/daos/TESTING/ftest/util/fio_utils.py", line 179, in _run_process
    ", ".join(failed)))
CommandFailure: Error running fio on the following hosts: wolf-79: rc=1
io/io_aggregation.py:                              FAIL
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T19.21-42c55de-io-io_aggregation

Test issue:

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
Traceback (most recent call last):
  File "/usr/lib/daos/TESTING/ftest/io/io_aggregation.py", line 82, in test_ioaggregation
    self.run_ior_with_pool()
  File "/usr/lib/daos/TESTING/ftest/util/ior_test_base.py", line 139, in run_ior_with_pool
    self.update_ior_cmd_with_pool(create_cont)
  File "/usr/lib/daos/TESTING/ftest/util/ior_test_base.py", line 176, in update_ior_cmd_with_pool
    self.create_pool()
  File "/usr/lib/daos/TESTING/ftest/util/ior_test_base.py", line 66, in create_pool
    self.pool.create()
  File "/usr/lib/python2.7/site-packages/avocado/core/decorators.py", line 51, in wrap
    raise core_exceptions.TestFail(str(details))
TestFail: Error: control method API not supported for create()

io/dfuse_sparse_file.py:                           FAIL
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T19.56-f48e925-io-dfuse_sparse_file/

Test issue

2021-03-08 19:58:14,242 stacktrace       L0039 ERROR|
2021-03-08 19:58:14,242 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2021-03-08 19:58:14,243 stacktrace       L0045 ERROR| Traceback (most recent call last):
2021-03-08 19:58:14,243 stacktrace       L0045 ERROR|   File "/usr/lib/daos/TESTING/ftest/io/dfuse_sparse_file.py", line 49, in test_dfusesparsefile
2021-03-08 19:58:14,243 stacktrace       L0045 ERROR|     self.create_cont()
2021-03-08 19:58:14,243 stacktrace       L0045 ERROR| TypeError: create_cont() takes exactly 2 arguments (1 given)
2021-03-08 19:58:14,243 stacktrace       L0046 ERROR|
2021-03-08 19:58:14,243 test             L0601 DEBUG| Local variables:

io/seg_count.py:                                   FAIL
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T20.07-8bfeaeb/

Test issue 

[stdout] {
[stderr] ERROR: dmg: pool create failed: DAOS error (-1007): DER_NOSPACE
[stdout]   "response": null,
[stdout]   "error": "pool create failed: DAOS error (-1007): DER_NOSPACE",
[stdout]   "status": -1007
[stdout] }

io/dfuse_bash_cmd.py:                              PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T20.18-0de14f6-io-dfuse_bash_cmd
io/llnl_mpi4py.py:                                 PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T22.57-b5ce528-io-llnl_mpi4py
io/io_consistency.py:                              PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T20.40-d9fdc81-io-io_consistency
Passed after increasing the timeout in the yaml file.
io/romio.py:                                       PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T22.25-cdff4e8-io-romio
io/large_file_count.py:                            FAIL
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T20.51-5aa2d20-io-large_file_count

Test issue

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
Traceback (most recent call last):
  File "/usr/lib/daos/TESTING/ftest/io/large_file_count.py", line 64, in test_largefilecount
    self.container.destroy()
AttributeError: 'NoneType' object has no attribute 'destroy'

io/hdf5.py:                                        PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T20.47-b279665-io-hdf5/
io/dfuse_find_cmd.py:                              PASS
/scratch/samirrav/verbs_Logs/io/job-2021-03-08T21.02-fa56b6e-io-dfuse_find_cmd
io/ior_small.py:                                   PASS

Increased the ior_timeout: 60 setting to make the test pass; refer to DAOS-6579 for the workaround.
rebuild (10)
rebuild/rebuild_widely_striped.py:                 FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-06T00.07-ab8df7e-rebuild-rebuild_widely_striped/

Test timeout issue:

Command '/usr/bin/dmg -o /etc/daos/daos_control.yml pool query --pool=B33BF608-6A5B-4283-A9D9-E4B3898107C9' finished with 0 after 32.22884202s
Pool query is responsive

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
Traceback (most recent call last):
  File "/usr/lib/daos/TESTING/ftest/rebuild/rebuild_widely_striped.py", line 82, in test_rebuild_widely_striped
    self.pool.wait_for_rebuild(False, interval=1)
  File "/usr/lib/daos/TESTING/ftest/util/test_utils_pool.py", line 504, in wait_for_rebuild
    format(self.pool_query_timeout.value))
DaosTestError: TIMEOUT detected after 30 seconds of pool query. This timeout can be adjusted via the 'pool/pool_query_timeout' test yaml parameter.
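
The error text names the knob directly; a minimal sketch of the yaml change (the value shown is illustrative):

  pool:
    pool_query_timeout: 120    # seconds; this run timed out at 30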

rebuild/io_conf_run.py:                            Skipped


rebuild/container_create.py:                       Skipped


rebuild/cascading_failures.py:                     Skipped


rebuild/pool_destroy_rebuild.py:                   FAIL
/scratch/samirrav/verbs_Logs/pool/job-2021-03-06T00.24-54b5d73-rebuild-pool_destroy_rebuild/

Test timeout issue; tried again after increasing the timeout, but still got a pool query timeout:

Command '/usr/bin/dmg -o /etc/daos/daos_control.yml pool query --pool=75AC2E47-139C-437B-B600-FCF1471122BC' finished with 0 after 32.2416799068s
Pool query is responsive

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
Traceback (most recent call last):
  File "/usr/lib/daos/TESTING/ftest/rebuild/pool_destroy_rebuild.py", line 76, in test_pool_destroy_with_io
    self.pool.wait_for_rebuild(True, interval=1)
  File "/usr/lib/daos/TESTING/ftest/util/test_utils_pool.py", line 504, in wait_for_rebuild
    format(self.pool_query_timeout.value))
DaosTestError: TIMEOUT detected after 15 seconds of pool query. This timeout can be adjusted via the 'pool/pool_query_timeout' test yaml parameter.

rebuild/read_array.py:                             Skipped


rebuild/delete_objects.py:                         FAIL

Existing failure: DAOS-6547
security (18)
security/cont_create_acl.py:                       PASS
/scratch/samirrav/verbs_Logs/security/cont_create_acl
security/pool_security_groups.py:                  PASS
/scratch/samirrav/verbs_Logs/security/pool_security_groups
security/cont_delete_acl.py:                       PASS
/scratch/samirrav/verbs_Logs/security/job-2021-02-23T02.47-66498bd-security-cont_delete_acl/
security/container_security_acl.py:                PASS
/scratch/samirrav/verbs_Logs/security/job-2021-02-23T02.34-7d33c08-security-container_security_acl/
security/cont_overwrite_acl.py:                    PASS
/scratch/samirrav/verbs_Logs/security/job-2021-02-23T02.49-3b8c80a-security-cont_overwrite_acl
security/cont_get_acl.py:                          PASS
/scratch/samirrav/verbs_Logs/security/cont_get_acl.py
security/pool_connect_init.py:                     FAIL
/scratch/samirrav/verbs_Logs/security/pool_connect_init
Test issue: [stderr] ERROR: dmg: pool create failed: server: code = 642 description = "requested SCM capacity (17 MB / 8) is too small (min 16 MiB per target)"
security/pool_security_acl.py:                     FAIL  DAOS-6917
/scratch/samirrav/verbs_Logs/security/pool_security_acl

Test failure; looks like an existing test defect.


security/cont_update_acl.py:                       PASS
/scratch/samirrav/verbs_Logs/security/cont_update_acl.py
daos_test (7)
daos_test/daos_core_test_dfs.py:                   Manually PASS
/scratch/samirrav/verbs_Logs/daos_test/daos_test_dfs
Tried with test_daos_dfs_parallel: 1 and that works; with sockets it also works. There could be some test issue with Avocado when running with OFI, but it works fine manually.
daos_test/daos_core_test-nvme_recovery.py:         PASS
/scratch/samirrav/verbs_Logs/daos_test/daos_core_nvme_test
daos_test/daos_core_test.py:                       FAIL
/scratch/samirrav/verbs_Logs/daos_test/job-2021-03-11T19.33-244c10b-daos_test-daos_core_test

Observed the same failure Ravi mentioned; most defects are already open, and the issue is caused by DAOS-5945.

Ravi's input: When I ran all the tests, it was taking a long time, with various tests failing with timeouts. So I ran a single pool test and created DAOS-6549. DAOS-6549 is resolved, stating DAOS-5945 is the root cause of the problem.

daos_test/daos_core_test-rebuild.py:               FAIL  DAOS-6840

/scratch/samirrav/verbs_Logs/daos_test/job-2021-03-11T20.30-cd51ab4-daos_test-daos_core_test-rebuild/

On master: /scratch/samirrav/verbs_Logs/daos_test/job-2021-03-17T23.41-3423e9c-daos_test-daos_core_test-rebuild

Only tests 23/24/25: /scratch/samirrav/verbs_Logs/daos_test/job-2021-03-18T00.27-5f8e7df-daos_test-daos_core_test-rebuild

Tried running it again on master and rebuild tests 23/24/25 failed. It looks like test 23 failed and 24/25 did not run because of the previous failure, so it seems like a test issue. Tried running 23, 24, 25 again and all three passed.

Observed a timeout for test_rebuild_27; still to be debugged.

Ravi's input: When I ran all the tests, it was taking a long time, with various tests failing with timeouts. So I ran the rebuild tests (12-15) and created DAOS-6610. DAOS-6610 is resolved, stating DAOS-5945 is the root cause of the problem.

osa (9)
osa/offline_reintegration.py:                      PASS


osa/offline_drain.py:                              Skipped


osa/offline_extend.py:                             Skipped


osa/online_reintegration.py:                       Skipped


osa/online_drain.py:                               Skipped


osa/online_extend.py:                              Skipped


osa/dmg_negative_test.py:                          PASS


osa/offline_parallel_test.py:                      Skipped


osa/online_parallel_test.py:                       Skipped



daos_test result:

test_daos_degraded_mode: d             PASS
test_daos_management: m                PASS
test_daos_pool: p                      Two tests failed
test_daos_container: c                 Three tests failed
test_daos_epoch: e                     PASS
test_daos_single_rdg_tx: t             PASS
test_daos_distributed_tx: T            PASS
test_daos_verify_consistency: V        PASS
test_daos_io: i                        Two tests failed
test_daos_object_array: A              PASS
test_daos_array: D                     PASS
test_daos_kv: K                        PASS
test_daos_capability: C                PASS
test_daos_epoch_recovery: o            PASS
test_daos_md_replication: R            PASS
test_daos_rebuild_simple: v            PASS
test_daos_drain_simple: b              PASS
test_daos_extend_simple: B             Missing (this test was added recently, but the test function was not added to the .py file; informed the developer to add it)
test_daos_oid_allocator: O             PASS
test_daos_checksum: z                  PASS
test_daos_rebuild_ec: S                PASS
test_daos_aggregate_ec: Z              PASS
test_daos_degraded_ec_0to6: X          PASS, but it fails during pool destroy: daos_fini() failed with -1001 (DAOS-6840)
test_daos_degraded_ec_8to22: X         PASS, but it fails during pool destroy: daos_fini() failed with -1001 (DAOS-6840)
test_daos_dedup: U                     PASS