Proposal for DAOS tools consolidation.

...

Proposal: Commands, Resources, Operations and Arguments


Tool | Component | Component Args | Operation | Operation Args | Description, Notes / Issues | API | Implemented?

 dmg

 storage
 scan
discover all storage available on the nodes applying filters from yaml file
Y



 query
report status & stats about storage




query smd

--devices

--pools

query SMD device table

query SMD pool table.


Y



query nvme-health

--hostlist="HOST:PORT"

query raw SPDK NVMe device health stats. Returns all stats for all NVMe SSDs on all hosts in hostlist.
Y



query blobstore-health

--devuuid="DEVICE_UUID"

--tgtid="VOS_TGT_ID"

query BIO in-memory health data. Returns all BIO device health data and I/O errors & checksum error stats for given device UUID or VOS target ID.
Y



query device-state

--devuuid="DEVICE_UUID"

query the current device state of the given device UUID stored in SMD (i.e., NORMAL or FAULTY).
Y



set-faulty

--devuuid="DEVICE_UUID"

allow admin/user to manually set the device state of a given device to FAULTY (will trigger faulty device reaction callbacks).
Y



 prep


device-specific configuration that may require a reboot. E.g. setting up AEP DIMMs in interleaved mode


Y



 burnin

run fio against storage devices to verify that they operate correctly and to validate their performance.






 format


reset content of NVMe SSDs, format SCM with ext4, mount SCM and start the DAOS service (io_server)
Y



 fwupdate

firmware update




 network


 scan *


list discovered network interfaces


 * suggestion: report which interfaces and OFI providers would be used with the discovered interfaces


Y



 query *
 report status & stats about network interfaces

 * suggestion: perform a local test to indicate in advance whether an OFI runtime error will occur with the interface, for example the one seen with daos_server: "na_ofi_getinfo(): fi_getinfo failed, rc=-61(No data available)". That error came from a VM build that lacked PSM2 devel.




 system

--sys=SYSNAME

 query


report service status on all or a subset of the servers
Y



list-pools
list all pools created (do we want an alias for this as "pool list"?)
Y


Same as above

 list-ranks


 list all DAOS system server ranks in the specified system ("query" currently lists all system ranks, do we need list-ranks?)



Same as above stat *

report various stats about the service

 * alternative command name: get-statistics





Same as above log *
report service logs


 * alternative: get-log





Same as above

 debug *
 change debug mask


* alternative: set-debug





Same as above

 drain

 --fdomains=FDRANKLIST

--fd=FDRANKLIST *

 --ranks=SRVRANKLIST

 --targets=SRVRANK:TGTRANK LIST

 --tgt=SRVRANK:TGTRANK LIST

drain a list of racks, list of servers, or list of targets in preparation for maintenance

 * use --fd= as a convenience (shorter than --fdomains)

This one really does require the ability to specify at the target or SSD level. Use case: one of the SSDs in a server is about to fail; hot swap it after a drain and before a reintegrate.
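The option aliases and the SRVRANK:TGTRANK target list above could be handled along the following lines. This is a minimal Python sketch, not the dmg implementation: the parser, the `parse_targets` helper, and the comma-separated list format are assumptions; only the option names come from the table.

```python
import argparse


def parse_targets(spec):
    """Parse 'SRVRANK:TGTRANK,...' into (server_rank, target_rank) pairs.

    The comma separator is an assumption; the proposal only names the
    SRVRANK:TGTRANK form.
    """
    pairs = []
    for item in spec.split(","):
        srv, sep, tgt = item.strip().partition(":")
        if not sep or not srv.isdigit() or not tgt.isdigit():
            raise argparse.ArgumentTypeError(f"bad target spec: {item!r}")
        pairs.append((int(srv), int(tgt)))
    return pairs


def build_drain_parser():
    # Hypothetical stand-in for "dmg system drain"; option names follow the table.
    parser = argparse.ArgumentParser(prog="dmg system drain")
    # Long and short spellings of an option store into the same destination.
    parser.add_argument("--fdomains", "--fd", dest="fdomains")
    parser.add_argument("--ranks", dest="ranks")
    parser.add_argument("--targets", "--tgt", dest="targets", type=parse_targets)
    return parser


args = build_drain_parser().parse_args(["--tgt=3:1,3:2"])
print(args.targets)  # [(3, 1), (3, 2)]
```

Registering both spellings on one argument keeps --fd and --fdomains (and --tgt and --targets) from drifting apart, which matters because drain, reintegrate, kill, and exclude all share these options.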





Same as above

 reintegrate

 --fdomains=FDRANKLIST

--fd=FDRANKLIST

 --ranks=SRVRANKLIST

 --targets=SRVRANK:TGTRANK LIST

 --tgt=SRVRANK:TGTRANK LIST

reintegrate a drained component






Same as above stop
full shutdown of the DAOS service


Y


Same as above start
restart service after full shutdown
Y


Same as above kill

 --fdomains=FDRANKLIST

--fd=FDRANKLIST

 --ranks=SRVRANKLIST

abrupt shutdown of a particular server (really: set of servers, or whole fault domains/racks of servers)



Same as above exclude *

 --fdomains=FDRANKLIST

--fd=FDRANKLIST

 --ranks=SRVRANKLIST

Remove node from DAOS system (really: set of servers, or whole fault domains/racks of servers)


 * alternative command names: del-nodes, del-servers?




 scrub
 start
Start background checksum scrubbing process (or resume after prior stop)




 stop
Stop scrubbing process




 query
Report status of background checksum scrubbing process (e.g., number of corruptions found, percentage of storage scanned so far)


 pool --pool=UUID

 --sys=SYSNAME

 --svc=SRVRANKLIST *

query
Report pool status


 * (applies to all pool commands) given a pool UUID and DAOS system name, the implementation is eventually expected to look up the existing pool service replica SRVRANKLIST itself (i.e., remove the need for --svc=)

 daos_pool_query()
Y


Same as above stat *
Get pool statistics

 * alternative command name: get-statistics

?


Same as above get-prop *
Get pool properties


 * alternative: prop (but I like having the commands be "verbs")

?



set-prop
TBD

Y


Same as above get-acl *

overwrite-acl

update-acl

delete-acl


Get/set/delete pool access control
?
Y


Same as above

 get-attr

 set-attr

 del-attr

 list-attrs

 --attr=ATTRNAME (get,del)

 --value=VALUESTR (set)

no arguments for list-attrs

 Get / set user attributes

 daos_pool_attr_get()
 daos_pool_attr_set()
 daos_pool_attr_list()



Same as above

 list-containers

 list-cont


List all containers in the pool



N/A create

--user=USERNAME@,

--group=GROUPNAME@,

--mode=MODE

--nsvc=NREP

--sys=SYSNAME

 --ranks=SRVRANKLIST *

 --scm-size=SIZE

 --nvme-size=SIZE

 --fdomains=FDRANKLIST

--fd=FDRANKLIST

--acl-file=FILE

 * change existing dmg --target= to --ranks=


 daos_pool_create()
Y


Same as above destroy --force
 daos_pool_destroy()
Y


Same as above add-storage *

 --fdomains=FDRANKLIST

--fd=FDRANKLIST

 --ranks=SRVRANKLIST


Add a storage fault domain (rack) or list of servers to an existing pool

 * formerly named "extend"

 daos_pool_extend()


Same as above del-storage *

 --fdomains=FDRANKLIST

--fd=FDRANKLIST

 --ranks=SRVRANKLIST

Remove a fault domain (rack) or list of servers from a pool.

 * formerly named "exclude"

 daos_pool_exclude()



Same as above

 add-svc --ranks=MORESRVRANKLIST

Add a pool service replica

--svc= to specify current list of metadata service server ranks; --ranks= to specify new ranks to add to the set.

?


Same as above del-svc --ranks=OLDSRVRANKLIST
Remove a pool service replica
?


Same as above rebuild
Manage rebuild for a pool?


Same as above rebalance
Trigger rebalance after add-storage (extend) by racks/servers
?


Same as above resize

 --scm-size=SIZE

 --nvme-size=SIZE

Extend the size of a pool's existing targets



Same as above evict
Evict all active pool connections daos_pool_evict()


Same as above lurk *
Dump activity on the pool


 * alternative: get-log

?
Tool | Component | Operation | Arguments | Description and Notes / Issues | API

 daos

 pool * --pool=UUID

 --sys=SYSNAME

 --svc=SRVRANKLIST

 query,

 stat,

 get-prop


report pool status (rebuild/rebalancing status, ...)

report various stats about the pool (size, usage, number of containers, ...), same as dmg pool stat

show pool properties

Note: daos pool is mostly "read-only" versus "dmg pool" used by the administrator. So the set-prop command is not available here.


Y (query, get-prop). Missing statistics support for "stat", but it may stay an admin/dmg thing?




Same as above get-attr

 set-attr

 del-attr

 list-attrs

 --attr=ATTRNAME (get,del)

 --value=VALUESTR (set)

no arguments for list-attrs



Y



Same as above

 list-containers

 list-cont


List all containers in the specified pool
Y

container

cont

Pool related

(same as daos pool):

--pool=UUID

 --sys=SYSNAME

 --svc=SRVRANKLIST


Container (choose 1):

--cont=UUID

OR

--path=FILESYSDIR

 query *
show container status


query by container UUID with --cont 

or

query by unified namespace (directory or file) --path=FSENTITY (like current duns resolve_path). Note: do not specify --pool= when querying by path.


 * alternative: get-status

 daos_cont_query()
Y
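The "choose 1" rule for --cont and --path, together with the note that --pool must not accompany --path, could be enforced as sketched below. This is illustrative Python rather than the daos tool's code: `parse_cont_query` is a hypothetical name, and requiring --pool alongside --cont is an assumption.

```python
import argparse


def parse_cont_query(argv):
    """Parse arguments for a hypothetical 'daos cont query' front end."""
    parser = argparse.ArgumentParser(prog="daos cont query")
    parser.add_argument("--pool")
    parser.add_argument("--sys")
    parser.add_argument("--svc")
    # Exactly one of --cont or --path must be given.
    selector = parser.add_mutually_exclusive_group(required=True)
    selector.add_argument("--cont")
    selector.add_argument("--path")
    args = parser.parse_args(argv)
    # Per the note above: --pool must not accompany --path.
    if args.path is not None and args.pool is not None:
        parser.error("do not specify --pool when querying by --path")
    # Assumption: querying by UUID needs the pool to be named explicitly.
    if args.cont is not None and args.pool is None:
        parser.error("--pool is required with --cont")
    return args


args = parse_cont_query(["--path=/mnt/dfs/project"])
print(args.path, args.cont)  # /mnt/dfs/project None
```

`parser.error()` prints a usage message and exits with status 2, so invalid combinations fail before any DAOS call is attempted.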


Same as above stat
show various container statistics

 * alternative command name: get-statistics

?
Missing statistics/metrics support for "stat".


Same as above get-attr

 set-attr

 del-attr

 list-attrs

--attr=ATTRNAME (get,del)

 --value=VALUESTR (set)

no arguments for list-attrs

 set/retrieve user attributes


Same as above get-prop

 * is there such a thing as getting container properties (like pool properties)?

?
Y



set-prop
TBD

Y (currently, only the "label" property is supported).


Same as above

 list-objects

list-obj


Enumerate all objects in the specified container
Y


Pool related:

same as above

Container related:

(--cont=UUID)

OR

--path=FSENTITY

--type=CONTTYPE

--oclass=OBJCLASS

--chunk_size=NBYTES

 create

 (implementation generates CUUID if not specified)


Also optional are --path/type/oclass/chunk_size

create a container with specific properties (including type, object class, and chunk_size if provided) and link it with the path (if provided - similar to duns link_path, create a POSIX container with DFS-specific parameters). 

CONTTYPE: posix, hdf5


OBJCLASS: tiny, small, large, R2S, R2, repl_max

 daos_cont_create()
Y
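The create arguments above map naturally onto an argument parser with fixed choices. The following Python sketch is one possible reading of the row, not the actual implementation: `parse_cont_create` is a hypothetical name, treating --pool as required is an assumption, and the sketch does not enforce any exclusivity between --cont and --path because the proposal leaves that open.

```python
import argparse
import uuid

# Allowed values as listed in the proposal.
CONT_TYPES = ("posix", "hdf5")
OBJ_CLASSES = ("tiny", "small", "large", "R2S", "R2", "repl_max")


def parse_cont_create(argv):
    """Parse arguments for a hypothetical 'daos cont create' front end."""
    parser = argparse.ArgumentParser(prog="daos cont create")
    parser.add_argument("--pool", required=True)  # assumption: always required
    parser.add_argument("--cont")
    parser.add_argument("--path")
    parser.add_argument("--type", choices=CONT_TYPES)
    parser.add_argument("--oclass", choices=OBJ_CLASSES)
    parser.add_argument("--chunk_size", type=int)
    args = parser.parse_args(argv)
    # The implementation generates the container UUID if none is specified.
    if args.cont is None:
        args.cont = str(uuid.uuid4())
    return args


args = parse_cont_create(["--pool=POOL_UUID", "--type=posix", "--oclass=R2"])
print(args.type, args.oclass, args.chunk_size)  # posix R2 None
```

Using `choices` rejects an unknown CONTTYPE or OBJCLASS at parse time with a message that lists the valid values.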


Same as query above destroy --force * destroy a container based on UUID or path (unlink path as well if provided)

* current dcont destroy does not have the --force option

 daos_cont_destroy()
Y


Same as query above

create-snap,

list-snaps,

destroy-snap

--snap=NAME (create)

--epc=NUM (destroy)

--epcrange=RANGE (destroy)



Take, list, destroy container snapshots
Optionally name snapshots on creation. Snapshot created based on the most recent committed epoch.

List all snapshots in the container

Destroy a single snapshot by epoch number, or all snapshots between two epoch numbers (inclusive of the begin/end epoch numbers?).


Y
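The open question about whether --epcrange includes its end points can be made concrete with a small sketch. This is illustrative Python only: the "BEGIN-END" spelling of RANGE and both function names are assumptions, since the proposal does not define the range syntax.

```python
def parse_epoch_range(spec):
    """Parse 'BEGIN-END' into a (begin, end) pair of epoch numbers.

    The 'BEGIN-END' form is an assumption; the proposal leaves RANGE undefined.
    """
    begin, sep, end = spec.partition("-")
    if not sep or not begin.isdigit() or not end.isdigit():
        raise ValueError(f"bad epoch range: {spec!r}")
    lo, hi = int(begin), int(end)
    if lo > hi:
        raise ValueError(f"range begins after it ends: {spec!r}")
    return lo, hi


def snapshots_to_destroy(snapshot_epochs, lo, hi, inclusive=True):
    """Select the snapshot epochs that fall within the range."""
    if inclusive:
        return [e for e in snapshot_epochs if lo <= e <= hi]
    return [e for e in snapshot_epochs if lo < e < hi]


lo, hi = parse_epoch_range("20-40")
print(snapshots_to_destroy([10, 20, 30, 40], lo, hi))                   # [20, 30, 40]
print(snapshots_to_destroy([10, 20, 30, 40], lo, hi, inclusive=False))  # [30]
```

The two calls show how much the answer to the inclusivity question changes the result: with snapshots at both end points, the inclusive reading destroys three snapshots and the exclusive reading destroys one.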


Same as query above rollback

--snap=NAME

--epc=NUM

Revert a container back to a previous snapshot specified by name or epoch number.




 verify *

Validate content of a POSIX container

 * TBD - this was in a separate "daos fs" command section that has been merged into "daos cont"




 object

obj

Pool, cont related
(same as daos pool and daos cont):

--pool=UUID

 --sys=SYSNAME

 --svc=SRVRANKLIST

 --cont=UUID

Object:

 --oid=OID

 query *

 get-layout

TBD: Epoch?

Show the layout of a particular DAOS object including all the targets where it is distributed

 * get-layout (or get layout) instead of query

 daos_obj_open()?
 daos_obj_fetch()?
Y


Same as above list-keys


 daos_obj_list_dkey()?
 daos_obj_list_akey()



Same as above dump
Dump content of an object


Proposal: TODO

  1. Container create use cases, unified namespace related additions
    1. container create by pool UUID + path (container UUID generated)
    2. query by path only
    3. Determine if create is to be done in "daos cont create" with more options, or have a uns or fs resource (e.g., daos uns).
  2. Container type specification (optional: unknown if not specified)
    1. --type=fs (or --type=posix)
    2. --type=block (future: e.g., spdk, virtio for cloud use cases)
    3. --type=hdf5
  3. Object class and chunk sizes.

...