Proposal for DAOS tools consolidation.
...
Proposal: Commands, Resources, Operations and Arguments
Tool | Component | Component Args | Operation | Operation Args | Description, Notes / Issues | API | Implemented? |
dmg | storage | scan | discover all storage available on the nodes applying filters from yaml file | Y | |||
query | report status & stats about storage | ||||||
query smd | --devices --pools | query the SMD device table (--devices) or the SMD pool table (--pools). | Y | ||||
query nvme-health | --hostlist="HOST:PORT" | query raw SPDK NVMe device health stats. Returns all stats for all NVMe SSDs on all hosts in hostlist. | Y | ||||
query blobstore-health | --devuuid="DEVICE_UUID" --tgtid="VOS_TGT_ID" | query BIO in-memory health data. Returns all BIO device health data and I/O errors & checksum error stats for given device UUID or VOS target ID. | Y | ||||
query device-state | --devuuid="DEVICE_UUID" | query the current state of the given device UUID as stored in SMD (i.e., NORMAL or FAULTY). | Y | ||||
set-faulty | --devuuid="DEVICE_UUID" | allow admin/user to manually set the device state of a given device to FAULTY (will trigger faulty device reaction callbacks). | Y | ||||
prep | device-specific configuration that may require a reboot. E.g. setting up AEP DIMMs in interleaved mode | Y | |||||
burnin | run fio against storage devices to verify that they operate correctly and to validate their performance. | ||||||
format | reset content of NVMe SSDs, format SCM with ext4, mount SCM and start the DAOS service (io_server) | Y | |||||
fwupdate | firmware update | ||||||
network | scan * | list discovered network interfaces * suggestion: report which interfaces and OFI providers would be used with the discovered interfaces | Y | ||||
query * | report status & stats about network interfaces * suggestion: perform a local test to indicate in advance whether an OFI runtime error is going to occur with the interface, for example as seen with daos_server: "na_ofi_getinfo(): fi_getinfo failed, rc=-61(No data available)". That error came from a VM build that didn't have PSM2 devel. | ||||||
system | --sys=SYSNAME | query | report service status on all or a subset of the servers | Y | |||
list-pools | list all pools created (do we want an alias for this as "pool list"?) | Y | |||||
Same as above | list-ranks | list all DAOS system server ranks in the specified system ("query" currently lists all system ranks, do we need list-ranks?) | |||||
Same as above | stat * | report various stats about the service * alternative command name: get-statistics | |||||
Same as above | log * | report service logs * alternative: get-log | |||||
Same as above | debug * | change debug mask * alternative: set-debug | |||||
Same as above | drain | --fdomains=FDRANKLIST --fd=FDRANKLIST * --ranks=SRVRANKLIST --targets=SRVRANK:TGTRANK LIST --tgt=SRVRANK:TGTRANK LIST | drain a list of racks, servers, or targets in preparation for maintenance. * use --fd= as a convenience (shorter than --fdomains). This command really does require the ability to specify individual targets or SSDs. Use case: one of the SSDs in a server is about to fail; hot swap it after a drain and before a reintegrate. | ||||
Same as above | reintegrate | --fdomains=FDRANKLIST --fd=FDRANKLIST --ranks=SRVRANKLIST --targets=SRVRANK:TGTRANK LIST --tgt=SRVRANK:TGTRANK LIST | reintegrate a drained component | ||||
Same as above | stop | full shutdown of the DAOS service | Y | ||||
Same as above | start | restart service after full shutdown | Y | ||||
Same as above | kill | --fdomains=FDRANKLIST --fd=FDRANKLIST --ranks=SRVRANKLIST | abrupt shutdown of a particular server (really: set of servers, or whole fault domains/racks of servers) | ||||
Same as above | exclude * | --fdomains=FDRANKLIST --fd=FDRANKLIST --ranks=SRVRANKLIST | Remove node from DAOS system (really: set of servers, or whole fault domains/racks of servers) * alternative command names: del-nodes, del-servers? | ||||
scrub | start | Start background checksum scrubbing process (or resume after prior stop) | |||||
stop | Stop scrubbing process | ||||||
query | Report status of background checksum scrubbing process (e.g., number of corruptions found, percentage of storage scanned so far) | ||||||
pool | --pool=UUID --sys=SYSNAME --svc=SRVRANKLIST * | query | Report pool status. * (applies to all pool commands) Given a pool UUID and DAOS system name, the implementation is eventually expected to look up the existing pool service replica SRVRANKLIST itself (i.e., remove the need for --svc=) | daos_pool_query() | Y | ||
Same as above | stat * | Get pool statistics * alternative command name: get-statistics | ? | ||||
Same as above | get-prop * | Get pool properties * alternative: prop (but I like having the commands be "verbs") | ? | ||||
set-prop | TBD | Y | |||||
Same as above | get-acl * overwrite-acl update-acl delete-acl | Get/set/delete pool access control | ? | Y | |||
Same as above | get-attr set-attr del-attr list-attrs | --attr=ATTRNAME (get,del) --value=VALUESTR (set) no arguments for list-attrs | Get / set user attributes | daos_pool_attr_get() daos_pool_attr_set() daos_pool_attr_list() | |||
Same as above | list-containers list-cont | List all containers in the pool | |||||
N/A | create | --user=USERNAME@, --group=GROUPNAME@, --mode=MODE --nsvc=NREP --sys=SYSNAME --ranks=SRVRANKLIST * --scm-size=SIZE --nvme-size=SIZE --fdomains=FDRANKLIST --fd=FDRANKLIST --acl-file=FILE | * change existing dmg --target= to --ranks= | daos_pool_create() | Y | ||
Same as above | destroy | --force | daos_pool_destroy() | Y | |||
Same as above | add-storage * | --fdomains=FDRANKLIST --fd=FDRANKLIST --ranks=SRVRANKLIST | Add a storage fault domain (rack) or list of servers to an existing pool * formerly named "extend" | daos_pool_extend() | |||
Same as above | del-storage * | --fdomains=FDRANKLIST --fd=FDRANKLIST --ranks=SRVRANKLIST | Remove a fault domain (rack) or list of servers from a pool. * formerly named "exclude" | daos_pool_exclude() | |||
Same as above | add-svc | --ranks=MORESRVRANKLIST | Add a pool service replica. --svc= specifies the current list of metadata service server ranks; --ranks= specifies the new ranks to add to the set. | ? | |||
Same as above | del-svc | --ranks=OLDSRVRANKLIST | Remove a pool service replica | ? | |||
Same as above | rebuild | Manage rebuild for a pool | ? | ||||
Same as above | rebalance | Trigger rebalance after add-storage(extend) by racks/servers | ? | ||||
Same as above | resize | --scm-size=SIZE --nvme-size=SIZE | Extend the size of a pool's existing targets | ||||
Same as above | evict | Evict all active pool connections | daos_pool_evict() | ||||
Same as above | lurk * | Dump activity on the pool * alternative: get-log | ? | ||||
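The drain and reintegrate rows above take a --targets=SRVRANK:TGTRANK LIST argument, but the proposal does not define how the list is written. The following is a minimal Python sketch of one possible parser, assuming (this is not stated in the proposal) a comma-separated list of rank:target pairs. The function name parse_target_list is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def parse_target_list(spec: str) -> Dict[int, List[int]]:
    """Group SRVRANK:TGTRANK pairs by server rank.

    Assumption: pairs are comma-separated, e.g. "0:1,0:3,2:0".
    The proposal itself does not specify the list syntax.
    """
    targets = defaultdict(list)
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        rank_str, sep, tgt_str = item.partition(":")
        if not sep:
            raise ValueError("expected SRVRANK:TGTRANK, got %r" % item)
        # int() raises ValueError on non-numeric or empty fields
        targets[int(rank_str)].append(int(tgt_str))
    return dict(targets)
```

Grouping by server rank matches the SSD hot-swap use case, where all targets to be drained typically sit on one server.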
Tool | Component | Operation | Arguments | Description and Notes / Issues | API | ||
daos | pool * | --pool=UUID --sys=SYSNAME --svc=SRVRANKLIST | query, stat, get-prop | report pool status (rebuild/rebalancing status, ...); report various stats about the pool (size, usage, number of containers, ...), same as dmg pool stat; show pool properties. Note: daos pool is mostly "read-only", whereas "dmg pool" is used by the administrator, so the set-prop command is not available here. | Y (query, get-prop). Missing statistics support for "stat", but it may stay an admin/dmg thing? |
Same as above | get-attr set-attr del-attr list-attrs | --attr=ATTRNAME (get,del) --value=VALUESTR (set) no arguments for list-attrs | Y | ||||
Same as above | list-containers list-cont | List all containers in the specified pool | Y | ||||
container cont | Pool related (same as daos pool): --pool=UUID --sys=SYSNAME --svc=SRVRANKLIST Container (choose 1): --cont=UUID OR --path=FILESYSDIR | query * | show container status. Query by container UUID with --cont, or by unified namespace entry (directory or file) with --path=FSENTITY (like the current duns resolve_path). Note: do not specify --pool= when querying by path. * alternative: get-status | daos_cont_query() | Y | ||
Same as above | stat | show various container statistics * alternative command name: get-statistics | ? | Missing statistics/metrics support for "stat". | |||
Same as above | get-attr set-attr del-attr list-attrs | --attr=ATTRNAME (get,del) --value=VALUESTR (set) no arguments for list-attrs | set/retrieve user attributes | Y | |||
Same as above | get-prop | * is there such a thing as getting container properties (like pool properties)? | ? | Y | |||
set-prop | TBD | Y (currently, only the "label" property is supported) | |||||
Same as above | list-objects list-obj | Enumerate all objects in the specified container | Y | ||||
Pool related: same as above Container related: (--cont=UUID) OR --path=FSENTITY --type=CONTTYPE --oclass=OBJCLASS --chunk_size=NBYTES | create | (implementation generates CUUID if not specified) Also optional are --path/type/oclass/chunk_size | create a container with specific properties (including type, object class, and chunk_size if provided) and link it with the path, if provided (similar to duns link_path: create a POSIX container with DFS-specific parameters). CONTTYPE: posix, hdf5. OBJCLASS: tiny, small, large, R2S, R2, repl_max | daos_cont_create() | Y | ||
Same as query above | destroy | --force * | destroy a container based on UUID or path (unlink path as well if provided) * current dcont destroy does not have the --force option | daos_cont_destroy() | Y | ||
Same as query above | create-snap, list-snaps, destroy-snap | --snap=NAME (create) --epc=NUM (destroy) --epcrange=RANGE (destroy) | Take, list, and destroy container snapshots. list-snaps lists all snapshots in the container. destroy-snap destroys a single snapshot by epoch number, or all snapshots between two epoch numbers (inclusive of the begin/end epoch numbers?). | Y | |||
Same as query above | rollback | --snap=NAME --epc=NUM | Revert a container back to a previous snapshot specified by name or epoch number. | ||||
verify * | Validate content of a POSIX container * TBD - this was in a separate "daos fs" command section that has been merged into "daos cont" | ||||||
object obj | Pool, cont related (same as daos pool and daos cont): --pool=UUID --sys=SYSNAME --svc=SRVRANKLIST --cont=UUID Object: --oid=OID | query * | TBD: Epoch? | Show the layout of a particular DAOS object including all the targets where it is distributed | daos_obj_open()? daos_obj_fetch()? | Y | |
Same as above | list-keys | daos_obj_list_dkey()? daos_obj_list_akey() | |||||
Same as above | dump | Dump content of an object |
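The container rows above say that exactly one of --cont=UUID or --path must be given, and that --pool= must not be specified when querying by path. A minimal Python sketch of that argument check follows, using the standard argparse module. It only illustrates the rule; it is not the daos tool's actual argument handling, and the function name parse_cont_query is hypothetical.

```python
import argparse


def parse_cont_query(argv):
    """Parse arguments for a hypothetical 'daos cont query' front end."""
    parser = argparse.ArgumentParser(prog="daos cont query")
    parser.add_argument("--pool", help="pool UUID")
    parser.add_argument("--sys", help="DAOS system name")
    parser.add_argument("--svc", help="pool service replica rank list")

    # "Container (choose 1)": exactly one of --cont or --path.
    selector = parser.add_mutually_exclusive_group(required=True)
    selector.add_argument("--cont", help="container UUID")
    selector.add_argument("--path", help="unified namespace file or directory")

    args = parser.parse_args(argv)

    # "do not specify --pool= when querying by path"
    if args.path is not None and args.pool is not None:
        parser.error("--pool must not be given together with --path")
    return args
```

On a rule violation, parser.error() prints a usage message and exits with status 2, which is the usual argparse behavior for invalid command lines.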
Proposal: TODO
Container create use cases, unified namespace related additions:
- container create by pool UUID + path (container UUID generated)
- query by path only
- Determine if create is to be done in "daos cont create" with more options, or have a uns or fs resource (e.g., daos uns).
Container type specification (optional: unknown if not specified):
- --type=fs (or --type=posix)
- --type=block (future: e.g., spdk, virtio for cloud use cases)
- --type=hdf5
Object class and chunk sizes.
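The container type item above (optional, "unknown" if not specified, with fs and posix offered as alternative spellings) could be handled roughly as in the following Python sketch. It assumes that fs and posix name the same type and that posix is the canonical spelling; the proposal leaves both points open. The names CONT_TYPES and normalize_cont_type are hypothetical.

```python
from typing import Optional

# Assumption: "fs" is an alias for "posix"; "block" is listed as a future type.
CONT_TYPES = {
    "posix": "posix",
    "fs": "posix",
    "block": "block",
    "hdf5": "hdf5",
}


def normalize_cont_type(value: Optional[str]) -> str:
    """Map a --type= value to a canonical container type name."""
    if value is None:
        return "unknown"  # type is optional: unknown if not specified
    key = value.strip().lower()
    if key not in CONT_TYPES:
        raise ValueError("unsupported container type: %r" % value)
    return CONT_TYPES[key]
```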
...