Request from H3C
Submitter: @Stephen Pu from H3C
Reviewer: @Johann Lombardi, @Liang Zhen and other team members from DAOS
Status: Needs more specification; under review
Expected result:
Request item scope is defined for the DAOS and H3C collaboration before Q4 2022.
The scope may be divided into two or three sub-iterations.
1. Request priorities are defined and aligned.
2. Requests are specified at the product-definition level (not the design stage).
3. A general feasibility assessment and effort estimate can be given.
| Category | Name | Scenario | Description | H3C Proposal | DAOS Feedback | Priority | If Contributed to Community | Feasibility and Effort Estimation | Owner (i.e. who owns design, dev and testing) | Delivery Plan (Q1, Q2, Q3 in 2022) | Risk / Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| volume | create | | A user can create a block storage volume with a specific size, name and other attributes. | | | Low | | | | | |
| volume | delete | | A user can delete a block storage volume by a specific volume name or UUID. | | | | | | | | |
| volume | expand | online, offline | The volume size can be expanded while the volume is serving I/O (online) or while it is offline. | | | | | | | | |
| volume | search | | A specific volume can be looked up by its volume name or UUID. | | | | | | | | |
| volume | thin provisioning vol | | The volume is presented to the application servers as its full provisioned capacity, but no space is allocated until write operations occur. | | | | | | | | |
| volume | thick provisioning vol | | With thick provisioning, the complete amount of virtual disk storage capacity is pre-allocated on the physical storage when the virtual disk is created. A thick-provisioned virtual disk consumes all of the space allocated to it in the datastore. | | | | | | | | |
| volume | modify | | The volume's attributes, such as its name, can be modified. | | | | | | | | |
| volume | recycle bin | | The recycle bin is a snapshot-based recovery feature that enables accidentally deleted volumes to be restored. | | | | | | | | |
| volume | snapshot | | The data on a volume can be backed up by taking point-in-time snapshots. | | | | | | | | |
| volume | clone | | A clone of a block storage volume is a copy of an existing volume made at a specific moment in time. | | | | | | | | |
| volume | QoS | | Quality of service (QoS) can be used to guarantee that the performance of critical workloads is not degraded. | | | | | | | | |
| block | NVMe-oF | | Use the NVMe over Fabrics protocol to set up NVMe block storage. | | | | yes | | | | |
| system | deployment | Automatic cluster deployment | | YES | | | Co-work with the DAOS Ansible role: https://github.com/bensallen/ansible-role-daos | | | | |
| system | installation | Automatic cluster installation | | YES | | | yes | | | | |
| system | rollback | | | YES | | | | | | | |
| system | web portal | | | YES | | | | | | | |
| system | upgrade | online upgrade | | YES | | | | | | | |
| system | upgrade | offline upgrade | | YES | | | | | | | |
| system | upgrade | rollback | | YES | | | | | | | |
| system | online monitoring | | | YES | | | yes | | | | |
| system | offline monitoring | | | YES | | | yes | | | | |
| system | alarm | | | YES | | | yes | | | | |
| system | Log | | | YES | | | | | | | |
| system | web page | | | YES | | | | | | | |
| pool | Pool | block storage pool | The user can create a specific block storage pool with a given size or name. | | | | yes | | | | |
| pool | Pool expand | online | A specific block storage pool can be expanded in size without interrupting its operation (online case). | | | | yes | | | | |
| cluster | Node server | Add a new node (online) | A new node server can be added to the existing cluster without interrupting the cluster's operation (online case). | | Yes. | | yes | | | | |
| cluster | Node server | Delete a node (online) | A node server can be removed from the existing cluster without interrupting the cluster's operation (online case). | | Manual drain and exclude; no impact on the workload (online). | | | | | | |
| cluster | SSD Disk | Add a new SSD (online) | The node server can add a new NVMe SSD without interrupting the cluster's operation (online case). | | Under discussion; high complexity. | | yes | Big effort | | | |
| cluster | SSD Disk | Replace an SSD (online) | The node server can replace an original NVMe SSD with a new one (online case). | | Yes (bug fix needed from H3C). | | | | | | |
| cluster | SSD Disk | Remove an existing SSD (online) | The node server can remove a running or failed NVMe SSD from the system (online case). | | Yes. | | | | | | |
| cluster | PMEM Disk | Add a new PMEM (offline) | The node server can add a new PMEM module without interrupting the cluster's operation (online case), i.e. PMEM capacity expansion: the initial configuration is 256 GB x 12, but it is expanded to 512 GB x 12 because new NVMe disks are added or more data should be cached in PMEM to improve performance. | | 2023; can be pulled ahead. Manually dump the PMEM metadata, replace the PMEM and reboot; reformat the PMEM and recover the data. | | | | | | |
| cluster | PMEM Disk | Replace a PMEM (online); low priority as PMEM has a long lifecycle | The node server can replace an original PMEM module with a new one (online case). | | Low priority due to the long-life assurance of PMEM. | Low | | | | | |
| cluster | PMEM Disk | Remove an existing PMEM (online); invalid requirement | The node server can remove a running or failed PMEM module from the system (online case). | | Invalid. | | | | | | |
| cluster | Network | Network exception robustness | The cluster keeps providing I/O service under random network failures. | | By design; needs testing and bug fixing. | | yes | | | | |
| cluster | Network | NIC exception robustness | The cluster keeps providing I/O service when a random node's NIC fails. | | No bonding support so far; no plan. | | yes | | | | |
| cluster | SSD Disk Exception | | SSD disk exception handling is a test-driven case. | | Framework ready; needs further improvement. | | | | | | |
| cluster | Node Exception | | Node exception handling is a test-driven case. | | Partial support; needs node-level telemetry and monitoring (Grafana). | | yes | | | | |
| cluster | Disk Usage Optimization | PMEM space optimization | Currently the PMEM capacity is divided by the number of NVMe SSDs, which wastes a large amount of PMEM space. | | Compatibility issue; no plan; needs further discussion. | | yes | | | | |
| cluster | Data Rebuild | 4 TB/h | The data rebuild speed should be above 4 TB/h in a cluster of 3 nodes, each with 8 NVMe SSDs. | | QoS and interface not ready; needs H3C testing and performance improvement. | | | | | | |
| cluster | Node reboot | I/O recovery time | When a node reboots, how many seconds does it take for the I/O service to recover to 100% of its pre-reboot level? | | TBD by H3C. | | | | | | |
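For the pool rows above (block storage pool create and online expand), the H3C management layer would most likely drive DAOS through the `dmg` administration tool. The Python sketch below is only an illustration of that integration point, not part of the agreed scope: the wrapper functions, the pool label `h3c_block_pool`, and the exact `dmg` flags (`--json`, `--size`) are assumptions that must be verified against the `dmg` help of the DAOS release actually deployed.

```python
"""Minimal sketch: wrapping the DAOS `dmg` CLI from a management service.

Assumptions (not confirmed by this document): `dmg` is available on the
admin node and the deployed DAOS release accepts
`dmg --json pool create --size=<size> <label>` and
`dmg --json pool query <label>`. Check `dmg pool create --help` before
relying on these flags.
"""
import json
import subprocess
from typing import Any, Dict


def run_dmg(*args: str) -> Dict[str, Any]:
    """Run one dmg sub-command with JSON output and return the parsed reply."""
    proc = subprocess.run(
        ["dmg", "--json", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


def create_pool(label: str, size: str) -> Dict[str, Any]:
    """Create a DAOS pool that would back an H3C block storage pool."""
    return run_dmg("pool", "create", f"--size={size}", label)


def query_pool(label: str) -> Dict[str, Any]:
    """Query pool space usage, e.g. for the web portal / monitoring rows."""
    return run_dmg("pool", "query", label)


if __name__ == "__main__":
    # Hypothetical usage; the label and size are placeholders only.
    print(create_pool("h3c_block_pool", "10TB"))
    print(query_pool("h3c_block_pool"))
```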