...
Store RAS events persistently in the mgmt service DB.
Expose stored events via API.
Add a dmg command to show and filter events.
Filtering by since and type should be enough for most cases, e.g. dmg system ras --since="26-10-2024T10:00:000"
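The two filters can be sketched in Go as follows. This is only an illustration under stated assumptions: the Event and Filter types, their field names, and the "info" / "state_change" type values are hypothetical placeholders, not the existing control-plane definitions.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical, simplified stand-in for a stored RAS event.
type Event struct {
	ID        string
	Type      string
	Timestamp time.Time
}

// Filter holds the two proposed criteria. A zero value in either field
// means "do not restrict on this criterion".
type Filter struct {
	Since time.Time
	Type  string
}

// Apply returns the events that satisfy every criterion set in f.
func (f Filter) Apply(events []Event) []Event {
	var out []Event
	for _, e := range events {
		if !f.Since.IsZero() && e.Timestamp.Before(f.Since) {
			continue // raised before the requested point in time
		}
		if f.Type != "" && e.Type != f.Type {
			continue // different event type
		}
		out = append(out, e)
	}
	return out
}

// sampleEvents returns three placeholder events one hour apart.
func sampleEvents() []Event {
	base := time.Date(2024, 10, 26, 10, 0, 0, 0, time.UTC)
	return []Event{
		{ID: "a", Type: "info", Timestamp: base.Add(-time.Hour)},
		{ID: "b", Type: "state_change", Timestamp: base},
		{ID: "c", Type: "info", Timestamp: base.Add(time.Hour)},
	}
}

func main() {
	events := sampleEvents()
	f := Filter{Since: events[1].Timestamp, Type: "info"}
	for _, e := range f.Apply(events) {
		fmt.Println(e.Timestamp.Format(time.RFC3339), e.Type, e.ID)
	}
}
```

Treating empty filter fields as "match everything" keeps the command usable without any flags and lets the two filters be combined freely.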
The output should support human-readable and machine-digestible formats: the regular tab-formatted table for default output, and JSON if the --json flag is provided.
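A minimal Go sketch of the two output modes, using only the standard library (text/tabwriter for the table, encoding/json for JSON). The Event fields, the column headers, and the PrintEvents and Render helpers are assumptions made for illustration; the real command would reuse whatever output helpers dmg already has.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
	"text/tabwriter"
)

// Event is a hypothetical, simplified RAS event record.
type Event struct {
	ID        string `json:"id"`
	Type      string `json:"type"`
	Timestamp string `json:"timestamp"`
	Msg       string `json:"msg"`
}

// PrintEvents writes events to w as indented JSON when asJSON is true,
// and as an aligned table otherwise.
func PrintEvents(w io.Writer, events []Event, asJSON bool) error {
	if asJSON {
		enc := json.NewEncoder(w)
		enc.SetIndent("", "  ")
		return enc.Encode(events)
	}
	tw := tabwriter.NewWriter(w, 0, 8, 2, ' ', 0)
	fmt.Fprintln(tw, "TIMESTAMP\tTYPE\tID\tMESSAGE")
	for _, e := range events {
		fmt.Fprintf(tw, "%s\t%s\t%s\t%s\n", e.Timestamp, e.Type, e.ID, e.Msg)
	}
	return tw.Flush()
}

// Render is a convenience wrapper that returns the output as a string.
func Render(events []Event, asJSON bool) (string, error) {
	var sb strings.Builder
	err := PrintEvents(&sb, events, asJSON)
	return sb.String(), err
}

func main() {
	events := []Event{
		{ID: "a", Type: "info", Timestamp: "2024-10-26T10:00:00Z", Msg: "placeholder message"},
	}
	// asJSON stands in for the value of the --json flag.
	for _, asJSON := range []bool{false, true} {
		if err := PrintEvents(os.Stdout, events, asJSON); err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
		}
	}
}
```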
If time allows, extend RAS events with:
pool creation/deletion
container creation/deletion
pool property changes
container property changes
pool extension/drain
new rank joined
Retention policy - large systems can generate a tremendous number of events. The basic retention policy should clean up old events (say, older than 30 days by default) to ensure the mgmt service has storage available.
Optionally, there could be a CLI flag to clear events after they are read (for systems that read events and keep them in their own internal storage).
The retention policy can be implemented in the next release. A small system of 20 servers running for 6 months generated ~370k RAS events; even if every event is 1 KB (roughly 370 MB in total), this leaves plenty of headroom before a retention policy is needed.
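The age-based cleanup and the storage estimate can be sketched as below. This is an assumption-laden illustration: Event, Prune, EstimateBytes and demo are hypothetical names, the events live in a plain slice rather than the mgmt service DB, and 1 KB is taken as 1024 bytes.

```go
package main

import (
	"fmt"
	"time"
)

// DefaultRetention mirrors the 30-day default suggested above.
const DefaultRetention = 30 * 24 * time.Hour

// Event is a hypothetical, simplified RAS event record.
type Event struct {
	ID        string
	Timestamp time.Time
}

// Prune drops events older than maxAge relative to now and returns the
// surviving events together with the number of events removed.
func Prune(events []Event, now time.Time, maxAge time.Duration) ([]Event, int) {
	cutoff := now.Add(-maxAge)
	kept := make([]Event, 0, len(events))
	for _, e := range events {
		if e.Timestamp.Before(cutoff) {
			continue // older than the retention window
		}
		kept = append(kept, e)
	}
	return kept, len(events) - len(kept)
}

// EstimateBytes gives a rough upper bound on the storage used by events.
func EstimateBytes(numEvents, bytesPerEvent int64) int64 {
	return numEvents * bytesPerEvent
}

// demo prunes three placeholder events aged 45 days, 10 days and 1 hour.
func demo() ([]Event, int) {
	now := time.Date(2024, 10, 26, 10, 0, 0, 0, time.UTC)
	events := []Event{
		{ID: "old", Timestamp: now.Add(-45 * 24 * time.Hour)},
		{ID: "recent", Timestamp: now.Add(-10 * 24 * time.Hour)},
		{ID: "new", Timestamp: now.Add(-time.Hour)},
	}
	return Prune(events, now, DefaultRetention)
}

func main() {
	kept, dropped := demo()
	fmt.Println("kept:", len(kept), "dropped:", dropped)
	// ~370k events at 1 KB each, as in the example above.
	fmt.Println("estimated bytes:", EstimateBytes(370_000, 1024))
}
```

Passing now as a parameter instead of calling time.Now() inside Prune keeps the cleanup deterministic and easy to test.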
Design Overview
The overall idea is to reuse the existing infrastructure for raising events together with the mgmt service DB: when an event is raised, in addition to being added to the log, it’s also sent to the MS to be stored in the DB.
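The flow described above can be pictured as one raise call fanning out to two handlers, one that logs and one that stores. The sketch below is purely illustrative: Handler, Dispatcher, LogHandler and StoreHandler are hypothetical names rather than the existing event API, and an in-memory slice stands in for the MS database.

```go
package main

import "fmt"

// Event is a hypothetical, simplified RAS event.
type Event struct {
	ID  string
	Msg string
}

// Handler is anything that wants to be notified when an event is raised.
type Handler interface {
	OnEvent(Event) error
}

// LogHandler stands in for the existing "add it to the log" path.
type LogHandler struct {
	Lines []string
}

func (l *LogHandler) OnEvent(e Event) error {
	l.Lines = append(l.Lines, fmt.Sprintf("RAS event %s: %s", e.ID, e.Msg))
	return nil
}

// StoreHandler stands in for forwarding the event to the MS, which
// persists it in its DB (here just an in-memory slice).
type StoreHandler struct {
	DB []Event
}

func (s *StoreHandler) OnEvent(e Event) error {
	s.DB = append(s.DB, e)
	return nil
}

// Dispatcher fans a raised event out to every registered handler.
type Dispatcher struct {
	handlers []Handler
}

func (d *Dispatcher) Register(h Handler) {
	d.handlers = append(d.handlers, h)
}

// Raise delivers e to all handlers and returns the first error, if any,
// so that a failing store does not prevent the event from being logged.
func (d *Dispatcher) Raise(e Event) error {
	var firstErr error
	for _, h := range d.handlers {
		if err := h.OnEvent(e); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	logger := &LogHandler{}
	store := &StoreHandler{}
	d := &Dispatcher{}
	d.Register(logger)
	d.Register(store)

	if err := d.Raise(Event{ID: "a", Msg: "placeholder message"}); err != nil {
		fmt.Println("raise failed:", err)
	}
	fmt.Println("log lines:", len(logger.Lines), "stored events:", len(store.DB))
}
```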
...