Test dmg scalability on TDS

Description

The intent is to evaluate the scalability of dmg and the control plane. For this, we need just one client node and as many servers as possible. Let’s start with the TDS and then try to scale further on Aurora. Ideally, this test would run with the CXI provider and the 2.2 image.

for N in (1, 2, 4, 8, 16, 32)

Measure time to format N engines
Measure time to run dmg system query
Measure time to create one pool spanning all the engines and using the full capacity
Measure time of dmg pool list
Measure time of dmg pool query
Measure time to destroy the pool
Measure time to create 50 pools spanning all the engines, with each pool using 1/50th of the capacity
Measure time of dmg pool list
Measure time of dmg pool query for one of those pools
Measure time to collect the metrics from one engine with dmg
Measure time to destroy all the pools
Measure time to stop the system with dmg system stop
Measure time to start the system with dmg system start
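
The measurement steps above could be scripted along these lines. This is only a minimal sketch, not the harness that was actually used for this ticket: the helper names `time_command` and `run_steps` are hypothetical, and only the read-only dmg subcommands named in the description are listed, since the format, create and destroy steps need site-specific arguments.

```python
import subprocess
import time


def time_command(argv, runner=subprocess.run):
    """Run one command and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = runner(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, result.returncode


def run_steps(steps, runner=subprocess.run):
    """Time each (label, argv) step in order and return {label: seconds}."""
    timings = {}
    for label, argv in steps:
        elapsed, rc = time_command(argv, runner)
        if rc != 0:
            # Stop at the first failure so later timings are not misleading.
            raise RuntimeError(f"{label} failed with exit code {rc}")
        timings[label] = elapsed
    return timings


# Hypothetical step list: only commands named in the description appear here.
QUERY_STEPS = [
    ("system query", ["dmg", "system", "query"]),
    ("pool list", ["dmg", "pool", "list"]),
]
```

Calling `run_steps(QUERY_STEPS)` on a node with dmg configured would return one wall-clock duration per label; the `runner` parameter exists only so the timing logic can be exercised without a live system.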

Activity

Kurniawan Alfizah May 18, 2022 at 5:49 PM

Hi Michael, regarding your question on DAOS-10508 about metric-50: it is the time to collect metrics via dmg while all 50 pools exist. Basically, it is running 'dmg -i telem metrics query' while we have 50 pools, compared to running the same command after we destroy all those pools (the "metrics" row).

Michael:
ok, got it. thanks. so it makes sense that it takes longer because there are more metrics. maybe some room for improvement there, will think about it.

Mjmac Macdonald May 10, 2022 at 2:59 PM

Wow, yeah. Very nice improvement in format times. Nice work on that!

Thanks for re-running.

The times all seem reasonable to me, although I’m wondering what the “metrics - 50” times represent, exactly. Is that the time to collect metrics via dmg for each of the 50 pools?

Kurniawan Alfizah May 10, 2022 at 2:34 PM

Running with PR-8603 as it’s been included in the latest image on Sunspot (Thanks Maureen). And here are the results, looking good.

N	1	2	4	8	8 - 3AP	16	16 - 3AP	32	32 - 3AP
format engines	18.875	33.006	33.766	33.578	33.382	33.452	33.91	33.894	33.791
system query	0.098	0.102	0.125	0.113	1.388	0.164	0.142	0.137	0.137
one pool create	4.672	4.691	4.765	4.461	5.26	4.801	4.847	4.888	4.923
pool list (one pool)	0.112	0.126	0.129	0.119	0.391	0.394	0.147	0.374	0.373
pool query (one pool)	0.089	0.099	0.098	0.111	0.368	0.362	0.11	0.346	0.346
pool destroy	1.774	1.847	1.856	2.099	2.689	1.892	1.812	2.642	2.399
50 pools create	82.14	82.906	84.88	82.45	87.016	84.016	88.044	86.67	89.34
pool list (50 pools)	0.951	0.931	8.733	8.435	9.499	9.04	9.747	11.06	8.55
pool query (50 pools)	0.099	0.113	0.347	0.352	0.359	0.108	0.111	0.113	0.359
metrics - 50	69.903	142.015	163.281	163.15	162.268	162.41	163.161	159.965	161.69
pool destroy all	36.187	36.11	61.143	59.93	62.691	60.168	66.216	65.477	57.97
metrics	1.802	3.453	3.322	3.492	3.45	3.415	3.491	3.549	3.442
dmg sys stop	1.621	1.61	1.628	1.615	1.622	1.611	1.624	1.621	1.627
dmg sys start	16.12	16.63	16.62	16.62	16.63	16.63	16.14	16.62	16.63
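
To compare the two metrics rows in the results above, one could compute, per column, how much longer the metrics query takes with 50 pools than with none. The following is a rough sketch: the row values are copied from the table, while the helper names `parse_row` and `slowdown` are made up for illustration.

```python
# Rows copied from the results table; columns follow the table's order.
METRICS_50_ROW = "metrics - 50\t69.903\t142.015\t163.281\t163.15\t162.268\t162.41\t163.161\t159.965\t161.69"
METRICS_ROW = "metrics\t1.802\t3.453\t3.322\t3.492\t3.45\t3.415\t3.491\t3.549\t3.442"


def parse_row(line):
    """Split one tab-separated table row into (label, list of floats)."""
    label, *cells = line.split("\t")
    return label, [float(cell) for cell in cells]


def slowdown(with_pools, without_pools):
    """Per-column ratio of the time with 50 pools to the time with none."""
    return [a / b for a, b in zip(with_pools, without_pools)]


_, with_50 = parse_row(METRICS_50_ROW)
_, with_none = parse_row(METRICS_ROW)
ratios = slowdown(with_50, with_none)
```

Each entry of `ratios` is the factor by which the metrics collection time grows when the 50 pools are present, which is the comparison discussed in the comments.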

Mjmac Macdonald May 9, 2022 at 1:58 PM

That PR to improve NVMe storage format times has now been landed to release/2.2.

Mjmac Macdonald May 8, 2022 at 2:42 PM

It does seem to scale pretty well; that’s nice to see. I think we should redo the storage format test once lands on the release/2.2 branch – that should improve the time significantly.

Done

Details
Assignee: Kurniawan Alfizah (Deactivated)
Reporter: Sylvia Oi Yee Chan
Priority: P2-High
Affects versions: Phase1A Test
Required for Version: Phase1A Test
Fix versions: Phase1A Test
Components:
Story Points: 3

Created May 3, 2022 at 9:23 PM

Updated March 6, 2024 at 11:18 PM

Resolved May 18, 2022 at 5:49 PM
