Erasure_Code_Performance (Wolf vs Frontera)

This page tracks the initial erasure code performance numbers, comparing the Wolf and Frontera clusters.

Wolf Cluster:

  • 4-18 servers
    • DCPM only (no SSD)
    • 16 targets per server
    • OFI+psm2
    • 2 daos_io_server per physical node
  • 10 clients
    • OFI+psm2
  • IOR (FPP)
    • 200 ranks
    • -v -w -r -i 2 -k

Frontera Cluster:

  • 4-18 servers
    • tmpfs only (no SSD)
    • 16 targets per server
    • ofi+verbs;ofi_rxm
    • 1 daos_io_server per physical node
  • 10-36 clients
    • ofi+verbs;ofi_rxm
  • IOR (FPP)
    • 200 – 720 ranks
    • -w -W -r -R -g -G 27 -C -Q 1 -F -i 2 -s 1
  • Aggregation Disabled for all tests
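
To relate the rank counts above to the amount of data each run moves, the arithmetic can be sketched as follows. This is a minimal illustration, not part of the test harness: the function name `ior_run_shape` is hypothetical, the 32 MiB block size is taken from the BlockSize column of the results table, and one segment per rank is assumed (IOR's default, and `-s 1` in the Frontera flags).

```python
# Sketch: ranks per client and aggregate data size for a file-per-process IOR run.
# Assumption: every rank writes block_mib * segments MiB to its own file.

def ior_run_shape(clients: int, ranks: int, block_mib: int, segments: int = 1) -> dict:
    """Return ranks per client and total MiB moved per IOR iteration."""
    if ranks % clients != 0:
        raise ValueError("ranks are not evenly spread across clients")
    return {
        "ranks_per_client": ranks // clients,
        "aggregate_mib": ranks * block_mib * segments,
    }


# Smallest and largest configurations listed above, with a 32 MiB block size.
small = ior_run_shape(clients=10, ranks=200, block_mib=32)
large = ior_run_shape(clients=36, ranks=720, block_mib=32)
print(small)
print(large)
```

Both configurations work out to 20 ranks per client, so the larger runs scale the client count rather than the per-client load.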


| Row | Servers | Clients | IOR Processes | Aggregation Disabled | ChunkSize | BlockSize | XferSize | Stripe | Access | Object Class | Write (MB/s) | Read (MB/s) | Date | Request | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 4 | 10 | 200 | | 1M | 32M | 1M | ------ | FPP | RP_1G2 | 20628.51 | 41909.55 | | | |
| 3 | 4 | 10 | 200 | | 1M | 32M | 1M | ------ | FPP | RP_3G2 | 5079.58 | 43342.75 | | | |
| 4 | 4 | 10 | 200 | | 32M | 32M | 2M | FULL | FPP | EC_2P1G1 | 14021.35 | 40296.71 | | | Need to add export FI_PSM2_CONN_TIMEOUT=30 on the client side for most FPP runs to work |
| 5 | 4 | 8 | 256 | YES | 32M | 32M | 2M | FULL | FPP | EC_2P1G1 | 27912.64 | 46351.81 | | Frontera | IOR_Console.txt |
| 6 | 4 | 10 | 200 | YES | 32M | 32M | 2M | FULL | FPP | EC_2P1G1 | 28046.75 | 43065.39 | | Frontera | IOR_Console.txt |
| 7 | 4 | 8 | 256 | No | 32M | 32M | 2M | FULL | FPP | EC_2P1G1 | 25060.42 | 37088.85 | 5/12/2021 | Frontera | |
| 9 | 6 | 10 | 200 | | 1M | 32M | 1M | ------ | FPP | RP_3G4 | 7188.65 | 60362.96 | | | |
| 10 | 6 | 10 | 200 | | 32M | 32M | 4M | FULL | FPP | EC_4P1G1 | 25080.65 | 61916.86 | | | |
| 11 | 6 | 12 | 240 | YES | 32M | 32M | 4M | FULL | FPP | EC_4P1G1 | 47124.62 | 64640.02 | | Frontera | IOR_Console.txt |
| 13 | 10 | 10 | 200 | | 1M | 32M | 8M | ------ | FPP | RP_1G8 | 41363.64 | 60083.43 | | | |
| 14 | 10 | 10 | 200 | | 32M | 32M | 8M | FULL | FPP | EC_8P2G1 | 35246.91 | 78044.88 | | | |
| 15 | 10 | 20 | 400 | YES | 32M | 32M | 8M | FULL | FPP | EC_8P2G1 | 70238.98 | 101927.74 | | Frontera | IOR_Console.txt |
| 16 | 10 | 10 | 200 | YES | 32M | 32M | 8M | FULL | FPP | DAOS_OC_EC_K8P2_L64K | 13249.46 | 61664.51 | | | |
| 17 | 10 | 20 | 400 | YES | 32M | 32M | 8M | FULL | FPP | DAOS_OC_EC_K8P2_L64K | 68384.40 | 93717.37 | | Frontera | IOR_Console.txt |
| 19 | 18 | 10 | 200 | | 1M | 32M | 1M | ------ | FPP | RP_1G16 | 78331.87 | 117012.32 | 11/3/2020 | | Latest master d73374cb6cef61b830bd030a2b5d85791342d2d0; IOR_Console.txt |
| 20 | 18 | 36 | 720 | YES | 1M | 32M | 1M | ------ | FPP | RP_1G16 | 149058.91 | 171154.21 | | Frontera | IOR_Console.txt |
| 21 | 18 | 10 | 200 | | 32M | 32M | 16M | FULL | FPP | EC_16P2G1 | 17129.21 | 123063.79 | 11/3/2020 | | Latest master d73374cb6cef61b830bd030a2b5d85791342d2d0; write is the same compared to RP_1G16, so not going to open a defect |
| 22 | 18 | 36 | 720 | YES | 32M | 32M | 16M | FULL | FPP | EC_16P2G1 | 107520.52 | 179240.30 | | Frontera | IOR_Console.txt |
| 23 | 18 | 10 | 200 | YES | 32M | 32M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L64K | 14874.25 | 83421.25 | 10/19/2020 | | |
| 24 | 18 | 36 | 720 | YES | 32M | 32M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L64K | 97084.61 | 147155.30 | | Frontera | IOR_Console.txt |
| 25 | 18 | 10 | 200 | YES | 32M | 32M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L128K | 16220.05 | 123331.59 | 10/20/2020 | | IOR_Log.txt |
| 26 | 18 | 36 | 720 | YES | 32M | 32M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L128K | 108867.57 | 163780.23 | | Frontera | IOR_Console.txt |
| 27 | 18 | 10 | 200 | YES | 32M | 128M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L256K | | | | | Open new issue just to be sure it's not something on the DAOS or CART side: DAOS-5895 - IOR EC object type DAOS_OC_EC_K16P2_L256K IOR with FPP is crashing the server (Open) |
| 28 | 18 | 36 | 720 | YES | 32M | 128M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L256K | 146009.29 | 191612.73 | | Frontera | IOR_Console.txt |
| 29 | 18 | 10 | 200 | YES | 32M | 128M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L512K | 40705.93 | 127393.64 | 10/21/2020 | | With higher 128M BlockSize; IOR_log.txt |
| 30 | 18 | 36 | 720 | YES | 32M | 128M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L512K | 160802.90 | 193756.27 | | Frontera | IOR_Console.txt |
| 31 | 18 | 36 | 720 | YES | 32M | 256M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L512K | 149744.54 | 195394.01 | | Frontera | IOR_Console.txt With dedup:memcmp |
| 32 | 18 | 36 | 720 | YES | 32M | 512M | 16M | FULL | FPP | DAOS_OC_EC_K16P2_L512K | 175131.70 | 183141.63 | | Frontera | IOR_Console.txt With dedup:memcmp |
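
When comparing the erasure-coded rows, it can help to know how much parity each object class adds. The sketch below reads the data and parity cell counts out of the class names used in the Object Class column. The naming interpretation is an assumption not stated on this page: `EC_<k>P<p>G<g>` and `DAOS_OC_EC_K<k>P<p>_L<len>` are both taken to mean k data cells plus p parity cells. The helper names `ec_layout` and `parity_overhead` are hypothetical.

```python
import re
from typing import Tuple

# Assumed naming patterns (an interpretation, not confirmed by this page):
#   EC_<k>P<p>G<g>              -> k data cells, p parity cells, g groups
#   DAOS_OC_EC_K<k>P<p>_L<len>  -> k data cells, p parity cells, cell length <len>
_PATTERNS = [
    re.compile(r"^EC_(?P<k>\d+)P(?P<p>\d+)G\d+$"),
    re.compile(r"^DAOS_OC_EC_K(?P<k>\d+)P(?P<p>\d+)_L\w+$"),
]


def ec_layout(object_class: str) -> Tuple[int, int]:
    """Return (data_cells, parity_cells) parsed from an EC object class name."""
    name = object_class.strip()
    for pattern in _PATTERNS:
        match = pattern.match(name)
        if match:
            return int(match.group("k")), int(match.group("p"))
    raise ValueError(f"not a recognized EC object class: {object_class!r}")


def parity_overhead(data_cells: int, parity_cells: int) -> float:
    """Extra bytes written per byte of user data for a full-stripe write."""
    return parity_cells / data_cells


for oclass in ("EC_2P1G1", "EC_4P1G1", "EC_8P2G1", "DAOS_OC_EC_K16P2_L64K"):
    k, p = ec_layout(oclass)
    print(oclass, k, p, parity_overhead(k, p))
```

Under this reading, EC_2P1G1 writes 50% extra data as parity while the 16+2 classes write 12.5%, which is one factor to keep in mind when comparing write rates across rows.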

Defect Status:

  • DAOS-5888 - EC write performance for EC_16P2G1 dropped to 50% on latest master compared to commit 70b49b97ca40d596a0c98f28684378b159fdd66a, which is a few weeks older
    • Not observed on Frontera because it uses a non-PSM2 fabric
  • DAOS-5895 - IOR EC object type DAOS_OC_EC_K16P2_L256K IOR with FPP is crashing the server
    • Did not observe any crash on Frontera
  • DAOS-5777 - EC IOR test is failing for file-per-process runs that use a higher chunk size
    • Not yet tried on Frontera, but will likely go away?
  • New defect on Frontera (18 servers / 80 clients [1600 tasks]): reads get stuck, with no server or client crash. A few more things need to be tried to collect more debug information
    • object ERR src/object/cli_shard.c:552 dc_rw_cb() RPC 1 failed: -1032