ArrayTransformer bench

What is ArrayTrans

ArrayTransformer<T,rs,rm,al> (ArrayTrans) is an operator that handles MPI copying of a ParArray<T,rs,rm,al> across ranks.

ArrayTrans is the basis of communication for repeated ghost-point updates, which are the primary communication pattern of a spatially decomposed parallel PDE solver in DNDS.
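As a rough mental model only (this is not DNDS code and uses no MPI), a ghost update can be simulated in one process: each hypothetical rank owns a contiguous slice of a global index range, a pull plan mapping every needed global index to its owner and local offset is built once, and each subsequent update just replays that plan. Separating the one-time setup from the repeated copy is a common way to keep repeated updates cheap; whether ArrayTrans splits the work exactly this way is an assumption here, and build_pull_plan and pull are hypothetical names.

```python
import bisect

def build_pull_plan(offsets, ghost_globals):
    """Map each needed global index to (owner rank, local offset).

    offsets[r] is the first global index owned by rank r and offsets[-1] is the
    global size, so rank r owns the range [offsets[r], offsets[r + 1]).
    """
    plan = []
    for g in ghost_globals:
        if not 0 <= g < offsets[-1]:
            raise IndexError(f"global index {g} out of range")
        r = bisect.bisect_right(offsets, g) - 1  # owner rank of g
        plan.append((r, g - offsets[r]))
    return plan

def pull(owned, plan):
    """Copy ghost values out of the owners' data; replayed on every update."""
    return [owned[r][i] for r, i in plan]

# Three hypothetical ranks with four owned values each.
owned = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
offsets = [0, 4, 8, 12]
plan = build_pull_plan(offsets, [3, 8])  # rank 1 needs its neighbors' edge values
print(pull(owned, plan))  # [13, 30]
owned[0][3] = 99          # the owner updates its data...
print(pull(owned, plan))  # [99, 30]  ...and the same plan is replayed
```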

ArrayTrans Bench

The benchmark is written in Python; it calls DNDS modules and builds an Array<real, vdim>, where vdim is a positive integer giving the number of reals stored per point.

Code is here.

To mimic a standard workload, each process owns a 3-D block of points, indexed in C (row-major) order.

The local array on each process has size $32\times32\times32\times\text{vdim}$.
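Assuming each rank's Array stores one row of vdim reals per point and the points are numbered in C order (k varying fastest), the position of a value in the flat local buffer can be computed as below. flat_offset is a hypothetical helper, and the row layout is an assumption rather than something the benchmark documents.

```python
N = 32  # points per side of each rank's block

def flat_offset(i, j, k, c, vdim, n=N):
    """Offset of component c of point (i, j, k) in a flat n*n*n*vdim buffer."""
    if not (0 <= i < n and 0 <= j < n and 0 <= k < n and 0 <= c < vdim):
        raise IndexError("index out of range")
    point = (i * n + j) * n + k  # C (row-major) order: k varies fastest
    return point * vdim + c      # assumption: a point's vdim values are contiguous

vdim = 6
print(flat_offset(0, 0, 1, 0, vdim))  # 6
print(flat_offset(1, 0, 0, 0, vdim))  # 6144 = 32 * 32 * 6
print(N**3 * vdim)                    # 196608 values per rank
```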

Each process finds its six neighbor ranks: upper, lower, front, back, left and right.
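A sketch of the neighbor lookup, assuming the ranks form a C-ordered px × py × pz process grid with periodic wrap-around, so that every rank has all six neighbors (matching the six pulled faces). Which grid axis is called upper/lower, front/back or left/right is a hypothetical choice. In an MPI program the grid shape would typically come from MPI_Dims_create (MPI.Compute_dims in mpi4py).

```python
def neighbors(rank, dims, periodic=True):
    """Six face-neighbor ranks of `rank` in a C-ordered 3-D process grid."""
    px, py, pz = dims
    if not 0 <= rank < px * py * pz:
        raise ValueError("rank outside the process grid")
    i, rem = divmod(rank, py * pz)  # C order: the last axis varies fastest
    j, k = divmod(rem, pz)

    def rank_of(a, b, c):
        if periodic:
            a, b, c = a % px, b % py, c % pz
        elif not (0 <= a < px and 0 <= b < py and 0 <= c < pz):
            return None  # no neighbor across a non-periodic boundary
        return (a * py + b) * pz + c

    return {
        "lower": rank_of(i - 1, j, k), "upper": rank_of(i + 1, j, k),
        "back": rank_of(i, j - 1, k), "front": rank_of(i, j + 1, k),
        "left": rank_of(i, j, k - 1), "right": rank_of(i, j, k + 1),
    }

# 16 ranks arranged as a 2 x 2 x 4 grid.
print(neighbors(5, (2, 2, 4)))
```

With only two ranks along an axis and periodic wrap-around, the two neighbors along that axis are the same rank.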

Sizes:

  • Volume points: $32^3=32768$
  • Face points to be pulled: $32^2\times6=6144$, i.e. $18.75\%$ of the volume points
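The face count can be checked by enumerating, in C order, the boundary layer of a $32^3$ block on each of its six faces. Each neighbor needs one such layer, so these layers are what a rank sends and, by symmetry, what it pulls. Edge and corner points belong to several layers and are counted once per layer, which is how $32^2\times6$ arises. face_points is a hypothetical helper.

```python
N = 32  # points per side of each rank's block

def face_points(axis, side, n=N):
    """C-order point indices on one boundary face of an n^3 block.

    axis selects i, j or k (0, 1, 2); side is 0 (low face) or n - 1 (high face).
    """
    pts = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if (i, j, k)[axis] == side:
                    pts.append((i * n + j) * n + k)
    return pts

faces = [face_points(axis, side) for axis in range(3) for side in (0, N - 1)]
pulled = sum(len(f) for f in faces)
print(pulled, pulled / N**3)  # 6144 0.1875
```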

Tested vdim:

  • 6
  • 20
  • 120
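Assuming real is an 8-byte double (an assumption about the DNDS build), the per-rank local array size and the face data moved in one ghost update follow directly from the point counts above. sizes is a hypothetical helper.

```python
VOLUME_POINTS = 32**3     # points owned by each rank
FACE_POINTS = 32**2 * 6   # ghost points pulled per update
REAL_BYTES = 8            # assumption: `real` is a 64-bit double

def sizes(vdim):
    """(local array bytes, bytes pulled per ghost update) for one rank."""
    return VOLUME_POINTS * vdim * REAL_BYTES, FACE_POINTS * vdim * REAL_BYTES

for vdim in (6, 20, 120):
    local, pulled = sizes(vdim)
    print(f"vdim={vdim:>3}: local {local:>10,} B, pulled {pulled:>9,} B")
```

For vdim = 120 this gives about 31.5 MB of local data and about 5.9 MB pulled per update on each rank.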

Pitfall: running mpirun -np X python xxx.py can be dangerous. If np=64 and OMP_NUM_THREADS (or another thread-count control) is not set, and something in Python (such as NumPy) decides to launch 64 threads in every process, you end up with 64 × 64 = 4096 threads, which can break the IO system or drain memory on some systems. See this discussion on Stack Overflow. On THTJ (thcp1), you need to run export OMP_NUM_THREADS=1 before launching.
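When the variable cannot be exported in the job script, a Python-side fallback is to set the thread-count variables at the very top of the script, before NumPy (or anything else that loads an OpenMP/BLAS runtime) is imported, since those runtimes typically read them when they load. OPENBLAS_NUM_THREADS and MKL_NUM_THREADS are the OpenBLAS and MKL counterparts; which ones matter depends on how NumPy was built, and pin_threads is a hypothetical helper. Exporting in the shell, as described for THTJ, remains the more robust option.

```python
import os

# Thread-count variables read by common OpenMP / BLAS runtimes.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")

def pin_threads(n=1):
    """Limit each MPI process to n threads unless the user already chose a value."""
    for var in THREAD_VARS:
        os.environ.setdefault(var, str(n))

pin_threads(1)

# Import heavy numerical modules only after pinning, e.g.:
# import numpy as np
```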

Machines

| Name | Description | CPU |
|---|---|---|
| GS | gpu704 | Intel(R) Xeon(R) Gold 6326 16-Core x2 |
| THTJ | thcp1 | FT 2000 ? |
| THTJ1 | cp6 | Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz, 28-core x2 per node |

Results

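Bandwidth results, in bytes/s: the vdim columns give the per-rank bandwidth (with times in parentheses where recorded), and Best Total gives the total over all ranks. np=AxB means A nodes with B ranks each.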

| Name | $\text{vdim}=6$ | $\text{vdim}=20$ | $\text{vdim}=120$ | Best Total |
|---|---|---|---|---|
| GS np=16 | 5.8339e+08 | 6.4967e+08 | 6.6650e+08 | 10.2 G |
| GS np=32 | 2.7550e+08 (0.8 ms) | 3.0511e+08 (2 ms) | 2.8825e+08 (17 ms) | 9.3 G |
| THTJ np=1x56 | 1.5009e+07 (20 ms) | 2.5144e+07 (39 ms) | 2.2339e+07 (264 ms) | 1.3 G |
| THTJ np=4x56 | 4.2981e+06 (69 ms) | 4.9476e+06 (190 ms) | 5.2787e+06 (1100 ms) | 1.1 G |
| THTJ np=50x56 | 3.5452e+05 (831 ms) | 3.9606e+05 (2481 ms) | 4.0971e+05 (14400 ms) | 1.1 G |
| THTJ1 np=1x56 | 1.3897e+08 (2 ms) | 1.2472e+08 (8 ms) | 1.2322e+08 (40 ms) | 7.4 G |
| THTJ1 np=4x56 | 3.0813e+07 (9 ms) | 2.8067e+07 (35 ms) | 3.5052e+07 (168 ms) | 7.4 G |
| THTJ1 np=20x56 | 8.3952e+06 (35 ms) | 9.3332e+06 (105 ms) | 9.9626e+06 (592 ms) | 10.6 G |
| BSCC-A np=1x64 | 7.5892e+07 (4 ms) | 7.8188e+07 (12 ms) | 8.8352e+07 (67 ms) | 5.4 G |
| BSCC-A np=4x64 | 1.3691e+07 | 1.4866e+07 | 1.7969e+07 | 4.4 G |
| BSCC-A np=20x64 | 2.7962e+06 | 3.0239e+06 | 3.2893e+06 | 4.1 G |
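Most of the parenthesized times are consistent with one ghost update moving $32^2\times6\times\text{vdim}$ reals of 8 bytes at the listed per-rank bandwidth $B_\text{rank}$, i.e. $t = 6144\cdot\text{vdim}\cdot 8 / B_\text{rank}$; for example, THTJ1 np=20x56 at vdim = 120 gives 592 ms. This reading is inferred from the numbers rather than stated by the benchmark, the 8-byte real is an assumption, and a few entries (such as GS np=32) do not fit as closely. update_time and per_rank_bw are hypothetical helpers.

```python
FACE_POINTS = 32**2 * 6   # ghost points pulled per rank per update
REAL_BYTES = 8            # assumption: `real` is a 64-bit double

def update_time(per_rank_bw, vdim):
    """Seconds for one ghost update at per_rank_bw bytes/s per rank."""
    return FACE_POINTS * vdim * REAL_BYTES / per_rank_bw

def per_rank_bw(seconds, vdim):
    """Inverse: per-rank bandwidth in bytes/s implied by a measured update time."""
    return FACE_POINTS * vdim * REAL_BYTES / seconds

# THTJ1 np=20x56, vdim=120: the table lists 9.9626e+06 B/s and 592 ms.
print(f"{update_time(9.9626e+06, 120) * 1e3:.0f} ms")  # 592 ms
# THTJ np=1x56, vdim=20: the table lists 39 ms and 2.5144e+07 B/s.
print(f"{per_rank_bw(0.039, 20):.4e} B/s")  # 2.5206e+07 B/s
```

The small mismatch in the second line comes from the time being rounded to whole milliseconds in the table.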
Licensed under CC BY-NC-SA 4.0
by Harry