Project author: dc-fukuoka

Project description:
jacobi - a benchmark that solves the 2D Laplace equation with the Jacobi iterative method. A GPU or Xeon Phi can be used.
Language: Fortran
Repository: git://github.com/dc-fukuoka/jacobi.git
Created: 2017-08-28T23:34:57Z
Project page: https://github.com/dc-fukuoka/jacobi

License: (not listed)


jacobi - a benchmark that solves the 2D Laplace equation with the Jacobi iterative method. A GPU or Xeon Phi can be used.
============
how to run:

    # for Intel (Intel compiler and Intel MPI are required)
    $ make mic
    # for GPU (PGI compiler and its MPI are required)
    $ make gpu
    $ cat fort.18
    10000 ! iter_max
    4096 4096 ! m, n
    1 2 ! np_m, np_n
    $ vi fort.18 <-- adjust the parameters
    $ mpirun -np $NP ./jacobi_mpi.intel # for Xeon Phi
    or
    $ mpirun -np $NP ./jacobi_acc_mpi.pgi # for GPU
    # where $NP = np_m * np_n
    # if you have neither a GPU nor a Xeon Phi,
    $ mpirun -np $NP ./jacobi_mpi.intel.nooffload
    or
    $ mpirun -np $NP ./jacobi_mpi.pgi
    # to view the result
    $ gnuplot # splot "jacobi.dat" w l
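
Since $NP must equal np_m * np_n, it can be derived from fort.18 instead of being typed by hand. The following is a minimal sketch, assuming the file keeps the three-line layout shown above with np_m and np_n as the first two fields of line 3. The helper name np_from_params is hypothetical and not part of the repository; the demo works on a scratch copy so that a real fort.18 is left untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): read np_m and np_n from
# the third line of a fort.18-style file and print their product.
np_from_params() {
    awk 'NR == 3 { print $1 * $2 }' "$1"
}

# Demo on a scratch copy with the same layout as the fort.18 shown above.
params=$(mktemp)
cat > "$params" <<'EOF'
10000 ! iter_max
4096 4096 ! m, n
1 2 ! np_m, np_n
EOF

NP=$(np_from_params "$params")
rm -f "$params"
echo "NP=$NP"
```

With a real parameter file, the launch line would then be, for example, mpirun -np "$(np_from_params fort.18)" ./jacobi_mpi.intel.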

performance comparison:

iter_max=10000
m=4096, n=4096

with Xeon E5-2680 (8 cores/socket, 2 sockets) + Xeon Phi 5110P x2, 2 nodes (np_m=1, np_n=4):

CPU only [s]: 66.7373
CPU + MIC [s]: 29.1181

with Xeon E5-2680 v4 (14 cores/socket, 2 sockets) + NVIDIA Tesla P100 x4, 2 nodes (np_m=1, np_n=8):

CPU only [s]: 94.7290
CPU + GPU [s]: 5.29180
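
The timings above can be turned into speedup factors with a line of arithmetic. This is a small sketch using only the four numbers listed; the function name speedup is made up for illustration. Each ratio compares an accelerated run against the CPU-only run on the same system, so the two factors should not be compared as if they shared one baseline.

```shell
#!/bin/sh
# speedup <cpu_only_seconds> <accelerated_seconds>: prints the ratio, 2 decimals.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

mic=$(speedup 66.7373 29.1181)   # Xeon E5-2680 + Xeon Phi 5110P system
gpu=$(speedup 94.7290 5.29180)   # Xeon E5-2680 v4 + Tesla P100 system

echo "CPU + MIC vs CPU only: ${mic}x"   # prints 2.29x
echo "CPU + GPU vs CPU only: ${gpu}x"   # prints 17.90x
```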

The following are examples of calculation results.

m=1024 n=1024
with GPU (note: the log below reports m: 4096 n: 4096 in its header):

    $ mpirun -x LD_LIBRARY_PATH -np 8 -npernode 4 -bind-to socket ./jacobi_acc_mpi.pgi
    iter_max: 10000
    m: 4096 n: 4096
    np: 8
    np_m: 1 np_n: 8
    0 0.499878
    1000 0.000357
    2000 0.000178
    3000 0.000118
    4000 0.000088
    5000 0.000071
    6000 0.000059
    7000 0.000050
    8000 0.000044
    9000 0.000039
    10000 0.000035
    Total CPU time: 5.29180
    Device init : 1.91826
    Setup problem : 0.841110E-01
    Data copyin : 0.291271E-01
    Computation : 3.13129
    Data copyout : 0.277512E-01
    Output : 0.101262
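
To see where the time goes in a run like this, the timing summary can be converted into percentages of the total. The sketch below is an illustration only: the helper name phase_shares is hypothetical, and it assumes the summary keeps the "label : seconds" layout with the "Total CPU time" line first. The sample values are copied from the GPU log above.

```shell
#!/bin/sh
# Hypothetical helper: read a jacobi timing summary on stdin and print each
# phase as a percentage of the "Total CPU time" line.
phase_shares() {
    awk -F':' '
        /Total CPU time/ { total = $2 + 0; next }
        NF == 2 && total > 0 {
            label = $1
            sub(/^[ \t]+/, "", label)   # trim leading blanks
            sub(/[ \t]+$/, "", label)   # trim trailing blanks
            # $2 + 0 also converts Fortran-style values such as 0.841110E-01
            printf "%-14s %5.1f%%\n", label, 100 * ($2 + 0) / total
        }
    '
}

# Timing summary copied from the GPU run above.
log='Total CPU time: 5.29180
Device init : 1.91826
Setup problem : 0.841110E-01
Data copyin : 0.291271E-01
Computation : 3.13129
Data copyout : 0.277512E-01
Output : 0.101262'

shares=$(printf '%s\n' "$log" | phase_shares)
echo "$shares"
```

For this log, computation accounts for roughly 59% of the total and device initialization for roughly 36%, so the one-time device setup is a noticeable part of a run this short.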


with Xeon Phi:

    $ srun --mpi=pmi2 -N2 -n4 -c8 --cpu_bind=cores -t0:05:00 ./jacobi_mpi.intel
    iter_max: 10000
    m: 1024 n: 1024
    np: 4
    np_m: 1 np_n: 4
    Number of threads = 236
    0 0.499512
    1000 0.000347
    2000 0.000171
    3000 0.000113
    4000 0.000084
    5000 0.000066
    6000 0.000055
    7000 0.000047
    8000 0.000041
    9000 0.000036
    10000 0.000032
    Total CPU time: 6.98289
    Device init : 0.400000E-05
    Setup problem : 0.640400E-02
    Data copyin : 0.208880E-01
    Computation : 6.82102
    Data copyout : 0.292500E-02
    Output : 0.131650
