I was recently asked how Ampere GPUs compare to CPU, I have previously given some performance numbers, but decided to make more of a benchmark case for comparison.

I ran a 40^3 hex mesh with p=3, which gives about 4.1 million solution points. The case I used was of course the TGV and below is the ini file. Given that we aren’t interested in the physics here, I only ran it to t=5.

These results are using PyFR v 1.10, on a single A100, using CUDA 11.0.2 and GCC 9.2.0.

Single | Double | |
---|---|---|

Runtime [s] | 2069.735 | 3183.357 |

DoF | (40\times4)^3 | (40\times4)^3 |

RHS | 2\times10^5 | 2\times10^5 |

No. GPUs | 1 | 1 |

s/DoF/RHS/GPU | 2.53\times10^{-9} | 3.89\times10^{-9} |

Below is the ini file for double precision and a link to the mesh.

If anyone else wants to run the case for comparison I would be interested to see the performance numbers, I only really have access to GPU orientated machines.

```
[backend]
precision = double
rank-allocator = linear
[backend-cuda]
device-id = local-rank
# gimmik-max-nnz is not required in later pyfr versions
gimmik-max-nnz = 512
mpi-type = cuda-aware
block-1d = 64
block-2d = 128
[constants]
gamma = 1.4
mu = 6.25e-4
Pr = 0.71
Ps = 111.607
[solver]
system = navier-stokes
order = 3
anti-alias = none
viscosity-correction = none
shock-capturing = none
[solver-time-integrator]
scheme = rk4
controller = none
tstart = 0
tend = 5.01
dt = 1e-4
[solver-interfaces]
riemann-solver = rusanov
# These ldg settings will hit the interface communication a little harder
ldg-beta = 0.0
ldg-tau = 0.0
[solver-interfaces-quad]
flux-pts = gauss-legendre
quad-deg = 6
quad-pts = gauss-legendre
[solver-elements-hex]
soln-pts = gauss-legendre
quad-deg = 6
quad-pts = gauss-legendre
[soln-plugin-nancheck]
nsteps = 10
[soln-plugin-integrate]
nsteps = 50
file = integral_fp64.csv
header = true
vor1 = (grad_w_y - grad_v_z)
vor2 = (grad_u_z - grad_w_x)
vor3 = (grad_v_x - grad_u_y)
int-E = rho*(u*u + v*v + w*w)
int-enst = rho*(%(vor1)s*%(vor1)s + %(vor2)s*%(vor2)s + %(vor3)s*%(vor3)s)
[soln-plugin-writer]
dt-out = 5
# depending on the environment you might want to change this
basedir = .
basename = nse_fp64_tgv_3d_p3-{t:.2f}
[soln-ics]
rho = 1
u = sin(x)*cos(y)*cos(z)
v = -cos(x)*sin(y)*cos(z)
w = 0
p = Ps + (1.0/16.0)*(cos(2*x) + cos(2*y))*(cos(2*z + 2))
```