Abstract
Large-scale electromagnetic field simulations using the finite-difference time-domain (FDTD) method require GPU (graphics processing unit) clusters. However, the communication overhead caused by slow interconnects becomes a major performance bottleneck. To remove this bottleneck, we propose two methods that overlap computation and communication for FDTD simulations on a GPU cluster: the 'kernel-split method' and the 'host-buffer method'. The host-buffer method, in particular, enables overlapping without any modification to update kernels that are already in use. We also present theoretical formulas that predict the overlap threshold and the total throughput of each method. Using our overlap methods on 6 GPU nodes, we demonstrate that the total 3D FDTD performance reaches 92% of the ideal six-fold increase, which is the upper limit that would be reached if there were no communication overhead.
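To make the overlap trade-off concrete, here is a minimal, idealized cost model. It is our own illustrative assumption, not the paper's formulas: each of N GPUs updates an equal slab of cells at a fixed rate, the halo exchange is purely bandwidth-bound (no latency), and overlap is perfect, so a time step costs the larger of computation and communication rather than their sum. It also ignores the extra work of updating boundary cells before the exchange can start. The class name, parameter names, and example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterModel:
    """Hypothetical parameters for an idealized overlap cost model (all assumed)."""
    cells_total: float     # total FDTD cells in the 3D domain
    gpu_rate: float        # cells updated per second by a single GPU
    halo_bytes: float      # bytes each GPU exchanges per time step
    link_bandwidth: float  # interconnect bandwidth in bytes per second
    n_gpus: int            # number of GPU nodes

    def t_comp(self) -> float:
        # Per-step compute time on one GPU, assuming an even slab decomposition.
        return self.cells_total / (self.n_gpus * self.gpu_rate)

    def t_comm(self) -> float:
        # Per-step halo-exchange time, bandwidth only (latency ignored).
        return self.halo_bytes / self.link_bandwidth

    def step_time(self, overlap: bool) -> float:
        # Without overlap the phases add; with perfect overlap the longer one dominates.
        if overlap:
            return max(self.t_comp(), self.t_comm())
        return self.t_comp() + self.t_comm()

    def efficiency(self, overlap: bool) -> float:
        # Fraction of the ideal n_gpus-fold speedup over a single GPU.
        t_single = self.cells_total / self.gpu_rate
        return t_single / (self.n_gpus * self.step_time(overlap))

    def overlap_threshold_cells(self) -> float:
        # Smallest per-GPU cell count whose compute time fully hides the exchange.
        return self.gpu_rate * self.t_comm()


# Illustrative numbers only; they are not measurements from the paper.
model = ClusterModel(
    cells_total=6e8,
    gpu_rate=1e9,
    halo_bytes=5e7,
    link_bandwidth=1e9,
    n_gpus=6,
)
print(round(model.efficiency(overlap=False), 3))
print(round(model.efficiency(overlap=True), 3))
print(model.overlap_threshold_cells())
```

With these assumed numbers, each GPU holds 1e8 cells, which exceeds the 5e7-cell threshold. Communication is therefore fully hidden in the overlapped case, and efficiency rises from about two-thirds of the ideal speedup to essentially all of it. Real systems fall short of this bound because of latency, boundary-cell work, and imperfect overlap.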
| Field | Value |
|---|---|
| Original language | English |
| Pages | 2364-2369 |
| Number of pages | 6 |
| Journal | Computer Physics Communications |
| Volume | 183 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published, November 2012 |
Keywords
- CUDA
- FDTD
- GPU cluster
- OpenCL
ASJC Scopus subject areas
- Hardware and Architecture
- Physics and Astronomy (all)