TY - JOUR
T1 - Evaluation of pseudo-random number generation on GPU cards
AU - Askar, Tair
AU - Shukirgaliyev, Bekdaulet
AU - Lukac, Martin
AU - Abdikamalov, Ernazar
N1 - Funding Information:
Acknowledgments: This research has been funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (AP08856149, BR10965141).
Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/12
Y1 - 2021/12
N2 - Monte Carlo methods rely on sequences of random numbers to obtain solutions to many problems in science and engineering. In this work, we evaluate the performance of different pseudo-random number generators (PRNGs) of the Curand library on a number of modern Nvidia GPU cards. As a numerical test, we generate pseudo-random number (PRN) sequences and obtain non-uniform distributions using the acceptance-rejection method. We consider GPU, CPU, and hybrid CPU/GPU implementations. For the GPU, we additionally consider two different implementations using the host and device application programming interfaces (APIs). We study how the performance depends on implementation parameters, including the number of threads per block and the number of blocks per streaming multiprocessor. To achieve the fastest performance, one has to minimize the time consumed by PRNG seed setup and state update. The seed setup time increases with the number of threads, while the PRNG state update time decreases. Hence, the fastest performance is achieved by an optimal balance of these opposing effects.
KW - CUDA
KW - Curand
KW - GPU
KW - PRNG
UR - http://www.scopus.com/inward/record.url?scp=85121831595&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121831595&partnerID=8YFLogxK
U2 - 10.3390/computation9120142
DO - 10.3390/computation9120142
M3 - Article
AN - SCOPUS:85121831595
SN - 2079-3197
VL - 9
JO - Computation
JF - Computation
IS - 12
M1 - 142
ER -