Accelerating CMIP data analysis with parallel computing in R
In this Technical Note we examine eight schemes for parallelizing Extreme Value Analysis (EVA) on Coupled Model Intercomparison Project (CMIP) data via the R packages foreach, doParallel, and doMPI. We perform strong scaling studies to delineate the performance impacts of factors such as R cluster type (TCP/IP sockets and MPI), communication protocol (Ethernet, IP over InfiniBand, and MPI), loop parallelization (outer or inner loop), and approaches to reading data from the NCAR GLADE parallel filesystem. We elucidate peculiarities of R memory management and the overhead associated with interprocess communication, and we discuss the broadcast limitations of Rmpi. The best-performing scheme parallelizes the outer EVA loop across latitude and reads only the subset of the data operated on in the inner loop over longitude; for this scheme, the different cluster types and communication protocols all perform about equally. This configuration achieves a parallel speedup of 50 with 96 R workers and is scalable for EVA on larger problem sizes than those presented here.
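The best-performing scheme (outer loop over latitude distributed across workers, each worker reading only its own latitude row and looping serially over longitude) can be sketched as follows. This is a minimal Python analogue under stated assumptions, not the authors' R code: `concurrent.futures` stands in for foreach/doParallel, the hypothetical function `read_latitude_slice` stands in for a subset read from the filesystem, and EVA is reduced to computing block maxima (the input a GEV fit would take). Grid sizes and data values are arbitrary and synthetic.

```python
"""Sketch of outer-loop (latitude) parallelism with per-task subset reads.

Assumptions (hypothetical, not from the Technical Note): the field is a
synthetic grid indexed [lat][lon][time], and "EVA" is reduced to block
maxima of each grid cell's time series.
"""
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, List

# Arbitrary illustrative grid dimensions and block length (time steps).
N_LAT, N_LON, N_TIME, BLOCK = 4, 5, 12, 4


def read_latitude_slice(ilat: int) -> List[List[float]]:
    """Hypothetical reader: return only the [lon][time] values for one latitude.

    Stands in for reading a subset from a file; values are generated
    deterministically so every worker sees the same data.
    """
    return [[float((ilat * 7 + ilon * 3 + t * 5) % 11) for t in range(N_TIME)]
            for ilon in range(N_LON)]


def block_maxima(series: List[float], block: int) -> List[float]:
    """Maximum of each consecutive block of `block` time steps."""
    return [max(series[i:i + block]) for i in range(0, len(series), block)]


def eva_one_latitude(ilat: int) -> List[List[float]]:
    """Outer-loop task: read one latitude row, then loop over longitude serially."""
    row = read_latitude_slice(ilat)
    return [block_maxima(row[ilon], BLOCK) for ilon in range(N_LON)]


def eva_parallel(
    make_executor: Callable[[], Executor] = ProcessPoolExecutor,
) -> List[List[List[float]]]:
    """Distribute latitudes across workers; results return in latitude order."""
    with make_executor() as pool:
        return list(pool.map(eva_one_latitude, range(N_LAT)))
```

Calling `eva_parallel()` under an `if __name__ == "__main__":` guard runs one task per latitude in separate processes. Because each task reads only its own row, no worker needs the full grid in memory and nothing large is broadcast from the master, which mirrors the motivation for reading only the inner-loop subset.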
document
http://n2t.net/ark:/85065/d7z89fpb
eng
geoscientificInformation
Text
publication
2016-01-01T00:00:00Z
EARTH SCIENCE SERVICES > DATA ANALYSIS AND VISUALIZATION > STATISTICAL APPLICATIONS
EARTH SCIENCE SERVICES > MODELS > COUPLED CLIMATE MODELS
revision
2021-09-17
publication
2017-06-30T00:00:00Z
Copyright Author(s). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
None
OpenSky Support
UCAR/NCAR - Library
PO Box 3000
Boulder
80307-3000
name: homepage
pointOfContact
2023-08-18T18:06:48.405376