Parallel¶
The core features of MRCPP are parallelized using a shared memory model only
(OpenMP). This means that there is no intrinsic MPI parallelization (i.e. no
data distribution across machines) within the library routines. However, the
code comes with a small set of features that facilitate MPI work and data
distribution in the host program, in the sense that entire FunctionTree
objects can be located on different machines and communicated between them.
Also, a FunctionTree can be shared between several MPI processes that are
located on the same machine. This means that several processes have read
access to the same FunctionTree, which reduces both the memory footprint
and the need for communication.
The MPI features are available by including:
#include "MRCPP/Parallel"
The host program¶
In order to utilize the MPI features of MRCPP, the MPI instance must be initialized (and finalized) by the host program, as usual:
MPI_Init(&argc, &argv);

int size, rank;
MPI_Comm_size(MPI_COMM_WORLD, &size); // Get MPI world size
MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get MPI world rank

// ... all MPI and MRCPP work happens here ...

MPI_Finalize();
For the shared memory features we must make sure that the ranks within a communicator are actually located on the same machine. When running on distributed architectures this can be achieved by creating separate communicators for each physical machine, e.g. by splitting MPI_COMM_WORLD into a new communicator group, here called MPI_COMM_SHARE, whose ranks share the same physical memory space:
// Initialize a new communicator called MPI_COMM_SHARE
MPI_Comm MPI_COMM_SHARE;
// Split MPI_COMM_WORLD into sub groups and assign to MPI_COMM_SHARE
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &MPI_COMM_SHARE);
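Each process can then query its position within the node-local group using only standard MPI calls. In this sketch the variable names sh_rank and sh_size are illustrative, not part of the MRCPP API:
int sh_size, sh_rank;
MPI_Comm_size(MPI_COMM_SHARE, &sh_size); // Number of ranks on this machine
MPI_Comm_rank(MPI_COMM_SHARE, &sh_rank); // Rank of this process on this machine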
Note that the main purpose of the shared memory feature of MRCPP is to avoid
memory duplication and reduce the memory footprint; it will not
automatically provide any work sharing parallelization for the construction of
the shared FunctionTree.
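For illustration, a minimal sketch of how a shared tree could be set up is shown below. It assumes the SharedMemory helper class and the share_tree function from MRCPP's MPI utilities, along with the sh_rank variable from the previous snippet; verify the exact names and signatures against the MRCPP/Parallel header of your version.
// Minimal sketch: SharedMemory and share_tree are assumed to be provided
// by MRCPP's MPI utilities; check your version for the exact signatures
int tag = 222222;    // Unique tag for this communication
int mem_size = 1000; // Size of the shared memory block (MB)
mrcpp::SharedMemory sh_mem(MPI_COMM_SHARE, mem_size);

// Allocate the tree in the shared memory block
mrcpp::FunctionTree<3> sh_tree(MRA, &sh_mem);

// Only the first rank on each machine performs the projection work
if (sh_rank == 0) mrcpp::project(prec, sh_tree, func);

// Synchronize the tree among all ranks in the shared group
mrcpp::share_tree(sh_tree, 0, tag, MPI_COMM_SHARE);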
Blocking communication¶
template <int D> void mrcpp::send_tree(FunctionTree<D> &tree, int dst, int tag, mrcpp::mpi_comm comm, int nChunks, bool coeff)
template <int D> void mrcpp::recv_tree(FunctionTree<D> &tree, int src, int tag, mrcpp::mpi_comm comm, int nChunks, bool coeff)
Example¶
A blocking send/receive means that the function call does not return until the communication is completed. This is a simple and safe option, but it can lead to significant overhead if the communicating MPI processes are not synchronized.
mrcpp::FunctionTree<3> tree(MRA);
// At this point tree is uninitialized on both rank 0 and 1
// Only rank 0 projects the function
if (rank == 0) mrcpp::project(prec, tree, func);
// At this point tree is projected on rank 0 but still uninitialized on rank 1
// Sending tree from rank 0 to rank 1
int tag = 111111;     // Unique tag for each communication
int src = 0, dst = 1; // Source and destination ranks
if (rank == src) mrcpp::send_tree(tree, dst, tag, MPI_COMM_WORLD);
if (rank == dst) mrcpp::recv_tree(tree, src, tag, MPI_COMM_WORLD);
// At this point tree is projected on both rank 0 and 1
// Rank 0 clears the tree
if (rank == 0) mrcpp::clear(tree);
// At this point tree is uninitialized on rank 0 but still projected on rank 1