MPICH: Parallel Programming
From Debian Clusters
The Message Passing Interface, or MPI, is a common standard for implementing parallel programming through message passing. This means that multiple processes can run on multiple machines or on the same machine and communicate by sending small packets of information ("messages") amongst themselves to coordinate their efforts. MPI itself isn't an implementation of message passing; it's a standard that specifies the set of functions and abilities that any implementation of the message passing library must provide. Further, there are multiple versions of the standard, which makes things even more confusing.
However, having one (or more) implementations of MPI on a cluster is often crucial. Your users can use the MPI standard and then compile their programs by linking in one of the implementations, thereby allowing them to write software that runs in parallel. Further, many scientific software packages depend upon an implementation of MPI in order to support running them in parallel. (A few notable examples are gromacs, NAMD, and mpiBLAST.)
There are many different implementations of the MPI standard. The most common ones in use on clusters are MPICH, LAM/MPI, and Open MPI. I'll be using MPICH in all of my tutorials, but there's a short description of the differences below. Then we'll walk through:
- Installing MPICH
- MPICH: Pick Your Paradigm
- MPICH with Torque Functionality OR MPICH without Torque Functionality
MPICH versus LAM/MPI versus Open MPI
MPICH, or Message Passing Interface Chameleon, is a project out of Argonne National Laboratory and Mississippi State University, originally headed by William Gropp and Ewing Lusk. With MPICH, users simply compile their code with the MPICH libraries, then run their code using an MPICH command. These libraries are typically available only for C/C++ and Fortran. MPICH2 takes a different route, more similar to LAM/MPI (see below).
LAM/MPI, or Local Area Multicomputer/Message Passing Interface, was developed by the Ohio Supercomputer Center. Unlike MPICH1, it requires users to first boot up a LAM environment before running their code. This involves starting a daemon on each one of the nodes to be used. Then, when a user is finished, s/he shuts down the LAM environment. According to the LAM/MPI website, most of the maintainers of the LAM/MPI project are now switching over to Open MPI, a new(er) conglomeration of several MPI implementations.
MPICH2 shares some characteristics with LAM/MPI, in that it can either run as MPICH1 did, or an MPI daemon (called an mpd) can be started on each of the nodes. Unlike in LAM/MPI, if this daemon is started by root, users' profiles can be configured so that their programs attach to and run under the root daemon, meaning users do not need to start up and tear down an environment before running a parallel program. I'll be using MPICH2 in all of the tutorials.
References
- Sloan, Joseph D. High Performance Linux Clusters. 1st ed. Sebastopol, CA: O'Reilly Media, Inc., 2005.
- http://www-unix.mcs.anl.gov/mpi/mpich/
- Wikipedia: Message Passing Interface
- MPICH2 User's Guide

