MPICH: Starting a Global MPD Ring

From Debian Clusters

This is part four of a multi-part tutorial on installing and configuring MPICH2. The full tutorial includes

Before you set up a ring of root mpd daemons, make sure MPICH is working correctly on a single machine. See the MPICH without Torque Functionality page for more information.


Mpd.conf Files

Every user who will run MPI jobs, and root, needs an mpd.conf file on each of the worker nodes for this to work. Root's file is /etc/mpd.conf, and each user's is ~/.mpd.conf. If you don't already have these set up, follow the instructions on the MPICH without Torque Functionality page.
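If you want to script the creation of these files, here is a minimal sketch. It assumes the usual MPD convention: a single `secretword=` line, a file readable only by its owner (mpd refuses to start if others can read it), and typically the same secret word for a given account on every node. The function name `write_mpd_conf` is hypothetical.

```shell
# Hypothetical helper: write an mpd.conf containing a secret word.
# Usage: write_mpd_conf <path> <secret word>
write_mpd_conf() {
    conf=$1
    secret=$2
    # Use a restrictive umask in a subshell so the file is never
    # world-readable, without changing the caller's umask
    (
        umask 077
        printf 'secretword=%s\n' "$secret" > "$conf"
    )
    # mpd insists that only the owner can read or write this file
    chmod 600 "$conf"
}
```

For example, a user could run `write_mpd_conf "$HOME/.mpd.conf" 'some-secret'`, and root could run the same command with `/etc/mpd.conf` as the path.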

Password-Less SSH

Password-less SSH also needs to be set up for all users. See the Password-less SSH for Users page for how to do this.
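Before going further, you can quickly confirm that password-less SSH works from the current account to every worker node. The sketch below uses ssh's `BatchMode=yes` option, which makes ssh fail instead of prompting for a password. The function name and the `SSH` override variable (handy for dry runs) are hypothetical.

```shell
# Hypothetical helper: report any host where password-less SSH fails.
# Usage: check_passwordless_ssh host1 host2 ...
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key makes
        # ssh exit with an error instead of hanging on a prompt
        if ! "${SSH:-ssh}" -o BatchMode=yes "$host" true; then
            echo "password-less SSH failed: $host" >&2
            failed=1
        fi
    done
    # Non-zero exit status if at least one host failed
    return "$failed"
}
```

Run it as each user who will submit jobs, for example `check_passwordless_ssh node01 node02 node03` (with your own hostnames).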

Starting the Mpd Ring

Starting the First Node

Once /etc/mpd.conf is in place on all of your worker nodes, an mpd daemon needs to be started on each worker node. These daemons manage the MPI processes. The first node started serves as the point of contact that all of the other mpds join through, so it's important to choose (and remember) a specific node as the head MPD node.

Start by SSHing into this first node and running

mpd --daemon --ncpus=<# CPUs>

The --daemon option runs mpd in the background, so the process isn't killed when the SSH session ends. The --ncpus option tells mpd how many CPUs this host has.
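Rather than typing the CPU count by hand on each node, you could let the node report it. This is a small sketch that assumes a Linux system where `getconf _NPROCESSORS_ONLN` is available. The function name `cpu_count` is hypothetical.

```shell
# Hypothetical helper: print the number of online CPUs on this host.
cpu_count() {
    # _NPROCESSORS_ONLN counts logical CPUs that are currently online
    getconf _NPROCESSORS_ONLN
}
```

The first node could then be started with `mpd --daemon --ncpus="$(cpu_count)"`. Note that this counts logical CPUs, so hyperthreaded cores are counted twice. If you only want one process per physical core, pass the number yourself.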

Next, find out where this daemon is listening so the other daemons can attach to it. Run mpdtrace -l as shown below:

owl:~# mpdtrace -l
owl_60519 (192.168.1.202)

You'll need the number after the underscore (_), 60519 in this example. This is the randomly chosen port the daemon is listening on.
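To avoid copying the port by hand, you could pull it out of the mpdtrace -l output. This is a minimal sketch that assumes the `host_port (address)` line format shown above and reads only the first line, so run it right after starting the first mpd. The function name `mpd_port` is hypothetical.

```shell
# Hypothetical helper: read `mpdtrace -l` output on stdin and print the
# port of the first listed mpd, e.g. "owl_60519 (192.168.1.202)" -> 60519
mpd_port() {
    # The greedy [^ ]*_ matches up to the last underscore in the first
    # field, so hostnames that contain underscores still work
    sed -n '1s/^[^ ]*_\([0-9][0-9]*\) .*/\1/p'
}
```

On the head node, `port=$(mpdtrace -l | mpd_port)` would store the port in a variable for use in the next step.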

Starting the Other Nodes

Then, on the other nodes, a slightly more complicated mpd command is needed:

mpd --daemon --host=<your first host> --port=<port found with mpdtrace> --ncpus=<# CPUs>

Do this one at a time on each of the other worker nodes, or see the Cluster Time-saving Tricks page to learn how to script it up. If you have any trouble, the MPICH: Troubleshooting the MPD page might help.
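As one possible way to script this step, the sketch below loops over the remaining worker nodes from the head node and starts mpd on each one over SSH. It assumes that every node has the same CPU count and that `mpd` is on the PATH of a non-interactive SSH session. The function name and the `SSH` override variable (set `SSH=echo` to print the commands without running them) are hypothetical.

```shell
# Hypothetical helper: join each listed host to the ring.
# Usage: start_ring_members <head host> <head port> <ncpus> host1 host2 ...
start_ring_members() {
    head_host=$1 head_port=$2 ncpus=$3
    shift 3
    for host in "$@"; do
        # Start a backgrounded mpd on $host that joins the ring via the head node
        "${SSH:-ssh}" "$host" mpd --daemon --host="$head_host" \
            --port="$head_port" --ncpus="$ncpus" ||
            echo "failed to start mpd on $host" >&2
    done
}
```

For example: `start_ring_members owl 60519 4 node02 node03 node04`. If ssh doesn't return after starting a daemon, the backgrounded process may still be holding the session's output open. Redirecting the remote command's output to /dev/null is a common workaround.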

Checking the MPD Ring

Once you've started up an mpd daemon on each one of the worker nodes, ssh into one of the worker nodes and run

mpdtrace

This will show you all of the hosts currently hooked up as part of the ring. All of the worker nodes should be listed here. To get a quick count, run

mpdtrace | wc -l

If any are missing, investigate those nodes and try starting an mpd daemon on them again.
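A count only tells you how many nodes are missing. To find out which ones, you could compare the ring against a list of expected worker nodes. This sketch assumes a hypothetical text file with one hostname per line, written in the same form that mpdtrace prints (for example, short names rather than fully qualified names). The function name `missing_nodes` is also hypothetical.

```shell
# Hypothetical helper: read mpdtrace output on stdin and print every host
# from the expected-hosts file that is not currently in the ring.
# Usage: mpdtrace | missing_nodes <expected hosts file>
missing_nodes() {
    expected_file=$1
    actual=$(mktemp) || return 1
    # comm(1) needs both inputs sorted in the same collation order
    LC_ALL=C sort -u > "$actual"
    # -23 keeps only lines unique to the expected list
    LC_ALL=C sort -u "$expected_file" | LC_ALL=C comm -23 - "$actual"
    rm -f "$actual"
}
```

Empty output means every expected node has joined the ring.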

Sanity Check: Running an MPI Program on Multiple Nodes

After the ring has been set up, it's finally time to try running an MPI job on multiple nodes. SSH into one of the worker nodes, become one of your user accounts, and follow the instructions at Creating and Compiling an MPI Program.

As when running multiple processes on the same machine, run the program with mpiexec. First, specify a number of processes less than or equal to the number of CPUs you gave with --ncpus for this worker node. In my case, that's four or fewer.

mpiexec -np 4 ./hello.out

You should see the same hostname listed for all of the processes. This is because the mpd daemon uses all available CPUs on the host you're running on before branching out to CPUs on other hosts. To see the job spread beyond one machine, raise the number of processes above the number of CPUs on this host.

mpiexec -np 7 ./hello.out

You should now be seeing different hostnames appearing in the list. The mpd on this machine automatically contacts other mpds in the ring when the host it's running on runs out of CPUs. (In MPICH1, you would have needed to specify this with a machinefile.) Pretty cool!
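To see at a glance how processes were placed, you could count output lines per host. This sketch assumes each output line begins with the hostname. Under MPD, mpiexec can also launch ordinary programs such as `hostname`, which produces exactly that kind of output. The function name `placement_summary` is hypothetical.

```shell
# Hypothetical helper: read one line per process on stdin (first field is
# the hostname) and print "host count" pairs, sorted by hostname.
placement_summary() {
    # Tally lines per first field, then sort for stable output
    awk '{ count[$1]++ } END { for (h in count) print h, count[h] }' |
        LC_ALL=C sort
}
```

For example, `mpiexec -np 7 hostname | placement_summary` on a four-CPU node should show four processes on the local host and the remaining three elsewhere in the ring.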
