Debian Clusters for Education and Research: The Missing Manual

Torque and Maui Sanity Check: Submitting a Job

This is the last part of a four-part tutorial on installing and configuring a queuing system and scheduler. The full tutorial includes:

There is also a troubleshooting page: Troubleshooting Torque and Maui.

This part of the tutorial assumes you have already installed and configured Torque and Maui. If you haven't, you'll want to visit those pages first.

Torque/Maui Sanity Check: Submitting a Job

A job is a single run of a particular script or program. You won't want to run a job as root, so first, on your head node, become one of your users. (For instance, su - kwanous.)

Jobs are submitted with the qsub command to the job queue run by torque. maui monitors that queue and decides when and on which worker node each job should run. torque then tells the pbs_mom client running on the node maui picked to run the job.

Test: Sleep Job

An easy job to submit and monitor is just a sleep command.

As one of your users, enter the command that will create a job that simply sleeps for 30 seconds, as shown below:

echo "sleep 30" | qsub

Immediately afterward, run the torque command qstat to see the job appear in torque's queue, and then the maui command showq. You can even run

pbsnodes | grep -v status | grep -v ntype

to see which node the job is running on. A transcript of my session is shown below.

kwanous@gyrfalcon:~$ echo "sleep 30" | qsub
6.gyrfalcon

kwanous@gyrfalcon:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6.gyrfalcon               STDIN            kwanous                0 R batch          
kwanous@gyrfalcon:~$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

6                   kwanous    Running     1     1:00:00  Wed Jan 23 14:00:24

     1 Active Job        1 of   28 Processors Active (3.57%)
                         1 of    7 Nodes Active      (14.29%)

... snipped ...

Total Jobs: 1   Active Jobs: 1   Idle Jobs: 0   Blocked Jobs: 0
kwanous@gyrfalcon:~$ pbsnodes | grep -v status | grep -v ntype
eagle
     state = free
     np = 4

 ... snipped ...

peregrine
     state = free
     np = 4
     jobs = 0/7.gyrfalcon

Approximately thirty seconds later, the job should finish running. If you run qstat and showq again, you should no longer see the job (6.gyrfalcon, in my example) running.
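If you'd rather not rerun qstat by hand, a small helper can poll until the job drops out of the queue. This is only a sketch: wait_for_job is a hypothetical name, and it assumes qstat exits with a non-zero status once the job ID is no longer known to the server. If your server is configured to keep completed jobs listed for a while, the loop will simply run that much longer.

```shell
# wait_for_job JOBID [INTERVAL]
# Hypothetical helper: polls qstat every INTERVAL seconds (default 5)
# until qstat no longer recognizes JOBID.
# Assumption: qstat returns non-zero for a job ID it does not know.
wait_for_job() {
  jobid=$1
  interval=${2:-5}
  while qstat "$jobid" >/dev/null 2>&1; do
    sleep "$interval"
  done
  echo "$jobid is no longer in the queue"
}
```

It could be used as wait_for_job "$(echo "sleep 30" | qsub)", since qsub prints the new job's ID on standard output.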

Sleep Job Results

In the directory you submitted the job from (the user's home directory, if you haven't changed directories since su -), you should now see two files, something like:

  • STDIN.o3
  • STDIN.e3

where 3 is the job ID (for the job in my example above, it would be 6). The file ending in .o# holds everything the job wrote to standard output, and the file ending in .e# holds everything it wrote to standard error. For our sleep job, both of these should be empty, because sleep writes nothing to either stream.
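A quick way to confirm that both files exist and are empty is the shell's -s test, which is true only for files with a non-zero size. The helper below is a sketch, and check_empty is a hypothetical name of my own choosing.

```shell
# check_empty FILE...
# Hypothetical helper: reports whether each file is missing, empty,
# or has content.
check_empty() {
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "$f: missing"
    elif [ -s "$f" ]; then
      echo "$f: has content"
    else
      echo "$f: empty"
    fi
  done
}
```

For the files named above, you would run check_empty STDIN.o3 STDIN.e3 and expect both to be reported as empty.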

Test: Standard Output vs Standard Error

The qsub command can also take its input from a file. Such a file can tell torque all sorts of things about the job, such as how long it will run and what resources it needs. (To learn more about qsub submission files, see Torque Qsub Scripts.) We'll write just a simple one.

Open your favorite text editor, enter the contents of my Standard Output/Error For Loop Script, and save the file as submission. This script has a simple for loop that runs from 1 to 10. If the number is less than 5, it prints a statement to standard output. If the number is greater than or equal to 5, it prints a statement to standard error.
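The linked script is not reproduced on this page. A minimal sketch that matches the description above, assuming bash is available on the worker nodes, might look like the following. It is not the original script, and the function name print_numbers is my own.

```shell
#!/bin/bash
# Sketch of a submission script matching the description above.
# Assumption: this is NOT the original linked script; print_numbers is a
# hypothetical name chosen for illustration.

print_numbers() {
  for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -lt 5 ]; then
      # Numbers below 5 go to standard output (collected in the .o# file)
      echo "$i is less than 5"
    else
      # Everything else goes to standard error (collected in the .e# file)
      echo "$i is greater than or equal to 5" >&2
    fi
  done
}

print_numbers
```

Before submitting, you can try it directly with bash submission >out.txt 2>err.txt, which should split the lines between the two files in the same way.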

Submit the job with

qsub submission

where submission is the name of the script file.

Job Results

Again, you should have .o# and .e# files in the directory you submitted from, but this time their names should start with the name of the file submitted to qsub (submission), and they should have content in them. Your output file should have the first four lines, which were printed to standard output:

1 is less than 5
2 is less than 5
3 is less than 5
4 is less than 5

and your error file should have the last six, which were printed to standard error:

5 is greater than or equal to 5
6 is greater than or equal to 5
7 is greater than or equal to 5
8 is greater than or equal to 5
9 is greater than or equal to 5
10 is greater than or equal to 5

Hmm...

If you didn't get the results described on this page, visiting the Troubleshooting Torque and Maui page might be of help.
