Torque Queue Configuration
From Debian Clusters
About Torque
If you haven't done the initial set up of Torque and Maui yet, then the Using a Scheduler and Queue page may be of interest.
A Word about Queues
Queues are what allow optimized scheduling on a cluster. The scheduler, Maui, receives information about the queues from Torque and uses it to decide which jobs will run when (and on which nodes). Without queues, the cluster would just be running batch jobs on a first come, first served basis.
For instance, suppose the cluster is currently full and three jobs are submitted. One needs to run for two hours on two nodes, one needs to run for four hours on two nodes, and one needs to run for six hours on one node. The scheduler could see that one node will be coming available, and schedule the six hour job on it. On the other hand, two nodes may be coming available shortly, and then the priority of the queues would determine whether the two hour or the four hour job should run.
In their qsub scripts, users specify what resources they need for a job. They can also specify to qsub (on the command line) what queue their job should be put into.
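As a rough illustration of the paragraph above, here is a minimal sketch of a job script. The #PBS lines are read by qsub as directives and ignored by the shell as comments. The job name, the queue name long, and the resource values are assumptions chosen for illustration, not settings from any particular cluster.

```shell
#!/bin/sh
# Hypothetical job script: all names and values below are illustrative.
#PBS -N example_job
#PBS -q long
#PBS -l nodes=2
#PBS -l walltime=02:00:00

# PBS_O_WORKDIR is set by Torque to the directory qsub was run from;
# fall back to the current directory when run outside the batch system.
cd "${PBS_O_WORKDIR:-.}" || exit 1

job_host=$(uname -n)
echo "Job running on $job_host"
```

The queue could instead be chosen on the command line, for example with qsub -q long example_job.sh (the file name is hypothetical).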
Copying from an Existing Setup
If you're lucky enough to have an existing setup with queues already configured, you can copy this information over to another server. Or, if you'd like, you can use the Torque Queues Example from a different cluster at my institution. There may be a better way to do this - and if you know of one, please e-mail me at kwanous <at> debianclusters <dot> org and let me know - but this is a route that worked for me.
All of the queue configuration files are stored at $PBSHOME/server_priv/queues ($PBSHOME is /var/spool/pbs if you followed my Torque tutorial). On the head node of the cluster with the queues configured, cd into this directory. For each of the queues on that cluster, you need the setup information. To grab all of it in one fell swoop, run
for x in `ls`; do qmgr -c "print queue $x"; done >> /tmp/queues
/tmp/queues is where the output will be saved. You can change this to something else, if you want.
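Here is a variant of the loop above, sketched under a few assumptions: it uses a glob rather than parsing ls output, and it overwrites the output file rather than appending to it, so a stale copy from an earlier run isn't mixed in. The function name dump_queues and its arguments are hypothetical; the only Torque command used is the same qmgr -c "print queue ..." call as above.

```shell
# dump_queues QUEUE_DIR OUT_FILE
# Write "print queue" output for every queue file in QUEUE_DIR to OUT_FILE.
dump_queues() {
    queue_dir=$1
    out_file=$2
    : > "$out_file"                      # start from an empty file
    for path in "$queue_dir"/*; do
        [ -e "$path" ] || continue       # directory was empty: glob did not match
        qmgr -c "print queue $(basename "$path")" >> "$out_file"
    done
}

# Example call (assumes $PBSHOME is set as in the text):
# dump_queues "$PBSHOME/server_priv/queues" /tmp/queues
```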
Then, you can either copy the output file by hand (cat it and then copy-paste into a new file) or rsync it over to your new cluster. Once the file is on the new cluster, it's pretty easy to plunk it in, because the file is already formatted as input for qmgr. Just run
cat /tmp/queues | qmgr
If the queues have been created, the corresponding files will be generated in $PBSHOME/server_priv/queues. You may need to restart pbs_server in order for the changes to take effect in the live queues. This is done with
killall -KILL pbs_server
pbs_server
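One way to check the import, sketched under the assumption stated above that every created queue gets a file in $PBSHOME/server_priv/queues: pull the queue names out of the create queue lines in the dump file and report any that have no matching file. The function name missing_queues is hypothetical.

```shell
# missing_queues DUMP_FILE QUEUE_DIR
# Print the name of each queue created in DUMP_FILE that has no file in QUEUE_DIR.
missing_queues() {
    dump_file=$1
    queue_dir=$2
    # Lines of interest look like: create queue long
    awk '$1 == "create" && $2 == "queue" { print $3 }' "$dump_file" |
    while read -r name; do
        [ -e "$queue_dir/$name" ] || echo "$name"
    done
}

# Example call (paths as used in the text):
# missing_queues /tmp/queues "$PBSHOME/server_priv/queues"
```

If the function prints nothing, every queue named in the dump file has a corresponding file.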
Configuring Queues by Hand
The same kind of format can be used to generate queues by hand. Generally this is done by creating a file with all the input and then piping it to qmgr, as in the example above. There are quite a few different options for queue configuration, including access control lists, the maximum number of jobs the queue will take, and whether the queue is active and should run jobs, in addition to the available resources and walltime for the queue.
Let's examine one of the ones automatically generated in the Torque Queues Example.
create queue long
set queue long queue_type = Execution
set queue long Priority = 60
set queue long max_running = 128
set queue long resources_max.cput = 10:00:00
set queue long resources_min.cput = 02:00:01
set queue long resources_default.cput = 03:00:00
set queue long resources_default.walltime = 04:00:00
set queue long max_user_run = 8
set queue long enabled = True
set queue long started = True
- The first part of defining a queue is the create queue directive. Here, the name of the queue is "long". After this line, we need to specifically tell qmgr which queue we're configuring, so "long" will be repeated for the rest of the setup lines.
- queue_type specifies one of two types - route or execution. Route queues are responsible for putting jobs into other queues based on the jobs' attributes. Execution queues are ones that jobs will actually run in.
- priority can be used to assign different preferences to queues. Zero is the default value. Which direction counts as higher priority is not confirmed here, so check the Torque documentation for your version.
- max_running is the highest number of jobs from this queue that can be running at any given time.
- resources_max, resources_min, and resources_default are used to designate the maximum, minimum, and default resources. There are quite a few different values that can be specified here:
  - cput - CPU time, the processor time the job actually consumes
  - nodes - the number of nodes
  - ncpus - the number of CPUs
  - walltime - wall-clock time, the real time elapsed while the job runs, regardless of how much CPU time it uses
- max_user_run is the highest number of jobs that a user is allowed to have running in the queue at any given time.
- enabled allows the queue to accept job submissions. This is false by default.
- started allows the queue to run the jobs submitted to it. This is false by default.
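To make the shape of a definition like the one above concrete, here is a small sketch that prints the qmgr lines for a new execution queue. The function name queue_definition, its parameters, and the example queue short with its values are all hypothetical; only attributes that appear in the example above are used.

```shell
# queue_definition NAME PRIORITY MAX_CPUT
# Print qmgr input that creates an enabled, started execution queue.
queue_definition() {
    name=$1
    priority=$2
    max_cput=$3
    cat <<EOF
create queue $name
set queue $name queue_type = Execution
set queue $name Priority = $priority
set queue $name resources_max.cput = $max_cput
set queue $name enabled = True
set queue $name started = True
EOF
}

# Example: a made-up "short" queue limited to two hours of CPU time.
queue_definition short 80 02:00:00
```

Redirecting the output, for example with queue_definition short 80 02:00:00 > /tmp/newqueues, saves the definition to a file instead of printing it.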
Once you have a file listing all the queue creation details, then you can pipe this file into qmgr to create the queue, like this:
cat /tmp/newqueues | qmgr
Alternatively, you can bring qmgr up in interactive mode with just qmgr and type in the lines one at a time.
Either way, you may need to restart pbs_server in order for the changes to take effect in the live queues. This is done with
killall -KILL pbs_server
pbs_server

