# Slurm QOS and partitions
## Introduction
With the arrival of the Jed cluster in early 2023, we adapted the partition and QOS structures in our clusters. Since then, we have also successfully implemented these changes in the Helvetios cluster.
In most cases you should define the QOS you want; the partition is then set to the same name. This can be done in one of two ways:

- by adding a directive such as `#SBATCH --qos=serial` to your Slurm script;
- by passing the QOS to the `sbatch` command directly, e.g. `sbatch -q serial slurm_script.sh`.
You can find more details about our allocation policies on this page.
## Kuma, the GPU cluster
On Kuma, the newest GPU cluster, we have the following QOS structure:
| QOS    | Priority | Max Wall-time | Max resources per job  |
|--------|----------|---------------|------------------------|
| normal | normal   | 3-00:00:00    | 8 nodes                |
| long   | low      | 7-00:00:00    | 8 nodes                |
| build  | high     | 04:00:00      | 1 node, 0 gpu, 16 core |
| debug  | high     | 01:00:00      | 2 gpu                  |
The default QOS is `normal`.
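For instance, a job script on Kuma could combine a QOS with one of the Kuma partitions listed further down on this page. A minimal sketch (the partition choice, resource request, and executable are placeholders):

```bash
#!/bin/bash
#SBATCH --qos=long           # low priority, up to 7 days of wall-time
#SBATCH --partition=h100     # Kuma has no default partition, so one must be requested
#SBATCH --gpus=1             # placeholder resource request
#SBATCH --time=5-00:00:00    # placeholder wall-time, within the 7-day limit

srun ./my_gpu_program        # placeholder executable
```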
## Jed, the CPU cluster
On Jed, the CPU cluster, we have the following QOS structure:
| QOS      | Priority | Max Wall-time | Min resources per job | Max resources per job | Max jobs per user |
|----------|----------|---------------|-----------------------|-----------------------|-------------------|
| serial   | low      | 3-00:00:00    | 1 core                | 1 node                | 10001             |
| parallel | high     | 15-00:00:00   | 1 node + 1 core       | 32 nodes              | 10001             |
| free     | lowest   | 6:00:00       | 1 core                | 1 node                | 150               |
| debug    | highest  | 2:00:00       | 1 core                | 18 cores              | 1                 |
### Special Jed QOS
On Jed, there are two more QOS. These are meant to provide access to nodes with more RAM than the standard nodes, which have 512 GB of RAM. These QOS are:
| QOS     | Node properties | Max Wall-time | Max nodes per job | Max jobs per user |
|---------|-----------------|---------------|-------------------|-------------------|
| bigmem  | 1 TB of RAM     | 15-00:00:00   | 21                | 1001              |
| hugemem | 2 TB of RAM     | 3-00:00:00    | 2                 | 4                 |
To access these nodes you also need to choose the partition with the same name. So, for instance, to run a job on `hugemem` you would need to request both the `hugemem` QOS and the `hugemem` partition.
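A minimal job script could look like the following sketch (the resource request, wall-time, and executable are placeholders):

```bash
#!/bin/bash
#SBATCH --qos=hugemem          # QOS for the nodes with 2 TB of RAM
#SBATCH --partition=hugemem    # matching partition with the same name
#SBATCH --nodes=1              # placeholder resource request
#SBATCH --time=1-00:00:00      # placeholder wall-time, within the 3-day limit

srun ./my_memory_hungry_program   # placeholder executable
```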
## Izar, the Academic GPU Cluster
On Izar, our academic GPU cluster, we have yet another QOS structure, with a total of five options:
| QOS      | Priority | Max Wall-time | Max resources per job  |
|----------|----------|---------------|------------------------|
| gpu      | normal   | 3-00:00:00    | 1 node                 |
| week     | low      | 7-00:00:00    | 1 gpu                  |
| gpu_free | lowest   | 12:00:00      | 1 node                 |
| debug    | lowest   | 1:00:00       | 1 node, 1 gpu, 20 core |
| build    | lowest   | 8:00:00       | 1 node                 |
The default QOS is `gpu`, so if the standard values are fine for you, you don't need to add any QOS to your jobs.
For `gpu_free` there are two significant limits:

- any one user is limited to no more than 3 nodes;
- all `gpu_free` jobs combined can use a maximum of 5 nodes.
**Limits exceeded**

If these limits are reached, the jobs will be held with the reason `QOSResourceLimit`.
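One way to check whether a pending job is being held for this reason is to look at the reason column reported by `squeue`, for example (the output format string below is just one possible choice):

```bash
# List your pending jobs with their partition, QOS, state and the reason they are waiting
squeue -u $USER --states=PENDING --format="%.10i %.9P %.8q %.12T %R"
```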
## Helvetios, the Academic CPU Cluster
On Helvetios, the Academic CPU cluster, we have the following QOS structure:
| QOS      | Priority | Max Wall-time | Max resources per job |
|----------|----------|---------------|-----------------------|
| serial   | normal   | 3-00:00:00    | 1 node, 5000 core     |
| parallel | high     | 15-00:00:00   | 32 node, 3060 core    |
| free     | lowest   | 06:00:00      | 1 node                |
| debug    | high     | 02:00:00      | 8 core                |
The default QOS is `serial`.
### `debug` QOS
For `debug` you are limited to one job at a time. The `debug` QOS is meant to be as general as possible. While the number of cores is fairly small, there is no limit on the number of nodes, so the cores can be spread over up to 18 nodes (with one core per node). So, if you want to test whether your MPI code is compiled properly, you can do:
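For example, something along these lines launches one MPI rank on each of the 18 nodes (the executable name is a placeholder):

```bash
# Run one MPI rank per node on 18 nodes under the debug QOS
srun --qos=debug --nodes=18 --ntasks-per-node=1 ./my_mpi_program
```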
## Slurm Partitions
Slurm partitions are job queues, each defining different constraints, for example, job size limit, job time limit, or users permitted to use the partition.
### Requesting a Partition
Generally, when you use Slurm to run jobs, you should request the partition where your job will be executed. There are two equivalent ways to do that:
- add `#SBATCH --partition=<PARTITION>` to your Slurm script, for example `#SBATCH --partition=bigmem`;
- pass the requested partition as a `-p` or `--partition` argument to the `sbatch` command, for example `sbatch --partition=bigmem slurm_script.sh`.
### Default partitions
Some partitions are defined as default partitions, for example the `standard` partition on CPU clusters. When no `--partition` is defined in the `sbatch` script, jobs are assigned to the default partition.
**Always request a partition**

Even when you want your job to be executed on a default partition, you are still encouraged to request it explicitly, for example by passing `#SBATCH --partition=standard`. This will make your configuration more robust and debugging easier.
### View Partition information
You can view basic information on the available partitions by running:
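```bash
sinfo   # summary of partitions, their availability, time limits and node states
```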
To view detailed information on the available partitions, execute:
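```bash
scontrol show partition   # full configuration of every partition
```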
### List of Slurm Partitions on CPU clusters
The table below illustrates the typical partitions that most SCITAS users are expected to use. We present the following columns:
- Partition: the name of the Slurm partition.
- Default: whether this partition is the default.
- QOS: the Quality of Service (QOS), if any, attached to the partition. If specified, the partition has the same limits as the QOS.
| Partition | Default | QOS     | Clusters       |
|-----------|---------|---------|----------------|
| standard  | ✔       | n/a     | Jed, Helvetios |
| bigmem    |         | bigmem  | Jed            |
| hugemem   |         | hugemem | Jed            |
### List of Slurm Partitions on GPU clusters
The table below illustrates the partitions available on GPU clusters. We present the following columns:
- Partition: the name of the Slurm partition.
- Default: whether this partition is the default.
- Allowed QOS: the Quality of Service (QOS) options that are allowed to run on the partition.
| Partition | Default | Allowed QOS             | Clusters |
|-----------|---------|-------------------------|----------|
| gpu       | ✔       | All except build, debug | Izar     |
| build     |         | build                   | Izar     |
| debug     |         | debug                   | Izar     |
| gpu-xl    |         | gpu, gpu_free           | Izar     |
| test      |         | gpu                     | Izar     |
| h100      |         | kuma                    | Kuma     |
| l40s      |         | kuma                    | Kuma     |
**No default partition on Kuma**

There is no default partition on Kuma. You have to choose one! Otherwise, the job will fail with the following message:

```
sbatch: error: Batch job submission failed: No partition specified or system default partition
```
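For example, a submission on Kuma could select a partition directly on the command line (the script name is a placeholder):

```bash
sbatch --partition=l40s --qos=normal slurm_script.sh
```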