# Slurm QOS and partitions

## Introduction
With the arrival of the Jed cluster in early 2023, we adapted the partition and QOS structure of our clusters. These changes have since also been rolled out to the Helvetios cluster.
In most cases you should simply define the QOS you want; the partition is defined to be the same. This can be done in one of two ways:

- by adding a directive to your Slurm script, e.g.

```bash
#SBATCH --qos=serial
```

- by passing the QOS to the `sbatch` command directly, e.g.

```bash
sbatch -q serial slurm_script.sh
```
## Standard QOS
Here's a quick look at the QOS that will be of interest for most SCITAS users:
| QOS | Usage |
|---|---|
| `serial` | for jobs from 1 core, up to 1 node |
| `parallel` | for jobs of more than 1 full node |
| `free` | low priority, for users without paid access to our clusters |
| `debug` | high priority, for testing codes or inputs |
There are no practical limits on the number of jobs a user can submit in `serial`, `parallel`, or `free` (the exact per-user caps are listed in the table below). For `debug` you are limited to one job at a time. The `debug` QOS is meant to be as general as possible: while the number of cores is fairly small, there is no separate limit on the number of nodes, so you can use up to 18 nodes (with one core per node). If you want to test whether your MPI code is compiled properly, you can therefore do something like the sketch below.
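A minimal sketch (`./my_mpi_app` stands in for your own MPI executable; adjust paths and loaded modules as needed):

```bash
# run 18 MPI ranks, one per node, under the debug QOS;
# the matching partition is selected automatically
srun --qos=debug --nodes=18 --ntasks-per-node=1 ./my_mpi_app
```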
A more detailed description can be found in the following table:
| QOS | Priority | Max Wall-time | Min resources per job | Max resources per job | Max resources per user | Max jobs per user |
|---|---|---|---|---|---|---|
| `serial` | low | 3-00:00:00 | 1 core | 1 node | 7560 cores | 10001 |
| `parallel` | high | 15-00:00:00 | 1 node + 1 core | 32 nodes | 12672 cores | 10001 |
| `free` | lowest | 6:00:00 | 1 core | 1 node | 1368 cores | 150 |
| `debug` | highest | 2:00:00 | 1 core | 18 cores | 18 cores | 1 |
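If you want to verify these limits on the cluster itself, Slurm's accounting tool can list them (a sketch; the exact values and columns shown depend on the site's Slurm configuration):

```bash
# print the enforced per-QOS limits from Slurm's accounting database
sacctmgr show qos format=name,priority,maxwall,maxtres%25,maxtrespu%25,maxjobspu
```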
## Special QOS
On Jed, there are two more QOS, meant to provide access to nodes with more memory than the standard nodes (which have 512 GB of RAM). These QOS are:
| QOS | Node properties | Max Wall-time | Max nodes per job | Max resources per user | Max jobs per user |
|---|---|---|---|---|---|
| `bigmem` | 1 TB of RAM | 15-00:00:00 | 21 | 1512 cores | 1001 |
| `hugemem` | 2 TB of RAM | 3-00:00:00 | 2 | 144 cores | 4 |
To access these nodes you also need to choose the partition with the same name. So, for instance, to run a job on `hugemem` you'd need the directives sketched below.
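A sketch of the relevant Slurm script directives, following the same-name rule above:

```bash
#SBATCH --qos=hugemem        # request the hugemem QOS
#SBATCH --partition=hugemem  # and the partition of the same name
```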
## Izar, the GPU cluster
On Izar, our GPU cluster, we still have a different QOS structure, with a total of five options:
| QOS | Priority | Max Wall-time | Max resources per job |
|---|---|---|---|
| `gpu` | normal | 3-00:00:00 | 1 node |
| `week` | low | 7-00:00:00 | 2 nodes |
| `gpu_free` | lowest | 12:00:00 | 1 node |
| `debug` | lowest | 1:00:00 | 1 node, 1 GPU, 20 cores |
| `build` | lowest | 8:00:00 | 1 node |
The default QOS is `gpu`, so if the standard values are fine for you, you don't need to specify a QOS for your jobs.
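For illustration, a minimal Izar job script relying on the default QOS (a sketch; the `--gres` request and the executable name are assumptions, not SCITAS-specific requirements):

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:1       # request one GPU
#SBATCH --time=02:00:00    # within the 3-day limit of the gpu QOS
# no --qos directive needed: the default QOS "gpu" applies

srun ./my_gpu_app
```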
For `gpu_free` there are two significant limits:

- any single user is limited to no more than 3 nodes;
- all `gpu_free` jobs combined can use a maximum of 5 nodes.
**Limits exceeded**

If these limits are reached, the jobs will be held with the reason `QOSResourceLimit`.
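To check whether a job is being held for this reason, you can inspect the reason column in `squeue` (a sketch using standard `squeue` format specifiers):

```bash
# %i job ID, %q QOS, %T job state, %r pending reason
squeue -u $USER -o "%.12i %.10q %.10T %.25r"
```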