Slurm QOS and partitions#
Introduction#
In this section we present two configuration characteristics of Slurm that we use to control resource utilization on our clusters:
- Quality of Service (QOS) affects the scheduling priority, the preemption behaviour, and the resource limits of submitted jobs.
- Partitions act as job queues, imposing restrictions on submitted jobs, such as limits on job size or run time.
Quick start#
When configuring a Slurm job, you should explicitly define the QOS and the partition where your job will be submitted. You can do this either:
- by adding, e.g., `#SBATCH --qos=serial` to your Slurm script; or
- by passing the option to the `sbatch` command directly (e.g. `sbatch -q serial slurm_script.sh`).
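For example, a minimal submission script that sets both explicitly could look like the following sketch (the QOS, partition, and program names are placeholders; pick values that exist on your cluster, as listed in the tables below):

```bash
#!/bin/bash
#SBATCH --qos=serial          # QOS to submit to
#SBATCH --partition=standard  # partition to submit to
#SBATCH --time=01:00:00       # requested wall-time
#SBATCH --ntasks=1            # number of tasks

srun ./my_program             # my_program stands in for your executable
```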
Some clusters define a default QOS or partition, which is used if you do not explicitly specify the corresponding parameters. Nevertheless, we recommend always defining the QOS and the partition of your job, even when using the default values. This reduces ambiguity and helps during debugging.
You can find more details about our allocation policies in the documentation.
Slurm Quality of Service (QOS)#
View QOS information#
You can view information on the available QOS by running on a cluster frontend:
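For example, `sacctmgr` lists the configured QOS and their limits (the `format` fields shown here are one possible selection):

```bash
sacctmgr show qos format=Name,Priority,MaxWall,MaxTRES
```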
Kuma, the GPU cluster#
On Kuma, the newest GPU cluster, we have the following QOS structure:
| QOS | Priority | Max Wall-time | Max resources per job |
|---|---|---|---|
| `normal` | normal | 3-00:00:00 | 8 nodes |
| `long` | low | 7-00:00:00 | 8 nodes |
| `build` | high | 04:00:00 | 1 node, 0 GPUs, 16 cores |
| `debug` | high | 01:00:00 | 2 GPUs |
The default QOS is `normal`.
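If you need more than the three-day wall-time of `normal`, you can request the `long` QOS at submission time, for example (`my_job.sh` is a placeholder for your script):

```bash
sbatch --qos=long --time=5-00:00:00 my_job.sh
```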
Jed, the CPU cluster#
On Jed, the CPU cluster, we have the following QOS structure:
| QOS | Priority | Max Wall-time | Min resources per job | Max resources per job | Max jobs per user |
|---|---|---|---|---|---|
| `serial` | low | 7-00:00:00 | 1 core | 1 node | 10001 |
| `parallel` | high | 15-00:00:00 | 1 node + 1 core | 32 nodes | 10001 |
| `free` | lowest | 6:00:00 | 1 core | 1 node | 150 |
| `debug` | highest | 2:00:00 | 1 core | 18 cores | 1 |
Special Jed QOS#
On Jed, there are two more QOS. These are meant to provide access to nodes with more RAM than the standard nodes, which have 512 GB of RAM. These QOS are:
| QOS | Node properties | Max Wall-time | Max nodes per job | Max jobs per user |
|---|---|---|---|---|
| `bigmem` | 1 TB of RAM | 15-00:00:00 | 21 | 1001 |
| `hugemem` | 2 TB of RAM | 3-00:00:00 | 2 | 4 |
To access these nodes you also need to choose the partition with the same name. So, for instance, to run a job on `hugemem` you'd need to request both the `hugemem` partition and the `hugemem` QOS.
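A minimal sketch of the relevant `#SBATCH` directives:

```bash
#SBATCH --partition=hugemem  # partition with the 2 TB nodes
#SBATCH --qos=hugemem        # matching QOS
```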
Izar, the academic GPU cluster#
On Izar, our academic GPU cluster, we have yet another QOS structure, with a total of four options:
| QOS | Priority | Max Wall-time | Max resources per job |
|---|---|---|---|
| `normal` | normal | 3-00:00:00 | |
| `long` | low | 7-00:00:00 | |
| `debug` | high | 1:00:00 | 2 GPUs |
| `build` | high | 8:00:00 | 1 node, 20 cores, 90 GB of RAM, 0 GPUs |
The default QOS is `normal`, so if the standard values are fine for you, you don't need to add any QOS to your jobs.
Limits exceeded
If these limits are reached, jobs will be held with the reason `QOSResourceLimit`.
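You can check why a pending job is held with `squeue`, for example:

```bash
# %T prints the job state, %r the reason it is pending (e.g. QOSResourceLimit)
squeue -u $USER --format="%.10i %.9P %.8T %r"
```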
Helvetios, the academic CPU cluster#
On Helvetios, the academic CPU cluster, we have the following QOS structure:
| QOS | Priority | Max Wall-time | Max resources per job |
|---|---|---|---|
| `serial` | normal | 3-00:00:00 | 1 node, 5000 cores |
| `parallel` | high | 15-00:00:00 | 32 nodes, 3060 cores |
| `free` | lowest | 06:00:00 | 1 node |
| `debug` | high | 02:00:00 | 8 cores |
The default QOS is `serial`.
`debug` QOS#
For `debug` you are limited to one job at a time. The `debug` QOS is meant to be as general as possible. While the number of cores is fairly small, there is no limit on the number of nodes, so you can use up to 18 nodes (with one core per node). So, if you want to test whether your MPI code is compiled properly you can do:
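For example (a sketch; `my_mpi_app` stands in for your MPI executable):

```bash
srun --qos=debug --nodes=18 --ntasks-per-node=1 ./my_mpi_app
```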
Slurm partitions#
Slurm partitions are job queues, each defining different constraints, for example a job size limit, a job time limit, or the set of users permitted to use the partition.
Requesting a partition#
Generally, when you use Slurm to run jobs, you should request the partition where your job will be executed. There are two equivalent ways to do that:
- add `#SBATCH --partition=<PARTITION>` to your Slurm script;
    - example: `#SBATCH --partition=bigmem`
- pass the requested partition as a `-p` or `--partition` argument to the `sbatch` command.
    - example: `sbatch --partition=bigmem slurm_script.sh`
Default partitions#
Some partitions are defined as the default partitions, for example, the `standard` partition on CPU clusters. When no `--partition` is defined in the `sbatch` script, jobs are assigned to the default partition.
Always request a partition
When you want your job to be executed on a default partition, you are still encouraged to explicitly request it, for example by passing `#SBATCH --partition=standard`. This will make your configuration more robust and make debugging easier.
View partition information#
You can view basic information on the available partitions by running:
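```bash
sinfo  # summary of partitions and node states
```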
To view detailed information on the available partitions, execute:
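```bash
scontrol show partition  # full configuration of every partition
```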
List of Slurm partitions on CPU clusters#
The table below illustrates the typical partitions that most SCITAS users are expected to use. We present the following columns:
- Partition: the name of the Slurm partition.
- Attached QOS: whether a Quality of Service (QOS) has been attached to the partition. If specified, the partition has the same limits as the QOS.
- Clusters: the clusters where the partition is available.
| Partition | Attached QOS | Clusters |
|---|---|---|
| `standard` | n/a | Jed, Helvetios |
| `bigmem` | `bigmem` | Jed |
| `hugemem` | `hugemem` | Jed |
The `standard` partition is the default partition on all CPU clusters.
List of Slurm partitions on GPU clusters#
The tables below illustrate the partitions available on the GPU clusters.
Partitions on Izar, the academic GPU cluster#
The table below illustrates the partitions available on the Izar GPU cluster. We present the following columns:
- Partition: the name of the Slurm partition.
- Allowed QOS: the respective Quality of Services (QOS) that are allowed to run on the partition.
| Partition | Allowed QOS |
|---|---|
| `gpu` | all |
| `gpu-xl` | all |
| `test` | `normal` |
`gpu` is the default partition on Izar. The `gpu-xl` partition is a subset of `gpu`.
Partitions on Kuma, the GPU cluster#
The table below illustrates the partitions available on the Kuma GPU cluster. We present the following columns:
- Partition: the name of the Slurm partition.
- Allowed QOS: the respective Quality of Services (QOS) that are allowed to run on the partition.
| Partition | Allowed QOS |
|---|---|
| `h100` | all |
| `l40s` | all |
No default partition on Kuma
There is no default partition on Kuma: you have to choose one! Otherwise, the job will fail with the following message:

    sbatch: error: Batch job submission failed: No partition specified or system default partition
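So, on Kuma, always pass a partition explicitly, for example (`my_job.sh` is a placeholder for your script):

```bash
sbatch --partition=h100 --qos=normal my_job.sh
```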