# Slurm QOS and partitions
## Introduction
With the arrival of the Jed cluster in early 2023, we adapted the partition and QOS structures in our clusters. Since then, we have also successfully implemented these changes in the Helvetios cluster.
In most cases you should define the QOS you want; the partition is then set to the same name. This can be done in one of two ways:

- by adding a directive such as `#SBATCH --qos=serial` to your Slurm script;
- by passing the QOS to the `sbatch` command directly, e.g. `sbatch -q serial slurm_script.sh`.
You can find more details about our allocation policies on this page.
## Kuma, the GPU cluster
On Kuma, the newest GPU cluster, we have the following QOS structure:
| QOS    | Priority | Max Wall-time | Max resources per job  |
|--------|----------|---------------|------------------------|
| normal | normal   | 3-00:00:00    | 8 nodes                |
| long   | low      | 7-00:00:00    | 8 nodes                |
| build  | high     | 04:00:00      | 1 node, 0 gpu, 16 core |
| debug  | high     | 01:00:00      | 2 gpu                  |
The default QOS is `normal`.
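For instance, a job script on Kuma could combine a QOS with one of the Kuma partitions listed further down on this page. A minimal sketch (the partition choice, resource request, and executable are placeholders):

```bash
#!/bin/bash
#SBATCH --qos=long           # low priority, up to 7 days of wall-time
#SBATCH --partition=h100     # Kuma has no default partition, so one must be requested
#SBATCH --gpus=1             # placeholder resource request
#SBATCH --time=5-00:00:00    # placeholder wall-time, within the 7-day limit

srun ./my_gpu_program        # placeholder executable
```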
## Jed, the CPU cluster
On Jed, the CPU cluster, we have the following QOS structure:
| QOS      | Priority | Max Wall-time | Min resources per job | Max resources per job | Max jobs per user |
|----------|----------|---------------|-----------------------|-----------------------|-------------------|
| serial   | low      | 3-00:00:00    | 1 core                | 1 node                | 10001             |
| parallel | high     | 15-00:00:00   | 1 node + 1 core       | 32 nodes              | 10001             |
| free     | lowest   | 6:00:00       | 1 core                | 1 node                | 150               |
| debug    | highest  | 2:00:00       | 1 core                | 18 cores              | 1                 |
### Special Jed QOS
On Jed, there are two more QOS. These are meant to provide access to nodes with more RAM than the standard nodes, which have 512 GB of RAM. These QOS are:
| QOS     | Node properties | Max Wall-time | Max nodes per job | Max jobs per user |
|---------|-----------------|---------------|-------------------|-------------------|
| bigmem  | 1 TB of RAM     | 15-00:00:00   | 21                | 1001              |
| hugemem | 2 TB of RAM     | 3-00:00:00    | 2                 | 4                 |
To access these nodes you also need to choose the partition with the same name. So, for instance, to run a job on `hugemem` you would need to request both the `hugemem` QOS and the `hugemem` partition.
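A minimal job script could look like the following sketch (the resource request, wall-time, and executable are placeholders):

```bash
#!/bin/bash
#SBATCH --qos=hugemem          # QOS for the nodes with 2 TB of RAM
#SBATCH --partition=hugemem    # matching partition with the same name
#SBATCH --nodes=1              # placeholder resource request
#SBATCH --time=1-00:00:00      # placeholder wall-time, within the 3-day limit

srun ./my_memory_hungry_program   # placeholder executable
```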
## Izar, the Academic GPU Cluster
On Izar, our academic GPU cluster, we have yet another QOS structure, with a total of five options:
| QOS      | Priority | Max Wall-time | Max resources per job  |
|----------|----------|---------------|------------------------|
| gpu      | normal   | 3-00:00:00    | 1 node                 |
| week     | low      | 7-00:00:00    | 1 gpu                  |
| gpu_free | lowest   | 12:00:00      | 1 node                 |
| debug    | lowest   | 1:00:00       | 1 node, 1 gpu, 20 core |
| build    | lowest   | 8:00:00       | 1 node                 |
The default QOS is `gpu`, so if the standard values are fine for you, you don't need to add any QOS to your jobs.
For `gpu_free` there are two significant limits:

- any one user is limited to no more than 3 nodes;
- all `gpu_free` jobs combined can use a maximum of 5 nodes.
**Limits exceeded**

If these limits are reached, the jobs will be held with the reason `QOSResourceLimit`.
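One way to check whether a pending job is being held for this reason is to look at the reason column reported by `squeue`, for example (the output format string below is just one possible choice):

```bash
# List your pending jobs with their partition, QOS, state and the reason they are waiting
squeue -u $USER --states=PENDING --format="%.10i %.9P %.8q %.12T %R"
```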
## Helvetios, the Academic CPU Cluster
On Helvetios, the Academic CPU cluster, we have the following QOS structure:
| QOS      | Priority | Max Wall-time | Max resources per job |
|----------|----------|---------------|-----------------------|
| serial   | normal   | 3-00:00:00    | 1 node, 5000 core     |
| parallel | high     | 15-00:00:00   | 32 node, 3060 core    |
| free     | lowest   | 06:00:00      | 1 node                |
| debug    | high     | 02:00:00      | 8 core                |
The default QOS is `serial`.
### `debug` QOS
For `debug` you are limited to one job at a time. The `debug` QOS is meant to be as general as possible. While the number of cores is fairly small, there is no limit on the number of nodes, so the cores can be spread over up to 18 nodes (with one core per node). So, if you want to test whether your MPI code is compiled properly, you can do:
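For example, something along these lines launches one MPI rank on each of the 18 nodes (the executable name is a placeholder):

```bash
# Run one MPI rank per node on 18 nodes under the debug QOS
srun --qos=debug --nodes=18 --ntasks-per-node=1 ./my_mpi_program
```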
## Slurm Partitions
Slurm partitions are job queues, each defining different constraints, for example, job size limit, job time limit, or users permitted to use the partition.
### Requesting a Partition
Generally, when you use Slurm to run jobs, you should request the partition where your job will be executed. There are two equivalent ways to do that:
- add `#SBATCH --partition=<PARTITION>` to your Slurm script, for example `#SBATCH --partition=bigmem`;
- pass the requested partition as a `-p` or `--partition` argument to the `sbatch` command, for example `sbatch --partition=bigmem slurm_script.sh`.
### Default partitions
Some partitions are defined as default partitions, for example the `standard` partition on CPU clusters. When no `--partition` is defined in the `sbatch` script, jobs are assigned to the default partition.
**Always request a partition**

Even when you want your job to be executed on a default partition, you are still encouraged to request it explicitly, for example by passing `#SBATCH --partition=standard`. This will make your configuration more robust and debugging easier.
### View Partition information
You can view basic information on the available partitions by running:
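```bash
sinfo   # summary of partitions, their availability, time limits and node states
```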
To view detailed information on the available partitions, execute:
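```bash
scontrol show partition   # full configuration of every partition
```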
### List of Slurm Partitions on CPU clusters
The table below illustrates the typical partitions that most SCITAS users are expected to use. We present the following columns:
- Partition: the name of the Slurm partition.
- Default: whether this partition is the default.
- QOS: the Quality of Service (QOS), if any, attached to the partition. If specified, the partition has the same limits as the QOS.
| Partition | Default | QOS     | Clusters       |
|-----------|---------|---------|----------------|
| standard  | ✔       | n/a     | Jed, Helvetios |
| bigmem    |         | bigmem  | Jed            |
| hugemem   |         | hugemem | Jed            |
### List of Slurm Partitions on GPU clusters
The table below illustrates the partitions available on GPU clusters. We present the following columns:
- Partition: the name of the Slurm partition.
- Default: whether this partition is the default.
- Allowed QOS: the Quality of Service (QOS) options that are allowed to run on the partition.
| Partition | Default | Allowed QOS             | Clusters |
|-----------|---------|-------------------------|----------|
| gpu       | ✔       | All except build, debug | Izar     |
| build     |         | build                   | Izar     |
| debug     |         | debug                   | Izar     |
| gpu-xl    |         | gpu, gpu_free           | Izar     |
| test      |         | gpu                     | Izar     |
| h100      |         | kuma                    | Kuma     |
| l40s      |         | kuma                    | Kuma     |
**No default partition on Kuma**

There is no default partition on Kuma. You have to choose one! Otherwise, the job will fail with the following message:

```
sbatch: error: Batch job submission failed: No partition specified or system default partition
```
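For example, a submission on Kuma could select a partition directly on the command line (the script name is a placeholder):

```bash
sbatch --partition=l40s --qos=normal slurm_script.sh
```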