Slurm QOS and partitions#

Introduction#

With the arrival of the Jed cluster in early 2023, we adapted the partition and QOS structure of our clusters. Since then, these changes have also been rolled out to the Helvetios cluster.

In most cases you only need to specify the QOS you want; the corresponding partition is then set to the same name. This can be done in one of two ways:

  • by adding a directive such as #SBATCH --qos=serial to your Slurm script, as in the sketch below;
  • by passing the QOS to the sbatch command directly (e.g. sbatch -q serial slurm_script.sh).
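
As a minimal sketch, a job script using the first approach could look like this (the resource values and ./my_program are placeholders for your own requirements and executable):

#!/bin/bash
#SBATCH --qos=serial
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Replace with your actual workload
srun ./my_program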

You can find more details about our allocation policies on this page.

Kuma, the GPU cluster#

On Kuma, the newest GPU cluster, we have the following QOS structure:

| QOS    | Priority | Max Wall-time | Max resources per job    |
|--------|----------|---------------|--------------------------|
| normal | normal   | 3-00:00:00    | 8 nodes                  |
| long   | low      | 7-00:00:00    | 8 nodes                  |
| build  | high     | 04:00:00      | 1 node, 0 GPUs, 16 cores |
| debug  | high     | 01:00:00      | 2 GPUs                   |

The default QOS is normal.
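
For instance, a short debugging run on Kuma might request the debug QOS together with one of the Kuma partitions listed at the end of this page (the partition, GPU count, time, and script name are illustrative):

sbatch -p h100 -q debug --gres=gpu:1 --time=00:30:00 my_script.sh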

Jed, the CPU cluster#

On Jed, the CPU cluster, we have the following QOS structure:

| QOS      | Priority | Max Wall-time | Min resources per job | Max resources per job | Max jobs per user |
|----------|----------|---------------|-----------------------|-----------------------|-------------------|
| serial   | low      | 3-00:00:00    | 1 core                | 1 node                | 10001             |
| parallel | high     | 15-00:00:00   | 1 node + 1 core       | 32 nodes              | 10001             |
| free     | lowest   | 6:00:00       | 1 core                | 1 node                | 150               |
| debug    | highest  | 2:00:00      | 1 core                | 18 cores              | 1                 |
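
As an illustration, a multi-node job on Jed could request the parallel QOS with directives along these lines (the node count, tasks per node, and executable are placeholders to adapt to your case):

#!/bin/bash
#SBATCH --qos=parallel
#SBATCH --nodes=4
# Adjust the tasks per node to the core count of the nodes you are using
#SBATCH --ntasks-per-node=72
#SBATCH --time=1-00:00:00

srun ./my_mpi_app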

Special Jed QOS#

On Jed, there are two more QOS. These are meant to provide access to nodes with more RAM than the standard nodes, which have 512 GB of RAM. These QOS are:

| QOS     | Node properties | Max Wall-time | Max nodes per job | Max jobs per user |
|---------|-----------------|---------------|-------------------|-------------------|
| bigmem  | 1 TB of RAM     | 15-00:00:00   | 21                | 1001              |
| hugemem | 2 TB of RAM     | 3-00:00:00    | 2                 | 4                 |

To access these nodes you also need to request the partition with the same name. So, for instance, to run a job on hugemem you'd need to:

sbatch -p hugemem -q hugemem ...
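
Equivalently, both can be set inside the job script itself; here is a minimal sketch for a bigmem job (the memory, time, and executable are illustrative values, not recommendations):

#!/bin/bash
#SBATCH --partition=bigmem
#SBATCH --qos=bigmem
#SBATCH --nodes=1
# Request most of the memory of a 1 TB node; adjust to your needs
#SBATCH --mem=900G
#SBATCH --time=1-00:00:00

srun ./my_large_memory_app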

Izar, the Academic GPU Cluster#

On Izar, our academic GPU cluster, we have yet another QOS structure, with a total of five options:

| QOS      | Priority | Max Wall-time | Max resources per job   |
|----------|----------|---------------|-------------------------|
| gpu      | normal   | 3-00:00:00    | 1 node                  |
| week     | low      | 7-00:00:00    | 1 GPU                   |
| gpu_free | lowest   | 12:00:00      | 1 node                  |
| debug    | lowest   | 1:00:00       | 1 node, 1 GPU, 20 cores |
| build    | lowest   | 8:00:00       | 1 node                  |

The default QOS is gpu, so if the standard values are fine for you, you don't need to add any QOS to your jobs.
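
If, for example, you need a longer single-GPU run, you could request the week QOS explicitly (the GPU request, time, and script name are illustrative):

sbatch -q week --gres=gpu:1 --time=5-00:00:00 my_script.sh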

For gpu_free there are two significant limits:

  • any single user is limited to no more than 3 nodes;
  • all gpu_free jobs combined can use a maximum of 5 nodes.

Limits exceeded

If these limits are reached, the jobs will be held with the reason QOSResourceLimit.
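
You can check whether one of your pending jobs is affected by looking at the NODELIST(REASON) column of squeue, for example:

squeue -u $USER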

Helvetios, the Academic CPU Cluster#

On Helvetios, the Academic CPU cluster, we have the following QOS structure:

| QOS      | Priority | Max Wall-time | Max resources per job |
|----------|----------|---------------|-----------------------|
| serial   | normal   | 3-00:00:00    | 1 node, 5000 cores    |
| parallel | high     | 15-00:00:00   | 32 nodes, 3060 cores  |
| free     | lowest   | 06:00:00      | 1 node                |
| debug    | high     | 02:00:00      | 8 cores               |

The default QOS is serial.
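
As a sketch, a short test run under the free QOS on Helvetios could be submitted like this (the resources and script name are placeholders):

sbatch -q free --ntasks=1 --time=02:00:00 my_script.sh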

debug QOS#

With debug you are limited to one job at a time. The debug QOS is meant to be as general as possible: while the number of cores is fairly small, there is no limit on the number of nodes, so on Jed you could use up to 18 nodes (with one core per node). If, for instance, you want to test whether your MPI code was compiled properly, you can do:

sbatch -q debug --nodes=2 --ntasks-per-node=9 --time=10:00 my_script.sh
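
For reference, my_script.sh in the command above could be a minimal MPI launcher along these lines (the module and binary names are placeholders for whatever toolchain and code you actually use):

#!/bin/bash
# Load the toolchain used to build the code, e.g.:
# module load gcc openmpi

# srun picks up the 2 nodes x 9 tasks requested on the sbatch command line
srun ./my_mpi_app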

Slurm Partitions#

Slurm partitions are job queues, each with its own constraints, for example a job size limit, a job time limit, or the set of users permitted to use the partition.

Requesting a Partition#

Generally, when you use Slurm to run jobs, you should request the partition where your job will be executed. There are two equivalent ways to do that:

  • add #SBATCH --partition=<PARTITION> to your Slurm script;
    example: #SBATCH --partition=bigmem
  • pass the requested partition as a -p or --partition argument to the sbatch command;
    example: sbatch --partition=bigmem slurm_script.sh
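
For example, a script that explicitly requests the standard partition together with the serial QOS might begin like this (the resource values and executable are illustrative):

#!/bin/bash
#SBATCH --partition=standard
#SBATCH --qos=serial
#SBATCH --ntasks=1
#SBATCH --time=04:00:00

srun ./my_program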

Default partitions#

Some partitions are defined as default partitions, for example the standard partition on the CPU clusters. When no --partition is specified in the sbatch script, jobs are assigned to the default partition.

Always request a partition

When you want your job to be executed on a default partition, you are still encouraged to request it explicitly, for example by adding #SBATCH --partition=standard to your script. This makes your configuration more robust and debugging easier.

View Partition information#

You can view basic information on the available partitions by running:

sinfo

To view detailed information on the available partitions, execute:

scontrol show partitions
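
Both commands also accept a partition name if you are only interested in one of them, for example (using the standard partition):

sinfo -p standard
scontrol show partition standard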

List of Slurm Partitions on CPU clusters#

The table below illustrates the typical partitions that most SCITAS users are expected to use. We present the following columns:

  • Partition: the name of the Slurm partition.
  • Default: whether this partition is the default.
  • QOS: which Quality of Service (QOS), if any, is attached to the partition. If one is specified, the partition has the same limits as that QOS.

| Partition | Default | QOS     | Clusters       |
|-----------|---------|---------|----------------|
| standard  | yes     | n/a     | Jed, Helvetios |
| bigmem    | no      | bigmem  | Jed            |
| hugemem   | no      | hugemem | Jed            |

List of Slurm Partitions on GPU clusters#

The table below illustrates the partitions available on GPU clusters. We present the following columns:

  • Partition: the name of the Slurm partition.
  • Default: whether this partition is the default.
  • Allowed QOS: the QOS that are allowed to run on the partition.

| Partition | Default | Allowed QOS             | Clusters |
|-----------|---------|-------------------------|----------|
| gpu       | yes     | All except build, debug | Izar     |
| build     | no      | build                   | Izar     |
| debug     | no      | debug                   | Izar     |
| gpu-xl    | no      | gpu, gpu_free           | Izar     |
| test      | no      | gpu                     | Izar     |
| h100      | no      | kuma                    | Kuma     |
| l40s      | no      | kuma                    | Kuma     |

No default partition on Kuma

There is no default partition on Kuma, so you have to choose one! Otherwise, the submission will fail with the following message:
sbatch: error: Batch job submission failed: No partition specified or system default partition
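
A Kuma submission therefore always has to name a partition explicitly, for example (the partition choice, GPU count, time, and script name are illustrative):

sbatch -p l40s -q normal --gres=gpu:1 --time=12:00:00 my_script.sh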