
SCITAS FAQ#

General FAQ#

Connecting to the Clusters#

Why can't I connect to the clusters from home?#

You can, but to do so you need to go through the EPFL VPN service. See http://network.epfl.ch/vpn for how to use this service.

Why am I asked for a password while sshing from the front end to a node on Izar?#

Once logged into the Izar cluster, you can ssh directly to the node(s) running your job(s). You can avoid being asked for the GASPAR password again by creating a passwordless ssh key.
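A minimal sketch, assuming your home directory is shared between the front end and the compute nodes (the file names are the ssh defaults):

$ ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
$ cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys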

For the other clusters we now have host-based authentication and you should be able to connect from the front end to the nodes without being asked for a password.

SLURM Batch System Questions#

What's the maximum run time of a job?#

For pay-per-use accounts, the maximum run time for a job is 3 days. If your job requires more time, you can request an extension by contacting us. Please provide a clear explanation of why the additional time is needed and include details about your workflow to help us assess the request.

How do I submit a job that requires a run time of more than three days?#

Labs with signed contracts may request a QOS for special needs. To do so, please send a request to 1234@epfl.ch with the subject line "HPC: request new QOS".

Can I submit array jobs and, if so, how?#

Yes, with the --array option to sbatch. See http://slurm.schedmd.com/job_array.html for the official documentation and our scitas-examples repository for several examples.
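A minimal sketch of an array job (the program and input file names are illustrative; SLURM_ARRAY_TASK_ID is set by Slurm for each task of the array):

#SBATCH --array=0-9

# each array task processes its own input file (illustrative names)
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat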

What is the difference between hpc-lab and lab?#

hpc-lab is the name of the group (in the groups.epfl.ch sense) that manages user access to the cluster. lab is the name of the Slurm account, which is automatically populated with the users of the hpc-lab group. You have to use the account name lab in your batch scripts.
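For example, in a batch script (here lab stands for the actual name of your lab's Slurm account):

#SBATCH --account=lab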

Is it safe to share nodes with other users?#

Yes! We use cgroups to limit the amount of CPU and memory assigned to users. There is no way for users to adversely affect each other.

I have a pay-per-use account and I have run on the debug QOS. Do I have to pay for debug time?#

No. Debug time is free of charge.

What is a <job id>?#

It's the unique numerical identifier of a job and is given when you submit the job:

[user@cluster jobs]$ sbatch my_job.job
Submitted batch job 1234567

It can also be seen using squeue:

[user@cluster jobs]$ squeue
JOBID   PARTITION NAME       USER ST TIME    NODES NODELIST(REASON)
1234567 serial    my_job.job user R  1:02    1     c03

How do I display the used CPU time for my account since a certain point in time?#

You can use the sreport tool. Here is an example query where the used time is reported in core hours. Just replace 2018-01-01T00:00:00 with the start time you want and scitas-ge with your account name.

$ sreport cluster AccountUtilizationByUser -t Hour --parsable2 start=2018-01-01T00:00:00  Accounts=scitas-ge  Format=Cluster,Account,Login,Used
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2018-01-01T00:00:00 - 2018-04-29T23:59:59 (10278000 secs)
Use reported in TRES Hours
--------------------------------------------------------------------------------
Cluster|Account|Login|Used
fidis|scitas-ge|user0|156349
fidis|scitas-ge|user1|7
fidis|scitas-ge|user2|22834
fidis|scitas-ge|user3|0

How do I specify that my multi-node MPI job should use nodes connected to the same switch?#

You can specify the maximum number of switches to be used as follows (in this case one switch)

#SBATCH --switches=1

Please note that jobs with such requirements may take much longer to schedule than those that can be spread across the cluster. This option should only be used in very specific cases!

Is any form of simultaneous multithreading (SMT) (such as Intel's 'Hyper-Threading' or 'HT') enabled on the clusters?#

In general, SMT can decrease performance when hardware threads share resources within the CPU. Since the processes of a parallel code typically all perform similar operations, any such shared resource quickly becomes a bottleneck. For this reason, SMT/HT is disabled on all SCITAS clusters as a general rule.

Why does my job fail immediately without leaving any trace (output)?#

This usually happens when one specifies a non-existent working directory (for example by using: --chdir /path/that/does/not/exist).

Why does my job fail after submission with error "Invalid generic resource (gres) specification"?#

Because on Izar it's necessary to specify the --gres=gpu:X flag, where X is the number of GPUs per node you require.
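For example, to request one GPU per node in a batch script:

#SBATCH --gres=gpu:1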

How do I set up job notification emails?#

Add both of the following directives to your submission script to set the email address and the type of notification:

#SBATCH --mail-user=$FIRST_NAME.$LAST_NAME@epfl.ch
#SBATCH --mail-type=$NOTIFICATION_TYPE
1. A valid email address, preferably one provided by EPFL.
2. A type of notification. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage out and teardown completed), and TIME_LIMIT. Multiple type values may be specified in a comma-separated list.
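For example (the address below is a placeholder; replace it with your own and choose the notification types you need):

# placeholder address, replace with your own
#SBATCH --mail-user=jane.doe@epfl.ch
#SBATCH --mail-type=END,FAIL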

Why does my job fail after requeuing with the error "Requested operation is presently disabled for job JOBID"?#

The requeueing possibility must be explicitly requested by the user by adding the option --requeue to the batch script:

#SBATCH --requeue

A job submitted with this option can later be dispatched again with the scontrol requeue JobID command.

I have many jobs waiting on the queue, but I need the last job I submitted to run first. What can I do to adapt the priority of my jobs relative to each other?#

As a regular user you have two options: either hold jobs or change a parameter called niceness.

The easiest option is to put every other job on hold (e.g. scontrol hold 12345,12346,12347) so that only the job you want to run can be scheduled. Note that you will later have to release the jobs (e.g. scontrol release 12345,12346,12347), otherwise they will stay on hold indefinitely.

Alternatively, you can alter the order of your own jobs by adapting their niceness. When you check the properties of your job, the first few lines look something like this:

$ scontrol show job 12345
JobId=12345 JobName=test
   UserId=user(1000) GroupId=unit(2000) MCS_label=N/A
   Priority=7151 Nice=0 Account=lab-account QOS=parallel
   JobState=PENDING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0

Please note the Priority and Nice parameters. Priority is a dynamic value that Slurm uses to define the order of jobs in the queues. Higher-priority jobs, as expected, will run earlier, all else being equal. You cannot change the priority of a job yourself, since Slurm adjusts it at regular intervals.

You can, however, change the niceness of a job. The Nice value is subtracted from the Priority, so the higher you set Nice, the lower the final Priority of the job. As a regular user you cannot set negative Nice values, so you cannot boost your important job directly; instead, you have to set higher Nice values for your other jobs.

In the example above, you see Priority=7151 and Nice=0, which is the default Nice value. Suppose you want job 23456, whose priority is currently 3000, to run first: you need to raise the niceness of the other jobs by at least 4152 (one more than the difference). We change all the other jobs at once:

$ scontrol update JobId=12345,12346,12347 Nice=4200

If you now look at the queue:

$ squeue -u $USER
             JOBID  PARTITION  NAME     USER ST       TIME  NODES NODELIST(REASON)
             23456   standard  test     user PD       0:00      2 (Resources)
             12345   standard  test     user PD       0:00      2 (Priority)
             12346   standard  test     user PD       0:00      2 (Priority)
             12347   standard  test     user PD       0:00      2 (Priority)

Your most recent job is now at the top of your queue. If you want, you can later change the nice values of your jobs again; otherwise the older jobs will remain at a lower priority relative to the newer ones for a while.

Another user has a lot of jobs running or in the queue, is that abuse?#

Submitting many jobs is generally acceptable, as Slurm uses a fair-share system to ensure that a user can't take over the resources just by submitting many jobs. See Slurm Job Priorities for more details on this.

Why is my job still pending while other user job submissions run much sooner?#

This is quite common and happens because jobs are scheduled to run according to their computed priority. The priority depends on many factors, including your job's QOS, but also your account's fairshare. For more details on this, please see Slurm Job Priorities.

I have access to SCITAS through multiple labs. How can I make one the default Slurm account?#

The best way to do this is to make use of the SBATCH_ACCOUNT variable. Add the following line to your ~/.bashrc file (or the equivalent for your default shell):

export SBATCH_ACCOUNT=<main_lab_account>

where <main_lab_account> is the account you want to set as the default one and Slurm will take this information into account at job submission.

Option precedence

Please note that environment variables override anything set in the script with an #SBATCH option. You will need to do something like sbatch --account=<my_other_lab> my_script.sh for the second lab to be billed.

Where can I find the official sbatch documentation?#

You can check the official documentation at https://slurm.schedmd.com/sbatch.html for details.

File System Questions#

Where is my /scratch space?#

Your /scratch space is located at /scratch/<username>. You can also access it using the $SCRATCH environment variable.
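For example (a trivial illustration):

$ echo $SCRATCH
/scratch/<username>
$ cd $SCRATCH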

Can you recover an important file that was on my scratch area?#

NO. /scratch is not backed up, so the file is gone forever. Please note that we automatically delete files on scratch to prevent it from filling up!

I've deleted a file on /home or /work - How can I recover it?#

If it was deleted in the last seven days then you can use the daily snapshots to get it back. These can be found at:

/home/.snapshots/<date>/<username>/
/work/.snapshots/<date>/<laboratory or group>/
e.g. /home/.snapshots/2015-11-11/bob/.
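For example, to restore a deleted file, simply copy it back from the snapshot (the file name is illustrative):

$ cp /home/.snapshots/2015-11-11/bob/my_file.txt ~/my_file.txt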

File System Backup

The /home file system is backed up onto tape. If the file was deleted more than a week ago, we may be able to help. The /work file system is not backed up by default.

How to display quota and usage information for the /home and /work file systems?#

  1. /home: to get your user quota and file system usage for your group members, use the following command:

    $ fsu -q /home

  2. /work: to get the group quota and file system usage for your group members, use the following command:

    $ fsu -q /work

Why do I get "Disk quota exceeded" ?#

You exceeded your quota on /home or /work, even when you free space the quota is not instantly recomputed. It's usually pretty fast, but depends on the file system general usage.

It's recomputed every week on Sunday. In case of problem please contact us.

How can I edit a file in the clusters using an application on my computer?#

If you wish to manipulate files on the remote file system with software installed on your workstation, you can mount the remote file system using sshfs. After installing it, you can type from a terminal:

$ sudo mkdir /media/dest
$ sudo sshfs -o allow_other <username>@<cluster>.hpc.epfl.ch:/scratch/<username> /media/dest

where <username> is your GASPAR account and <cluster> is the cluster whose file system you wish to mount.
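When you are done, you can unmount it again (using the mount point from the example above):

$ sudo umount /media/dest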

Software Questions#

I want to use Intel software on my own machine/server. How can I do it?#

Intel now provides its oneAPI suite for free: you have access to the compilers, the MPI library, and various tools.

Why do I get the error "module: command not found" or "slmodules: command not found"?#

This is probably because you have Tcsh as your login shell and the environment isn't propagated to the compute nodes.

In order to fix the issue please change the first line of your job script as follows:

#!/bin/bash -l
or
#!/bin/tcsh -l

The -l option tells Bash/Tcsh to launch a login shell, which correctly sources the files in /etc/profile.d/.

Why do I get the error "Empty or non-existent file" when loading a module?#

When trying to load certain modules you may get a message along the lines of:

$ module load comsol
Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
     /ssoft/spack/jed_stable/share/spack/lmod/jed/linux-rhel9-x86_64/Core/comsol/6.2.lua: Empty or non-existent file
     Please check the modulefile and especially if there is a line number specified in the above message
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    comsol/6.2       /ssoft/spack/jed_stable/share/spack/lmod/jed/linux-rhel9-x86_64/Core/comsol/6.2.lua

Access to some of our modules is restricted, typically due to licensing constraints. If you see a message like this, it means you don't have access to the code. For some codes you can request access directly from EPFL at this page. If you see the code there, follow the procedure described to obtain access. Keep in mind that accepting the conditions is only the first step of the process: you will likely be informed once your access has been approved, and you won't be able to load the module until then.

Note that if you signed a license agreement for an earlier version, you may need assistance from the team managing licenses to use a newer version. In this case you may see the above error message even though you signed the agreement a long time ago. Contact the Service Desk for the necessary intervention, as access to this licensed software is not managed by SCITAS.

If you don't see the code on that page, the license is managed at SCITAS. Request access to it through 1234@epfl.ch with the subject "HPC: request access to software".

Group updates

Access to our modules is managed via groups, and group membership is not updated in ongoing sessions. After being granted access to a program, you will need to open a new session in order to load the relevant module.

How can I change my default shell?#

Most systems use Bash by default and most of our documentation assumes your default shell is Bash. You can change your default shell on this page.

How do I know which libraries to link when using the Intel MKL?#

Ask the Intel Math Kernel Library Link Line Advisor.

If you use the Intel compilers, you can pass the -mkl flag, which will do the hard work for you.
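A minimal sketch with the classic Intel compiler (the source file name is illustrative; newer oneAPI compilers may spell the flag -qmkl):

$ icc -mkl my_program.c -o my_program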

What compilers/MPI combination do you support?#

SCITAS supports the Intel compilers with Intel MPI, or the GCC compilers with OpenMPI. Other combinations are not supported (if provided, they are handled on a best-effort basis).
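For example, to load one of the supported combinations (the module names are illustrative and may differ between clusters; use module spider to see what is available):

$ module load gcc openmpi
$ module load intel intel-oneapi-mpi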

Why does my COMSOL job fail to get a license?#

Occasionally your COMSOL jobs might fail with a message such as:

Could not obtain license for COMSOL ...

License error: -5.
 No such product exists.
 No such feature exists.
 Feature:       COMSOL
 License path:  ...
 FlexNet Licensing error:-5,414

There are particularly few license tokens for some COMSOL features (5 or 10 tokens are common), so problems are to be expected. We go into more detail on this issue and possible workarounds on this page.

(Alternatively, if possible for the task you are doing, you can try to use other equivalent software packages like ANSYS.)

GPUs#

Which cluster can I use to run jobs using GPUs?#

At the moment, we have two GPU-accelerated clusters: Izar and Kuma.

How do I submit jobs to the GPU nodes?#

You need to pass the following options:

--partition=gpu --qos=gpu --gres=gpu:X

where X is the number of GPUs required per node.
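For illustration, a minimal submission script sketch (the run time and program name are placeholders):

#!/bin/bash -l
#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# placeholder program name
srun ./my_gpu_program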

QOS / Partition#

Kuma#

You will find the information about our QOS and partition configuration here.

Jed#

You will find the information about our QOS and partition configuration here.

Izar#

You can find the details on the QOS structure here.

Helvetios#

The Helvetios cluster has been removed from pay-per-use usage and will remain available for educational purposes only. It uses the same software stack and QOS as the Jed cluster.

I need help!#

Please find here the instructions to contact the SCITAS Support Team.