
Pay-per-use group account#

SCITAS pay-per-use group accounts are based on a pay-as-you-go pricing model: users pay for what they consume. This type of account is a group account linked to a lab or a project. Monthly invoices are sent to the head or responsible person of the group. Note that the fees are eligible as service costs for FNS and ERC grants, and the invoices can be sent to FNS and ERC for reimbursement.

To create a pay-per-use account please ask the head of the laboratory/project to fill in the pay-per-use account creation form.

If you already have a pay-per-use account and would like to delete it, please submit a request.

Constraints and limitations#

Pay-per-use group accounts are subject to a number of limitations. These apply to all users who submit jobs under the same account.

Resource limits#

The following limits are enforced to prevent excessive resource usage:

  • Maximum wall clock time, for example, 3 days.
  • Maximum number of nodes per job, for example, 8 nodes per job.

The exact limits depend on the Quality of Service (QOS) and the cluster where you submit your jobs.

The head of the laboratory/project can submit a request to raise these limits temporarily.

Budget limits#

To protect laboratories and projects against unwanted or accidental charges, we impose the following budget limits on pay-per-use group accounts:

  • CHF 1'000.- per account and per month

Once all members of an account have collectively consumed compute resources worth this threshold within a calendar month, the account can no longer consume any more resources until the end of the month.
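As a sketch, the budget check behaves like the following (Python, purely illustrative; the function name is hypothetical and CHF 1'000.- is the default limit quoted above):

MONTHLY_BUDGET_CHF = 1000.0  # default budget per pay-per-use account and month

def account_blocked(consumed_this_month_chf):
    # Once the members' combined consumption reaches the budget,
    # the account cannot consume further resources until the month ends.
    return consumed_this_month_chf >= MONTHLY_BUDGET_CHF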

The head of the laboratory/project can submit a request to:

  • modify the budget limit of the entire account; and/or
  • impose or modify budget limits on individual users.

Individual budget limits

No budget limits are imposed by default on individual users, unless explicitly requested.

Managing the SCITAS pay-per-use group account#

This section describes how to manage your SCITAS pay-per-use group account, and more precisely how to add and remove users. Throughout this page, we will use a dummy account called hpc-example; you will, of course, have to replace this group name with that of your lab.

Administrator authorizations

Everything done in this section of the documentation requires you to be an administrator of the group. To verify that this is the case, go to the group page:

https://groups.epfl.ch/#/home/[group_name]

where [group_name] is the name of your group, and check that your name appears in the "Administrators/Administrateurs" section.

If you are not an administrator but would like to be added to or removed from an account, please contact one of the administrators of the group directly.

Adding users to the SCITAS pay-per-use group account#

To add one or multiple users, you need to go to your account main page:

https://groups.epfl.ch/#/home/[group_name]

where [group_name] is the name of your group, e.g. hpc-example. This is the dashboard that allows you to manage your account.

On the group page, select the "Members" section.

On this page, you can add a single person, a service, a group, an organizational unit, or multiple people at once by clicking the "Add Members" button.

Adding a group

Please note that if you add a group my-group, all members of that group will be able to use the pay-per-use group account.

Usually, only the "Add > Person" and "Add > Multiple" options will be used. In those two fields, you can enter either the first and/or last name, or the SCIPER number, of the person(s) you wish to add. Note that the information must uniquely identify the user.

Once you have entered the identifying information in the field, hit the "Enter" key. One or more matching people will appear with corresponding check boxes. Check the box next to the name(s) of the person(s) you want to add, and an "Add" button should appear. Please note that multiple names can appear if the information provided is not unique.

Removing users from the SCITAS pay-per-use group account#

User removal

Before removing a user from your account, please make sure all of their data is correctly backed up. The user's /home directory will be removed if there are no other accounts associated with them. Note however that the /home directory of every user is backed up and a copy is kept for six months. Please see File systems for more information.

To remove users, select the users you want to remove and a "Remove selected members" button will appear.

Billing of the pay-per-use group account#

Basic information#

When you are using our clusters with a pay-per-use group account, all your jobs, with some exceptions, will be billed to the account concerned.

Exceptions:

Price list#

Billing is based on the SCITAS operational costs. The current unit costs and the prices for using all of the SCITAS services are described in detail in the pricing documentation.

Billing on GPU clusters#

On GPU clusters the cpu.hour unit is not billed.

For example, reserving 1 GPU and 20 cores on kuma would be billed the same as reserving 1 GPU and 1 core, according to the gpu.hour unit.
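To illustrate, here is a minimal sketch of such a cost calculation in Python (the rate and function name are placeholders, not actual SCITAS prices; see the pricing documentation for those):

GPU_HOUR_RATE_CHF = 1.0  # placeholder rate, not an actual SCITAS price

def gpu_cluster_cost(n_gpus, n_cpus, hours):
    # Only the gpu.hour unit is billed on GPU clusters;
    # n_cpus is deliberately ignored, since cores are not billed there.
    return n_gpus * hours * GPU_HOUR_RATE_CHF

# Reserving 1 GPU and 20 cores costs the same as 1 GPU and 1 core:
assert gpu_cluster_cost(1, 20, 2.0) == gpu_cluster_cost(1, 1, 2.0)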

Responsible resource reservation

To maintain high resource availability for all our users, please reserve only the number of cores (CPUs) and/or GPUs that you actually need for your compute task.

Checking your resource consumption and costs#

You can use sausage to display the amount of resources you have already consumed and the current costs.

The sausage tool also shows an estimate of CO2 equivalent emissions on the clusters. You can read more about this in the section describing the emissions calculation.

# sausage -h
usage: Sausage [-h] [-v] [-V] [-u USERNAME] [-a] [-A ACCOUNT] [-s START]
               [-e END] [-x] [-t]

SCITAS Account Usage.

options:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose (default: False)
  -V, --version         Version
  -u USERNAME, --username USERNAME
                        If not provided whoami is considered (default: None)
  -a, --all             all users from an account are printed (default: False)
  -A ACCOUNT, --account ACCOUNT
                        Prints account consumption per cluster (default: None)
  -s START, --start START
                        Start date - format YYYY-MM-DD (default: 2023-06-01)
  -e END, --end END     End date (included) - format YYYY-MM-DD (default:
                        2023-06-30)
  -x, --csv             Print result in csv style (default: False)
  -t, --total           Print result with total (default: False)

By default, sausage displays the information for your user for the current month.

jed # date; sausage
Tue Jun 20 09:29:13 CEST 2023
╭────────────────────────────────────────────────────────────────────╮
│                        USERNAME : ncoudene                         │
│             Global usage from 2023-06-01 to 2023-06-30             │
│╭──────────┬──────────┬──────┬───────┬───────┬─────────┬───────────╮│
││Account   │Cluster   │# jobs│GPU [h]│CPU [h]│eCO₂ [kg]│Costs [CHF]││
│├──────────┼──────────┼──────┼───────┼───────┼─────────┼───────────┤│
││scitas-ge │jed       │     4│    0.0│    0.0│      0.0│        0.0││
││scitas-ge │helvetios │     1│    0.0│    0.0│      0.0│        0.0││
│╰──────────┴──────────┴──────┴───────┴───────┴─────────┴───────────╯│
╰────────────────────────────────────────────────────────────────────╯
sausage v0.12.1.2

You can browse your past consumption with some arguments:

jed # date; sausage --start 2023-01-01 --total
Tue Jun 20 09:31:49 CEST 2023
╭────────────────────────────────────────────────────────────────────╮
│                        USERNAME : ncoudene                         │
│             Global usage from 2023-01-01 to 2023-06-30             │
│╭──────────┬──────────┬──────┬───────┬───────┬─────────┬───────────╮│
││Account   │Cluster   │# jobs│GPU [h]│CPU [h]│eCO₂ [kg]│Costs [CHF]││
│├──────────┼──────────┼──────┼───────┼───────┼─────────┼───────────┤│
││scitas-ge │helvetios │   182│    0.0│    1.1│      0.0│        0.0││
││scitas-ge │jed       │   123│    0.0│   82.0│      0.1│        0.5││
││scitas-ge │izar      │     6│    0.0│    0.0│      0.0│        0.0││
│╰──────────┴──────────┴──────┴───────┴───────┴─────────┴───────────╯│
│ Walltime GPU             h    0.0                                  │
│ Walltime CPU             h   83.2                                  │
│ Number of jobs              311.0                                  │
│ Est. carbon footprint   kg    0.1                                  │
│ Costs                  CHF    0.5                                  │
╰────────────────────────────────────────────────────────────────────╯
sausage v0.12.1.2

Running jobs with capping enabled#

By default, all jobs are subject to capping. This means that when you submit a job, Slurm estimates whether it would make you exceed a capping limit; if so, the job will not be submitted.

The calculation is done in five parts (see the sketch after this list). It will:

  • Estimate the cost of the soon-to-be-submitted job: job_estimation
  • Verify whether your username and account have a capping limit; if so, take note of both username_capping and account_capping
  • Calculate what your user (username_consumed) and your account (account_consumed) have already consumed on our clusters. This information comes from sausage.
  • Calculate what you are expected to use, based on the jobs your username (username_queued) and your account (account_queued) have queued on all our clusters.
  • Verify whether you have exceeded a limit: username_capping - username_consumed - username_queued - job_estimation <= 0 or account_capping - account_consumed - account_queued - job_estimation <= 0
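Schematically, this check behaves like the following minimal Python sketch (the names come from the list above; this is an illustration, not the real implementation):

def capping_allows_submission(job_estimation,
                              username_capping, username_consumed, username_queued,
                              account_capping, account_consumed, account_queued):
    # Remaining budget for the user and for the account once this job is counted.
    user_left = username_capping - username_consumed - username_queued - job_estimation
    account_left = account_capping - account_consumed - account_queued - job_estimation
    # A limit is exceeded as soon as either remaining budget drops to 0 or below.
    return user_left > 0 and account_left > 0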

Default behaviour

By default, if you have exceeded a limit, your job will NOT be submitted.

Support

If you need to change your capping limits (username_capping and/or account_capping), please ask your account administrator to request a change via the EPFL support address 1234@epfl.ch.

In practice, when you submit a job you will see this kind of message:

Example without limit:#

$ srun --qos serial hostname
srun: [ESTIMATION] The estimated cost of this job is CHF 0.40
srun: ╭──────────────────────────────┬─────────────┬─────────────┬─────────────╮
srun: │ [in CHF]                     │ Capping     │ Consumed    │ Queued ¹⁾ ²⁾│
srun: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: │ account : scitas-ge          │ 20,000      │ 899.4       │ 0.4         │
srun: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: │ username : ncoudene          │ 1           │ 0.0         │ 0.4         │
srun: ╰──────────────────────────────┴─────────────┴─────────────┴─────────────╯
srun: ¹⁾ Estimated cost of the queued jobs and this job
srun: ²⁾ Queued jobs costs are based on its walltime (option --time)
jst370

Example with limit:#

$ srun --qos serial --time 7-00:00:00 --cpus-per-task 70 hostname
srun: error: [ESTIMATION] The estimated cost of this job is CHF 64.68
srun: error: ╭──────────────────────────────┬─────────────┬─────────────┬─────────────╮
srun: error: │ [in CHF]                     │ Capping     │ Consumed    │ Queued ¹⁾ ²⁾│
srun: error: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: error: │ account : scitas-ge          │ 20,000      │ 899.4       │ 64.7        │
srun: error: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: error: │ 🔥username : ncoudene        │ 1           │ 0.05        │ 64.7        │
srun: error: ╰──────────────────────────────┴─────────────┴─────────────┴─────────────╯
srun: error: ¹⁾ Estimated cost of the queued jobs and this job
srun: error: ²⁾ Queued jobs costs are based on its walltime (option --time)
srun: error: [CAPPING]    🛑 You reached a capping.
srun: error: Unable to allocate resources: System submissions disabled

Memory limit#

In our clusters, we have enabled a limit on how much memory you can use per CPU allocated to your job. Please see the official Slurm documentation on MaxMemPerCPU for more information about this setting.

For more information about the memory limit in our clusters, please read our documentation on Memory Allocation.

The limit is calculated as follows:

  • [NODE_MEMORY] / [NODE_CPU_COUNT]

For example, for a standard node on jed this limit is: 504000 MB / 72 = 7000 MB per CPU.

In practice, this means that if you ask for more memory than your requested CPUs can cover (CPUS * MaxMemPerCPU), you will not be able to submit your job.
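As an illustration, here is a minimal Python sketch of the limit and of the corrected CPU count mentioned in the note below, using the jed figures from the example above (the helper name is hypothetical, and we assume the corrected CPU count is the smallest number of CPUs whose combined MaxMemPerCPU covers the request):

import math

NODE_MEMORY_MB = 504000  # standard jed node, from the example above
NODE_CPU_COUNT = 72

MAX_MEM_PER_CPU = NODE_MEMORY_MB // NODE_CPU_COUNT  # 7000 MB per CPU on jed

def corrected_cpu_count(mem_mb, cpus_requested):
    # Enough CPUs so that cpus * MaxMemPerCPU covers the requested memory.
    return max(cpus_requested, math.ceil(mem_mb / MAX_MEM_PER_CPU))

print(corrected_cpu_count(256000, 1))  # a 256000 MB request needs 37 CPUs on jed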

We add a warning at job submission time to help you see if you have asked for more memory than the limit allows:

helvetios # srun --mem 256000 hostname
srun: error: [MEMORY]     🛑 ERROR: The amount of memory you asked for exceeds MaxMemPerCPU limit (5333.0) for the partition standard.
srun: error: [MEMORY]     🛑 ERROR: Please increase the number of cores, or decrease the total RAM you ask.
srun: error: Unable to allocate resources: CPU count specification invalid

Capping

Keep in mind that your estimated job cost will be calculated based on the corrected CPU count, i.e. enough CPUs to cover your memory request at MaxMemPerCPU.

CO2 equivalent emissions estimation on the clusters#

This section describes the efforts to calculate a CO2 equivalent (CO2eq) emissions estimate based on cluster usage.

Current computation#

On SCITAS clusters you are presented with an estimate of the CO2eq emissions of your computations.

The CO2eq estimation is currently based only on the power consumption of the machines. It is not an exact measure of a job's consumption, as it relies on the average power draw of the machines.

  • We get the power draw per rack in the different data centers (CCT and INJ). This power draw includes the compute nodes, but also the network, scratch storage, front nodes and admin nodes of each cluster.

  • This power draw is multiplied by the Power Usage Effectiveness (PUE), also provided by the data center team, in order to account for the cooling of the machines.

  • Finally, the energy consumption is converted into g CO2eq using a conversion coefficient based on the Swiss energy mix.

Based on all these measurements, we derive a single coefficient per cluster, in g CO2eq per core hour or g CO2eq per GPU hour.
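Schematically, the derivation of such a coefficient looks like the following Python sketch (every input value is a placeholder, not an actual SCITAS measurement):

rack_power_w = 20000.0   # placeholder: average power draw of a cluster's racks [W]
pue = 1.3                # placeholder: Power Usage Effectiveness of the data center
grid_g_per_kwh = 50.0    # placeholder: CO2eq intensity of the Swiss energy mix [g/kWh]
core_count = 2000        # placeholder: number of cores served by those racks

# Energy drawn per core and per hour, cooling included via the PUE [kWh]:
kwh_per_core_hour = rack_power_w * pue / core_count / 1000.0

g_co2eq_per_core_hour = kwh_per_core_hour * grid_g_per_kwh
print(g_co2eq_per_core_hour)  # the single per-cluster coefficient [g CO2eq / core hour]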

The data and the coefficients that are used can be found in the following Google Spreadsheet.

The currently used factors are:

Cluster     W/(core or GPU)   g CO2 eq / (core h or GPU h)
helvetios   8.19              1.11
izar        308.8             41.88
jed         6.93              1.19
kuma        251.83            43.19
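
For example, a job reserving 2 GPUs on izar for 10 hours would be estimated at 2 × 10 × 41.88 ≈ 838 g CO2eq.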

Possible future improvements#

  • When calculating the average coefficients, we should take into account the CO2 eq emissions from the construction of the machines.

  • We are working on measuring the power consumption of jobs using hardware counters. This will account for CPU and GPU power consumption, but will most likely use heuristics for RAM, storage and network power consumption.

  • We also want to evaluate how data storage on the shared storages (/home, /work, /archive, ...) impacts the CO2eq estimation.