Pay-per-use group account#
SCITAS pay-per-use group accounts are based on a pay-as-you-go pricing model: users pay based on how much they consume. This type of account is a group account linked to a lab or a project. Monthly invoices are sent to the head or responsible person of the group. Note that the fees are eligible for FNS and ERC as service costs and the invoices can be sent to FNS and ERC for reimbursement.
To create a pay-per-use account please ask the head of the laboratory/project to fill in the pay-per-use account creation form.
If you already have a pay-per-use account and would like to delete it, please submit a request.
Constraints and limitations#
Pay-per-use group accounts are subject to a number of limitations. These apply to all users who submit jobs under the same account.
Resource limits#
The following limits are enforced to prevent excessive resource usage:
- Maximum wall clock time, for example, 3 days.
- Maximum number of nodes per job, for example, 8 nodes per job.
The exact limits depend on the Quality of Service (QOS) and the cluster where you submit your jobs.
The head of the laboratory/project can submit a request to raise these limits temporarily.
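As a sketch, a batch script staying within these example limits could look like the following (the values are only the examples given above and must be adapted to the actual limits of your QOS and cluster; hpc-example is the dummy account used later on this page):
#!/bin/bash
#SBATCH --account=hpc-example   # dummy pay-per-use account; use your own group account
#SBATCH --time=3-00:00:00       # within the example 3-day maximum wall clock time
#SBATCH --nodes=8               # within the example maximum of 8 nodes per job
#SBATCH --ntasks-per-node=1     # adapt to your application

srun hostname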
Budget limits#
To protect laboratories and projects against unwanted or accidental charges, we impose the following budget limits on pay-per-use group accounts:
- CHF 1'000.- per account and per month
Once all members of an account have collectively consumed compute resources equal to this threshold within a calendar month, the account can no longer consume any more resources until the end of the month.
The head of the laboratory/project can submit a request to:
- modify the budget limit of the entire account; and/or
- impose or modify budget limits on individual users.
Individual budget limits
No budget limits are imposed by default on individual users, unless explicitly requested.
Managing the SCITAS pay-per-use group account#
This section describes how to manage your SCITAS pay-per-use group account, and more precisely how to add and remove users. Throughout this page, we will use a dummy account called hpc-example. Of course, you will have to adapt the group name to the one of your lab.
Administrator authorizations
Everything done in this section of the documentation requires you to be an administrator of the group. To verify that this is indeed the case, go to the group page:
where [group_name] is the name of your group, and check that your name appears in the "Administrators/Administrateurs" section.
If you are not an administrator, but would like to be added to or removed from an account, please contact one of the administrators of the group directly.
Adding users to the SCITAS pay-per-use group account#
To add one or multiple users, you need to go to your account's main page:
where [group_name] is the name of your group, e.g. hpc-example. This is the dashboard that allows you to manage your account.
On the group page, select the section "Members":
On this new page, you will be able to add a single person, a service, a group, an organizational unit, or multiple people at once, by clicking the "Add Members" button:
Adding a group
Please note that if you add a group my-group, all members of that group will be able to use the pay-per-use group account.
Usually, only "Add > Person" and "Add > Multiple" will be used. In both cases, you can input either the first and/or last name, or the SCIPER number of the person(s) you wish to add. Note that the information must uniquely identify the user.
Once you have entered the identifying information in the field, hit the "Enter" key. One or more matching people will appear, each with a check box; multiple names can appear if the information provided is not unique. Check the box next to the name(s) of the person(s) you want to add, and an "Add" button will appear.
Removing users from the SCITAS pay-per-use group account#
User removal
Before removing a user from your account, please make sure all of their data is correctly backed up. The user's /home directory will be removed if there are no other accounts associated with them. Note however that the /home directory of every user is backed up and a copy is kept for six months. Please see File systems for more information.
To remove users, select the users you want to remove and a "Remove selected members" button will appear:
Billing of the pay-per-use group account#
Basic information#
When you are using our clusters with a pay-per-use group account, all your jobs, with some exceptions, will be billed to the account concerned.
Exceptions:
- Jobs which failed because of a node failure (NODE_FAIL)
- Jobs in a non-billed QOS and/or partition, for example, the QOS debug on jed
Price list#
Billing is based on the SCITAS operational costs. The current unit costs and the prices for using all of the SCITAS services are described in detail in the pricing documentation.
Billing on GPU clusters#
On GPU clusters the cpu.hour unit is not billed.
For example, reserving 1 GPU and 20 cores on kuma would be billed the same as reserving 1 GPU and 1 core, according to the gpu.hour unit.
Responsible resource reservation
To maintain high resource availability for all users, please reserve only the number of cores (CPUs) and/or GPUs that you actually need for your compute task.
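For instance, on a GPU cluster the two reservations below would be billed the same amount per gpu.hour, since the cpu.hour unit is not billed there (a sketch; the exact GPU request syntax may differ on your cluster):
# Both commands reserve 1 GPU, so they are billed identically on a GPU cluster.
# Still, please only request the cores your task actually needs.
srun --gres=gpu:1 --cpus-per-task=1  hostname
srun --gres=gpu:1 --cpus-per-task=20 hostname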
Checking your resource consumption and costs#
You can use sausage to display the amount of resources you have already consumed and the current costs.
The sausage tool also shows an estimate of CO2 equivalent emissions on the clusters. You can read more about this in the section describing the emissions calculation.
# sausage -h
usage: Sausage [-h] [-v] [-V] [-u USERNAME] [-a] [-A ACCOUNT] [-s START]
[-e END] [-x] [-t]
SCITAS Account Usage.
options:
-h, --help show this help message and exit
-v, --verbose Verbose (default: False)
-V, --version Version
-u USERNAME, --username USERNAME
If not provided whoami is considered (default: None)
-a, --all all users from an account are printed (default: False)
-A ACCOUNT, --account ACCOUNT
Prints account consumption per cluster (default: None)
-s START, --start START
Start date - format YYYY-MM-DD (default: 2023-06-01)
-e END, --end END End date (included) - format YYYY-MM-DD (default:
2023-06-30)
-x, --csv Print result in csv style (default: False)
-t, --total Print result with total (default: False)
By default, sausage displays the information for your user for the current month.
jed # date; sausage
Tue Jun 20 09:29:13 CEST 2023
╭────────────────────────────────────────────────────────────────────╮
│ USERNAME : ncoudene │
│ Global usage from 2023-06-01 to 2023-06-30 │
│╭──────────┬──────────┬──────┬───────┬───────┬─────────┬───────────╮│
││Account │Cluster │# jobs│GPU [h]│CPU [h]│eCO₂ [kg]│Costs [CHF]││
│├──────────┼──────────┼──────┼───────┼───────┼─────────┼───────────┤│
││scitas-ge │jed │ 4│ 0.0│ 0.0│ 0.0│ 0.0││
││scitas-ge │helvetios │ 1│ 0.0│ 0.0│ 0.0│ 0.0││
│╰──────────┴──────────┴──────┴───────┴───────┴─────────┴───────────╯│
╰────────────────────────────────────────────────────────────────────╯
sausage v0.12.1.2
You can browse your past consumption by passing additional arguments:
jed # date; sausage --start 2023-01-01 --total
Tue Jun 20 09:31:49 CEST 2023
╭────────────────────────────────────────────────────────────────────╮
│ USERNAME : ncoudene │
│ Global usage from 2023-01-01 to 2023-06-30 │
│╭──────────┬──────────┬──────┬───────┬───────┬─────────┬───────────╮│
││Account │Cluster │# jobs│GPU [h]│CPU [h]│eCO₂ [kg]│Costs [CHF]││
│├──────────┼──────────┼──────┼───────┼───────┼─────────┼───────────┤│
││scitas-ge │helvetios │ 182│ 0.0│ 1.1│ 0.0│ 0.0││
││scitas-ge │jed │ 123│ 0.0│ 82.0│ 0.1│ 0.5││
││scitas-ge │izar │ 6│ 0.0│ 0.0│ 0.0│ 0.0││
│╰──────────┴──────────┴──────┴───────┴───────┴─────────┴───────────╯│
│ Walltime GPU h 0.0 │
│ Walltime CPU h 83.2 │
│ Number of jobs 311.0 │
│ Est. carbon footprint kg 0.1 │
│ Costs CHF 0.5 │
╰────────────────────────────────────────────────────────────────────╯
sausage v0.12.1.2
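As an account administrator, you can combine the flags shown in the help output above to inspect the consumption of all users of your account. A sketch using the dummy hpc-example account:
# CSV export of the consumption of every user of the hpc-example account
# since 1 January, with a total line appended.
sausage --account hpc-example --all --start 2023-01-01 --csv --total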
Running Jobs with capping enabled#
By default, all jobs are submitted with capping. This means that when you submit a job, Slurm performs a calculation to estimate whether you will exceed a certain limit. If so, your job will not be submitted.
The calculation is done in five parts (a numeric sketch follows the list). It will:
- Estimate the cost of the soon-to-be-submitted job: job_estimation
- Verify if your username and account have a capping limit; if so, we will take note of both username_capping and account_capping
- Calculate what your user (username_consumed) and your account (account_consumed) have consumed on our clusters. This information comes from sausage.
- Calculate what you are expected to use based on the jobs your username (username_queued) and your account (account_queued) have queued on all our clusters.
- Verify if you have exceeded a limit:
  username_capping - username_consumed - username_queued - job_estimation <= 0
  or
  account_capping - account_consumed - account_queued - job_estimation <= 0
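A minimal numeric sketch of that final check, using hypothetical CHF amounts similar to those in the examples further below:
# Hypothetical values in CHF
job_estimation=0.40
username_capping=1.00;    username_consumed=0.05;  username_queued=0.00
account_capping=20000.00; account_consumed=899.40; account_queued=0.00

# Remaining budget for the user and for the account after this job
user_left=$(echo "$username_capping - $username_consumed - $username_queued - $job_estimation" | bc)
acct_left=$(echo "$account_capping - $account_consumed - $account_queued - $job_estimation" | bc)

# The job is rejected if either balance drops to zero or below
if [ "$(echo "$user_left <= 0" | bc)" -eq 1 ] || [ "$(echo "$acct_left <= 0" | bc)" -eq 1 ]; then
    echo "CAPPING reached: the job would not be submitted"
else
    echo "OK: the job can be submitted"
fi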
Default behaviour
By default, if you have exceeded a limit, your job will NOT be submitted.
Support
If you need to change your capping limit (username_capping and/or account_capping), please ask your account administrator to request a change via the EPFL support 1234@epfl.ch.
When you submit a job, you will see this kind of message:
Example without limit:#
$ srun --qos serial hostname
srun: [ESTIMATION] The estimated cost of this job is CHF 0.40
srun: ╭──────────────────────────────┬─────────────┬─────────────┬─────────────╮
srun: │ [in CHF] │ Capping │ Consumed │ Queued ¹⁾ ²⁾│
srun: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: │ account : scitas-ge │ 20,000 │ 899.4 │ 0.4 │
srun: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: │ username : ncoudene │ 1 │ 0.0 │ 0.4 │
srun: ╰──────────────────────────────┴─────────────┴─────────────┴─────────────╯
srun: ¹⁾ Estimated cost of the queued jobs and this job
srun: ²⁾ Queued jobs costs are based on its walltime (option --time)
jst370
Example with limit:#
$ srun --qos serial --time 7-00:00:00 --cpus-per-task 70 hostname
srun: error: [ESTIMATION] The estimated cost of this job is CHF 64.68
srun: error: ╭──────────────────────────────┬─────────────┬─────────────┬─────────────╮
srun: error: │ [in CHF] │ Capping │ Consumed │ Queued ¹⁾ ²⁾│
srun: error: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: error: │ account : scitas-ge │ 20,000 │ 899.4 │ 64.7 │
srun: error: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: error: │ 🔥username : ncoudene │ 1 │ 0.05 │ 64.7 │
srun: error: ╰──────────────────────────────┴─────────────┴─────────────┴─────────────╯
srun: error: ¹⁾ Estimated cost of the queued jobs and this job
srun: error: ²⁾ Queued jobs costs are based on its walltime (option --time)
srun: error: [CAPPING] 🛑 You reached a capping.
srun: error: Unable to allocate resources: System submissions disabled
Memory limit#
In our clusters, we have enabled a limit for how much memory you can use per CPU that you have allocated to your job. Please see the official documentation on MaxMemPerCPU for more information about this setting.
For more information about the memory limit in our clusters, please read our documentation on Memory Allocation.
The limit is calculated as follows:
[NODE_MEMORY] / [NODE_CPU_COUNT]
For example, for a standard node on jed this limit is: 504000 / 72 = 7000 MB per CPU.
This means that if you ask for more memory than is covered by the number of CPUs you requested (CPUS * MaxMemPerCPU), you will not be able to submit.
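As a sketch of the arithmetic, assume a hypothetical request of 256000 MB of memory on a standard jed node:
# MaxMemPerCPU on a standard jed node, from the formula above
node_memory=504000                                    # MB of RAM on the node
node_cpu_count=72                                     # cores on the node
max_mem_per_cpu=$(( node_memory / node_cpu_count ))   # 7000 MB per core

mem_request=256000                                    # hypothetical --mem request in MB
# Minimum number of cores so that CPUS * MaxMemPerCPU covers the request
min_cpus=$(( (mem_request + max_mem_per_cpu - 1) / max_mem_per_cpu ))
echo "Request at least ${min_cpus} cores for ${mem_request} MB"   # -> 37 cores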
We add a warning at job submission time to help you see if you have asked for more memory than the limit allows:
helvetios # srun --mem 256000 hostname
srun: error: [MEMORY] 🛑 ERROR: The amount of memory you asked for exceeds MaxMemPerCPU limit (5333.0) for the partition standard.
srun: error: [MEMORY] 🛑 ERROR: Please increase the number of cores, or decrease the total RAM you ask.
srun: error: Unable to allocate resources: CPU count specification invalid
Capping
Keep in mind that your estimated job cost will be calculated based on the corrected CPU count (so that CPUS * MaxMemPerCPU covers the memory you requested).
CO2 equivalent emissions estimation on the clusters#
This section describes our efforts to estimate CO2 equivalent (CO2eq) emissions based on cluster usage.
Current computation#
On SCITAS clusters you are presented with an estimated quantity of CO2eq emissions for your computations.
The CO2eq estimation is currently based only on the power consumption of the machines. It is not an exact measure of the job's consumption, as it is based on an average power draw of the machines.
- We get the power draw per rack in the different data centers (CCT and INJ). This power draw includes the compute nodes, but also the network, scratch storage, front nodes and admin nodes for each cluster.
- This power draw is multiplied by the Power Usage Effectiveness (PUE), also provided by the data center team, in order to account for the cooling of the machines.
- Finally, the energy consumption is converted into g CO2eq by using a conversion coefficient based on the energy mix of Switzerland.
Based on all these measurements we derive a single coefficient per cluster, in g CO2eq per core hour or g CO2eq per GPU hour.
This data and the coefficients that are used can be found in the following Google Spreadsheet.
The currently used factors are:
Cluster | W/(core or GPU) | g CO2 eq / (core h or GPU h) |
---|---|---|
helvetios | 8.19 | 1.11 |
izar | 308.8 | 41.88 |
jed | 6.93 | 1.19 |
kuma | 251.83 | 43.19 |
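As a consistency check, the jed line of the earlier sausage output (82 CPU hours, 0.1 kg eCO₂) can be reproduced with the jed factor from the table:
# 82 core hours on jed at 1.19 g CO2eq per core hour
echo "82 * 1.19" | bc   # ≈ 97.6 g, i.e. about 0.1 kg of CO2eq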
Possible future improvements#
- When calculating the average coefficients, we should take into account the CO2eq emissions from the construction of the machines.
- We are working on measuring the power consumption of jobs using hardware counters. This will account for CPU and GPU power consumption, but will most likely use heuristics for RAM, storage and network power consumption.
- We also want to evaluate how data storage on the shared storages (/home, /work, /archive, ...) impacts the CO2eq estimation.