Billing
Basic information
When you use our clusters with a paid account, all your jobs, with a few exceptions, are billed to the account concerned.
Exceptions:
- Jobs that failed because of a node failure (NODE_FAIL).
- Jobs in a non-billed QOS and/or partition, for example the debug QOS on jed.
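The two exceptions can be read as a simple rule. The following Python sketch is only an illustration: is_billable and NON_BILLED are hypothetical names, and NON_BILLED contains just the example given above (the debug QOS on jed), not the full list of non-billed QOS or partitions.

```python
# Hypothetical sketch of the billing exceptions described above.
# Assumption: NON_BILLED only holds the example from the text; the real
# set of non-billed QOS/partitions is not listed on this page.
NON_BILLED = {("jed", "debug")}  # hypothetical (cluster, qos) pairs


def is_billable(state: str, cluster: str, qos: str) -> bool:
    """Return True if a job would be submitted to billing under these two rules."""
    if state == "NODE_FAIL":
        # Jobs that failed because of a node failure are not billed.
        return False
    if (cluster, qos) in NON_BILLED:
        # Jobs in a non-billed QOS are not billed.
        return False
    return True


print(is_billable("COMPLETED", "jed", "serial"))  # True
print(is_billable("NODE_FAIL", "jed", "serial"))  # False
print(is_billable("COMPLETED", "jed", "debug"))   # False
```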
Sausage
You can use sausage to display what you have already consumed.
# sausage -h
usage: Sausage [-h] [-v] [-V] [-u USERNAME] [-a] [-A ACCOUNT] [-s START]
               [-e END] [-x] [-t]

SCITAS Account Usage.

options:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose (default: False)
  -V, --version         Version
  -u USERNAME, --username USERNAME
                        If not provided whoami is considered (default: None)
  -a, --all             all users from an account are printed (default: False)
  -A ACCOUNT, --account ACCOUNT
                        Prints account consumption per cluster (default: None)
  -s START, --start START
                        Start date - format YYYY-MM-DD (default: 2023-06-01)
  -e END, --end END     End date (included) - format YYYY-MM-DD (default:
                        2023-06-30)
  -x, --csv             Print result in csv style (default: False)
  -t, --total           Print result with total (default: False)
By default, sausage displays your own usage for the current month.
jed # date; sausage
Tue Jun 20 09:29:13 CEST 2023
╭────────────────────────────────────────────────────────────────────╮
│ USERNAME : ncoudene │
│ Global usage from 2023-06-01 to 2023-06-30 │
│╭──────────┬──────────┬──────┬───────┬───────┬─────────┬───────────╮│
││Account │Cluster │# jobs│GPU [h]│CPU [h]│eCO₂ [kg]│Costs [CHF]││
│├──────────┼──────────┼──────┼───────┼───────┼─────────┼───────────┤│
││scitas-ge │jed │ 4│ 0.0│ 0.0│ 0.0│ 0.0││
││scitas-ge │helvetios │ 1│ 0.0│ 0.0│ 0.0│ 0.0││
│╰──────────┴──────────┴──────┴───────┴───────┴─────────┴───────────╯│
╰────────────────────────────────────────────────────────────────────╯
sausage v0.12.1.2
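The default period in the help text (2023-06-01 to 2023-06-30, printed in June 2023) is the current calendar month, with the end date included. A minimal Python sketch of that window, assuming this is the rule sausage applies; current_month_range is a hypothetical helper, not part of sausage:

```python
import calendar
from datetime import date


def current_month_range(today: date) -> tuple:
    """Return (start, end) of today's month as YYYY-MM-DD strings; end is inclusive."""
    last_day = calendar.monthrange(today.year, today.month)[1]  # number of days in the month
    start = today.replace(day=1)
    end = today.replace(day=last_day)
    return start.isoformat(), end.isoformat()


print(current_month_range(date(2023, 6, 20)))  # ('2023-06-01', '2023-06-30')
```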
You can browse your past consumption with the --start and --end options; --total adds a summary of the selected period:
jed # date; sausage --start 2023-01-01 --total
Tue Jun 20 09:31:49 CEST 2023
╭────────────────────────────────────────────────────────────────────╮
│ USERNAME : ncoudene │
│ Global usage from 2023-01-01 to 2023-06-30 │
│╭──────────┬──────────┬──────┬───────┬───────┬─────────┬───────────╮│
││Account │Cluster │# jobs│GPU [h]│CPU [h]│eCO₂ [kg]│Costs [CHF]││
│├──────────┼──────────┼──────┼───────┼───────┼─────────┼───────────┤│
││scitas-ge │helvetios │ 182│ 0.0│ 1.1│ 0.0│ 0.0││
││scitas-ge │jed │ 123│ 0.0│ 82.0│ 0.1│ 0.5││
││scitas-ge │izar │ 6│ 0.0│ 0.0│ 0.0│ 0.0││
│╰──────────┴──────────┴──────┴───────┴───────┴─────────┴───────────╯│
│ Walltime GPU h 0.0 │
│ Walltime CPU h 83.2 │
│ Number of jobs 311.0 │
│ Est. carbon footprint kg 0.1 │
│ Costs CHF 0.5 │
╰────────────────────────────────────────────────────────────────────╯
sausage v0.12.1.2
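If you query past periods often, you may want to build the command in a script. This hypothetical helper, sausage_command, only uses flags listed by sausage -h (--start, --end, --account, --total, --csv) and checks the YYYY-MM-DD dates before anything is run; it is a sketch, not an official wrapper.

```python
from datetime import date
from typing import List, Optional


def sausage_command(start: str, end: Optional[str] = None, *, total: bool = True,
                    csv: bool = False, account: Optional[str] = None) -> List[str]:
    """Build a sausage argument list (hypothetical helper) from documented flags only."""
    start_date = date.fromisoformat(start)  # raises ValueError if not YYYY-MM-DD
    cmd = ["sausage", "--start", start]
    if end is not None:
        if date.fromisoformat(end) < start_date:
            raise ValueError("end date must not be before start date")
        cmd += ["--end", end]  # the end date is included by sausage
    if account is not None:
        cmd += ["--account", account]
    if total:
        cmd.append("--total")
    if csv:
        cmd.append("--csv")
    return cmd


print(sausage_command("2023-01-01"))
# ['sausage', '--start', '2023-01-01', '--total']
```

On a cluster front-end, the resulting list could then be passed to subprocess.run(cmd, check=True).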
Running Jobs with capping enabled
By default, all jobs are submitted with capping. This means that when you submit a job, Slurm performs a calculation to estimate whether you would exceed a spending limit. If so, your job is not submitted.
The submission check is done in five steps:
- Estimate the cost of the job about to be submitted: job_estimation.
- Check whether your username and your account have a capping limit; if so, take note of both: username_capping and account_capping.
- Calculate what your username (username_consumed) and your account (account_consumed) have already consumed on our clusters. This information comes from sausage.
- Calculate what your username (username_queued) and your account (account_queued) are expected to consume, based on the jobs they have queued on all our clusters.
- Check whether a limit is reached. For a username or account that has a capping limit, the job is refused if
  username_capping - username_consumed - username_queued - job_estimation <= 0
  or
  account_capping - account_consumed - account_queued - job_estimation <= 0
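The five steps boil down to one comparison per scope (username and account). A minimal Python sketch with made-up figures; Usage, reaches_capping and job_refused are hypothetical names, and representing "no capping limit" as None is our assumption, not necessarily how Slurm stores it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    capping: Optional[float]  # username_capping / account_capping; None = no limit (assumption)
    consumed: float           # username_consumed / account_consumed, in CHF
    queued: float             # username_queued / account_queued, in CHF


def reaches_capping(u: Usage, job_estimation: float) -> bool:
    """Step 5 for one scope: capping - consumed - queued - job_estimation <= 0."""
    if u.capping is None:
        return False  # step 2 found no capping limit for this scope
    return u.capping - u.consumed - u.queued - job_estimation <= 0


def job_refused(user: Usage, account: Usage, job_estimation: float) -> bool:
    """The job is refused if either the username or the account limit is reached."""
    return reaches_capping(user, job_estimation) or reaches_capping(account, job_estimation)


# Made-up figures: no username limit, account limited to 100 CHF.
user = Usage(capping=None, consumed=0.0, queued=0.0)
account = Usage(capping=100.0, consumed=60.0, queued=30.0)
print(job_refused(user, account, job_estimation=5.0))   # False: 5 CHF left after the job
print(job_refused(user, account, job_estimation=15.0))  # True: 100 - 60 - 30 - 15 = -5 <= 0
```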
Default behaviour
By default, if any of these steps fails, your job is submitted anyway.
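This is a fail-open design: a failure in the estimation or capping steps never blocks a job on its own. A sketch of the idea in Python, where should_submit and the check callables are hypothetical stand-ins for the real submission plugin:

```python
from typing import Callable


def should_submit(capping_check: Callable[[], bool]) -> bool:
    """Fail-open wrapper (hypothetical): capping_check returns True when a capping is reached."""
    try:
        return not capping_check()
    except Exception:
        # Any failure in the estimation/capping steps: submit the job anyway.
        return True


def broken_check() -> bool:
    raise RuntimeError("consumption data unavailable")


print(should_submit(lambda: True))   # False: capping reached, job refused
print(should_submit(lambda: False))  # True: within limits
print(should_submit(broken_check))   # True: check failed, job submitted anyway
```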
Support
If you need to change your capping limits (username_capping and/or account_capping), please ask your account administrator to request the change through EPFL support at 1234@epfl.ch.
When you submit a job, you will see messages like these:
- Example where no limit is reached:
$ srun --qos serial hostname
srun: info: [ESTIMATION] The estimated cost of this job is CHF 0.00
srun: info: [CAPPING] All users of the account scitas-ge have consumed 715.41 CHF
srun: info: [CAPPING] In addition, based on queued and running jobs all users of the account scitas-ge will consume up to 58.74 CHF
srun: info: [CAPPING] Your username ncoudene have consumed 0.00 CHF
srun: info: [CAPPING] In addition, based on queued and running jobs your username ncoudene will consume up to 0.00 CHF
srun: info: ╭──────────────────────────────┬─────────────┬─────────────┬─────────────╮
srun: info: │ [in CHF] │ Capping │ Consumed │ Queued │
srun: info: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: info: │ account : scitas-ge │ 10,000 │ 715.45 │ 58.75 │
srun: info: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: info: │ username : ncoudene │ 0 │ 0.0 │ 0.0 │
srun: info: ╰──────────────────────────────┴─────────────┴─────────────┴─────────────╯
h042
- Example where a limit is reached:
$ srun --qos serial hostname
srun: error: [ESTIMATION] The estimated cost of this job is CHF 0.00
srun: error: [CAPPING] All users of the account scitas-ge have consumed 715.41 CHF
srun: error: [CAPPING] In addition, based on queued and running jobs all users of the account scitas-ge will consume up to 29.30 CHF
srun: error: [CAPPING] Your username ylopes have consumed 421.83 CHF
srun: error: [CAPPING] In addition, based on queued and running jobs your username ylopes will consume up to 0.00 CHF
srun: error: ╭──────────────────────────────┬─────────────┬─────────────┬─────────────╮
srun: error: │ [in CHF] │ Capping │ Consumed │ Queued │
srun: error: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: error: │ account : scitas-ge │ 10,000 │ 715.45 │ 29.35 │
srun: error: ├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
srun: error: │ 🔥username : ylopes │ 420 │ 421.85 │ 0.0 │
srun: error: ╰──────────────────────────────┴─────────────┴─────────────┴─────────────╯
srun: error: [CAPPING] 🛑 You reached a capping.
srun: error: Unable to allocate resources: Unspecified error
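The table printed at submission contains everything needed to see why the second job was refused. For each row, the remaining amount is Capping - Consumed - Queued - estimated cost; for ylopes this is 420 - 421.85 - 0.0 - 0.00 = -1.85 CHF, which is <= 0, hence the refusal. As a hedged sketch, assuming you capture the srun messages as text, the hypothetical parse_capping_table below extracts the rows. It relies only on the │ separators seen above and may need adjusting if the output format changes.

```python
from typing import Dict, Tuple


def parse_capping_table(output: str) -> Dict[str, Tuple[float, float, float]]:
    """Map each row label to (capping, consumed, queued) in CHF (hypothetical helper)."""
    rows: Dict[str, Tuple[float, float, float]] = {}
    for line in output.splitlines():
        cells = line.split("│")  # "│" (U+2502) separates the table columns
        if len(cells) < 5:
            continue  # border lines (╭ ├ ╰) and plain messages have no separators
        label = " ".join(cells[1].replace("🔥", "").split())
        try:
            capping, consumed, queued = (float(c.strip().replace(",", "")) for c in cells[2:5])
        except ValueError:
            continue  # header row: "Capping", "Consumed", "Queued" are not numbers
        rows[label] = (capping, consumed, queued)
    return rows


sample = """\
srun: error: │ [in CHF] │ Capping │ Consumed │ Queued │
srun: error: │ account : scitas-ge │ 10,000 │ 715.45 │ 29.35 │
srun: error: │ 🔥username : ylopes │ 420 │ 421.85 │ 0.0 │
"""
estimate = 0.00  # from "[ESTIMATION] The estimated cost of this job is CHF 0.00"
for label, (capping, consumed, queued) in parse_capping_table(sample).items():
    remaining = capping - consumed - queued - estimate
    status = "capping reached" if remaining <= 0 else "ok"
    print(f"{label}: {remaining:.2f} CHF left ({status})")
# account : scitas-ge: 9255.20 CHF left (ok)
# username : ylopes: -1.85 CHF left (capping reached)
```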
Memory limit
On our clusters, the memory you can use is limited per allocated CPU by the Slurm parameter MaxMemPerCPU.
For more information, please read our documentation on Memory Allocation.
The limit is calculated as:
MaxMemPerCPU = [NODE_MEMORY] / [NODE_CPU_COUNT]
For example, for a standard node on jed, this limit is 504000 / 72 = 7000 MB.
This means that if you request more memory than your requested CPUs allow (CPUS * MaxMemPerCPU), Slurm increases the CPU count of the job.
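The limit and the resulting CPU correction can be sketched in Python as follows, with all memory values in MB (Slurm's default unit). Two assumptions: the division is rounded down when it is not exact, and the corrected CPU count is rounded up so that CPUS * MaxMemPerCPU covers the request. The submission plugin's exact rounding is not given on this page, so its warnings may differ by one CPU from this sketch. max_mem_per_cpu and corrected_cpus are hypothetical helpers.

```python
from typing import Optional


def max_mem_per_cpu(node_memory_mb: int, node_cpu_count: int) -> int:
    """MaxMemPerCPU = [NODE_MEMORY] / [NODE_CPU_COUNT] (rounded down: assumption)."""
    return node_memory_mb // node_cpu_count


def corrected_cpus(requested_cpus: int, limit_mb: int,
                   mem: Optional[int] = None, mem_per_cpu: Optional[int] = None) -> int:
    """CPU count after enforcing the per-CPU memory limit (hypothetical helper).

    mem mirrors --mem, mem_per_cpu mirrors --mem-per-cpu; values in MB.
    Simplification: both are converted to a total amount of memory.
    """
    if mem_per_cpu is not None:
        total = mem_per_cpu * requested_cpus
    elif mem is not None:
        total = mem
    else:
        return requested_cpus  # no memory request: nothing to correct
    needed = (total + limit_mb - 1) // limit_mb  # integer ceiling of total / limit_mb
    return max(requested_cpus, needed)


limit = max_mem_per_cpu(504000, 72)
print(limit)                                       # 7000 on a standard jed node
print(corrected_cpus(1, limit, mem_per_cpu=7200))  # 2
print(corrected_cpus(4, limit, mem=20000))         # 4 (20000 MB fits in 4 x 7000 MB)
```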
At job submission, we display warnings to help you see whether you have asked for more memory than the limit allows:
jed # srun --mem 256000 hostname
[...]
srun: info: [MEMORY] ⚠️ WARNING: The amount of memory you asked for corresponds to 36 cpus.
srun: info: [MEMORY] ⚠️ WARNING: For this reason, your job will be assigned 36 cpus instead of 1.0.
[...]
jst003
jed # srun --mem-per-cpu 7200 hostname
[...]
srun: info: [MEMORY] ⚠️ WARNING: The amount of memory you asked for corresponds to 2 cpus.
srun: info: [MEMORY] ⚠️ WARNING: For this reason, your job will be assigned 2 cpus instead of 1.0.
[...]
jst003
Capping
Keep in mind that your estimated job cost is calculated from the corrected CPU count.