
New billing scheme and usage capping

Introducing memory billing: A fairer approach to compute node usage

Historically, SCITAS has billed only for the CPU cores allocated to a simulation. While this metric captures the processing power a job requires, it does not give the complete picture of resource consumption. For example, some simulations need a large amount of memory while running serially, i.e. on a single CPU core. In such cases a sizeable fraction of the node is effectively reserved for the job, yet only one CPU core is billed. This created an unfair situation among users, as the billing did not take into account the actual fraction of the compute node being used.

To address this problem, we have updated our allocation policy. As of today, we have activated an allocation scheme that reflects the fraction of the node a job effectively occupies. In practice, each job is allocated a number of cores that corresponds to the amount of memory it requests. For example, on a compute node with 72 cores and 512 GB of RAM, each core corresponds on average to 512 GB / 72 ≈ 7 GB of RAM. A job asking for one core and 27 GB of memory will therefore be assigned (and billed) four cores. More precisely, the billed number of cores is the maximum of the requested number of cores and the RAM-equivalent number of cores. Whenever you submit a job for which the number of cores must be adapted, you will see a header message similar to the following:

[MEMORY] ⚠  WARNING: The amount of memory you asked for corresponds to 19 cpus.
[MEMORY] ⚠  WARNING: For this reason, your job will be assigned 19 cpus instead of 1.
In this case, the message warns the user that, due to the amount of memory requested, 19 cores will be allocated to the job instead of one.
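
To make the rule concrete, here is a minimal shell sketch of the computation described above. The variable names and the ceiling rounding are our own illustration, not the scheduler's actual code, and the node figures are those of the example (72 cores, 512 GB):

#!/bin/bash
# Minimal sketch: billed cores = max(requested cores, RAM-equivalent cores)
node_cores=72      # cores on the node
node_mem_gb=512    # RAM on the node, in GB
req_cores=1        # cores requested by the job
req_mem_gb=27      # memory requested by the job, in GB

# RAM-equivalent cores = ceil(requested memory / memory per core),
# computed with integer arithmetic as ceil(req_mem_gb * node_cores / node_mem_gb)
ram_equiv=$(( (req_mem_gb * node_cores + node_mem_gb - 1) / node_mem_gb ))

# The job is billed the larger of the two values
billed=$(( req_cores > ram_equiv ? req_cores : ram_equiv ))
echo "billed cores: ${billed}"   # prints 4

With a request such as one core and 135 GB of memory, the same arithmetic gives 19 cores, which is the situation shown in the warning above.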

The activation of memory-based allocation will have several benefits, both for our users and for us as a service provider. Firstly, it gives users a better understanding of their resource utilization and the associated costs. Because the billing now accurately reflects the memory consumed by their applications or processes, users can make informed decisions about optimizing their workloads and managing their expenses more effectively.

Secondly, this change enables us to allocate resources more accurately and to ensure good performance for all users. By taking both CPU cores and RAM into account, we can make sure that compute nodes are used efficiently, preventing both over-allocation and underutilization of resources. This ultimately leads to a more balanced and reliable computing environment.

We understand that this update may raise questions or concerns among our user base. We want to assure you that we have taken several factors into consideration when implementing this change. Our aim is not to increase costs arbitrarily, but rather to provide a fair and accurate representation of resource usage.

In the past few weeks, we have monitored the resources used by simulations and noticed that some users requested far more memory than they actually needed. For this reason, we kindly ask you to make sure that the amount of memory you request in your job scripts reflects your real needs. We have prepared an article explaining how you can achieve that.
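
The dedicated article remains the reference, but as a quick illustration, and assuming Slurm accounting is enabled on the cluster, you can compare the memory you requested with the memory a finished job actually used:

# Requested vs. actually used memory of a finished job
# (MaxRSS is the peak resident memory of the largest task):
sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,Elapsed

# If the seff utility is installed, it prints a compact summary,
# including memory efficiency:
seff <jobid>

Setting --mem in your job script close to the observed MaxRSS, plus a reasonable safety margin, keeps the RAM-equivalent number of cores, and hence the cost, down.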

Introducing resource capping: Preventing unexpectedly high expenses on HPC platforms

We understand that on a parallel cluster it can be all too easy to inadvertently consume an excessive amount of resources, resulting in unexpectedly high expenses. To address this issue, we have developed a solution that caps user and group expenses at a maximum value.

After testing this solution with volunteer users, we will now enable usage capping for all accounts. We have set a personalized maximum value based on your typical resource consumption. You can check this limit in the message displayed when you submit a job, and we will also add the capping information to the sausage tool in the near future. Once you reach the maximum amount, you will no longer be able to launch new simulations. With this change, every submitted job will print a header message of the form:

[ESTIMATION] The estimated cost of this job is CHF 7.66
[CAPPING] All users of the account <your_account> have consumed 1312.14 CHF
[CAPPING] In addition, based on queued and running jobs all users of the account <your_account> will consume up to 469.41 CHF
[CAPPING] Your username <username> has consumed 0.00 CHF
[CAPPING] In addition, based on queued and running jobs your username <username> will consume up to 0.00 CHF
╭──────────────────────────────┬─────────────┬─────────────┬─────────────╮
│ [in CHF]                     │ Capping     │ Consumed    │ Queued      │
├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
│ account : <your_account>     │ 10,000      │ 1,312.15    │ 469.45      │
├──────────────────────────────┼─────────────┼─────────────┼─────────────┤
│ username : <username>        │ 0           │ 0.0         │ 0.0         │
╰──────────────────────────────┴─────────────┴─────────────┴─────────────╯
This message shows the capping limit and an estimate of your job's cost based on the Slurm parameters you provided. It also prints your current consumption as well as your group's.
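
For illustration, here is one plausible reading of the capping check, using the figures from the account line of the example message above; we do not claim this is the exact rule implemented by the scheduler:

# Hypothetical sketch of the capping decision (values in CHF, taken from
# the account line of the example message above):
awk -v cap=10000 -v consumed=1312.15 -v queued=469.45 -v estimate=7.66 'BEGIN {
    if (consumed + queued + estimate < cap)
        print "within the cap: the job can be submitted"
    else
        print "cap reached: new jobs are rejected until the limit is adjusted"
}'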

Of course, we understand that each user's requirements may differ, which is why we will implement a web interface where PIs can adjust these limits on a per-user or per-group basis. In the meantime, if this limit is too low or too high for you, please do not hesitate to open a ticket by writing to 1234@epfl.ch and we will adjust it to the level of your choosing.

Note

More information about the job header messages can be found in the dedicated article.