Clusters

Helvetios: Back to production in degraded state!

The Helvetios cluster is available again, but in a reduced and isolated configuration. Please read the key changes and required actions carefully.

Current Status

  • The cluster is back online with only 24 nodes currently provisioned.
  • Helvetios is no longer connected to the central storage, meaning:
      • /home, /scratch, and /work are now local to the cluster
      • The /work filesystem is no longer shared
  • All data previously stored in /scratch has been lost and cannot be recovered.

We plan to gradually increase the number of available nodes once we have confirmed that the system is stable.

Why These Changes?

  • Helvetios is based on unsupported, obsolete hardware, and SCITAS can only provide best-effort support for its maintenance.
  • Recent network issues on Helvetios have caused disruptions and performance degradation across all production clusters by impacting the central storage.
  • To protect the integrity of the production environment, we had to isolate Helvetios from shared storage.

What You Need to Do

  • Manually copy your SSH keys
  • Data previously stored in /home or /work (when it was part of the central storage) will need to be restored manually.

We understand this situation may cause inconvenience, and we appreciate your patience as we continue to maintain access to this legacy system under challenging conditions.

Happy computing! 😊🚀

Kuma Cluster Full Production & Pricing – Nov 1st

We are excited to announce the successful completion of the beta testing phase for the Kuma GPU cluster, which will enter full production on November 1st, 2024. Your participation in the beta phase has been invaluable: approximately 450,000 GPU hours of compute jobs were executed. This extensive testing allowed us to identify and resolve various hardware and software issues, so that Kuma is now largely ready for production.

Kuma Beta Opening

After a successful restricted beta with more than 80,000 jobs submitted, we are pleased to announce that Kuma, the new GPU-based cluster, is available for testing starting now! This marks an important milestone as we transition from the Izar cluster, which will soon be reassigned to educational purposes, to the much more powerful Kuma cluster. You can now connect to the login node at kuma.hpc.epfl.ch to begin testing your codes.

Annual SCITAS maintenance

This communication is important and may affect your work. We strongly recommend taking the time to read it thoroughly.

Our annual maintenance period is approaching and is scheduled from February 5 to February 19, 2024. This maintenance is essential for enhancing our services and includes the following key upgrades:

Downfall vulnerability

The Downfall vulnerability, identified as CVE-2022-40982, enables a user to access and steal data from other users who share the same computer. It is found in most Intel CPUs from the 6th generation (Skylake) up to and including the 11th generation (Tiger Lake). For instance, a malicious app obtained from an app store could use the Downfall attack to steal sensitive information such as passwords, encryption keys, and private data like banking details, personal emails, and messages.
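As a rough illustration, recent Linux kernels report the status of this vulnerability (listed there under its technical name, Gather Data Sampling) in a sysfs file. The sketch below reads that file and flags a machine that the kernel explicitly reports as vulnerable. The function names are hypothetical, and the file is simply absent on older kernels, so treat "unknown" as "cannot tell", not as "safe".

```python
from pathlib import Path

# Standard Linux sysfs location for the Gather Data Sampling (Downfall) status.
# Assumption: the kernel is recent enough to expose this file.
GDS_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/gather_data_sampling")


def downfall_status(path: Path = GDS_STATUS) -> str:
    """Return the kernel's reported status, or 'unknown' if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        # Older kernels (or non-Linux systems) do not provide this file.
        return "unknown"


def is_exposed(status: str) -> bool:
    """True only when the kernel explicitly reports the CPU as vulnerable."""
    return status.startswith("Vulnerable")


if __name__ == "__main__":
    status = downfall_status()
    print(f"gather_data_sampling: {status}")
    if is_exposed(status):
        print("The kernel reports this CPU as vulnerable to Downfall.")
```

Run on a compute or login node, this only reflects what that node's kernel reports; it says nothing about other machines.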

Jed frontend has to be rebooted due to a power issue

This morning, at around 6:30, the Jed frontend shut down due to an unexpected power issue on the direct power line. We had to reboot the frontend, which may have caused some connections to drop.

Neither the running jobs nor the queued ones were affected.

We are currently investigating the cause of this problem. We apologize for the inconvenience it may have caused you.