
Helvetios

Helvetios: Cluster down, run your jobs on Jed

Helvetios is currently completely unavailable due to a major network failure. An on-site intervention is required to restore minimal service.

If you have already copied your data to the central storage, you can continue running your jobs on the Jed cluster (jed.hpc.epfl.ch) using the new academic partition.

How to proceed

Submit your jobs on Jed using the academic partition, either:

  • On the command line:
      --partition=academic
  • In your submission script:
      #SBATCH --partition=academic
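
As an illustration of the second option, here is a minimal sketch of a submission script. Only the --partition=academic directive comes from this announcement; the job name, resource requests, and time limit are assumed placeholders to adapt to your own workload.

```shell
#!/bin/bash
# Partition named in the announcement.
#SBATCH --partition=academic
# Hypothetical job name and assumed resource requests: adjust as needed.
#SBATCH --job-name=academic-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Slurm sets SLURM_JOB_PARTITION inside a job; fall back to a marker
# value when the script is run outside the scheduler.
partition="${SLURM_JOB_PARTITION:-not-in-a-job}"
echo "Host: $(uname -n), partition: ${partition}"
```

Assuming the script is saved as job.sh (a hypothetical file name), it can be submitted with `sbatch job.sh`. For a script that has no partition directive, `sbatch --partition=academic job.sh` selects the partition on the command line instead.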

Important note about the software stack

If you are using our software stack, please recompile your code on Jed, as the software stack available there is more recent and module versions may differ from Helvetios.

We will inform you as soon as Helvetios is restored to a minimal operational state so that you can copy your data.

Thank you for your understanding and cooperation.

Helvetios: Severe cluster issues

Our aging Helvetios academic cluster regularly experiences severe storage and network issues as the hardware starts failing.

We already had to isolate Helvetios from our central storage system earlier this year to mitigate the impact of these issues on the other clusters.

All data stored on Helvetios is at risk of being lost in the event of a fatal failure!

Please ensure you have a copy of ALL important data on a separate, reliable storage system. Do NOT rely solely on this cluster to store critical files.

We highly recommend copying important data currently stored on Helvetios to our central storage system (accessible from all our other clusters).
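
One possible way to make such a copy is rsync over SSH, sketched below. This assumes that a Jed login node (jed.hpc.epfl.ch) is reachable from Helvetios and that your home directory there resides on the central storage; the source and destination directories are hypothetical placeholders. The sketch only builds and prints a dry-run command so that it can be reviewed before anything is transferred.

```shell
#!/bin/bash
# Hypothetical values: adjust to your own account and directories.
remote_user="${USER:-username}"
remote_host="jed.hpc.epfl.ch"     # Jed host name (assumed reachable from Helvetios)
src_dir="$HOME/results/"          # hypothetical source directory on Helvetios
dest_dir="helvetios_backup/"      # hypothetical path, relative to the remote home

# Build the rsync command as an array so paths with spaces stay intact.
# -a preserves permissions and timestamps, -v lists each file,
# --dry-run only reports what would be transferred.
rsync_cmd=(rsync -av --dry-run "$src_dir" "${remote_user}@${remote_host}:${dest_dir}")

# Print the command for review. Run it with "${rsync_cmd[@]}", then
# remove --dry-run from the array to perform the actual copy.
printf '%s ' "${rsync_cmd[@]}"
echo
```

The trailing slash on the source directory tells rsync to copy the directory's contents rather than the directory itself.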

We remind you that Helvetios runs on obsolete hardware that is no longer supported by the vendors. SCITAS can only provide best-effort support for its maintenance. We are working on the next CPU computing solution for students and courses.

Thank you for your understanding and cooperation.

Helvetios: Back to production in degraded state!

The Helvetios cluster is available again, but in a reduced and isolated configuration. Please read the key changes and required actions carefully.

Current Status

  • The cluster is back online with only 24 nodes currently provisioned.
  • Helvetios is no longer connected to the central storage, meaning:
      • /home, /scratch, and /work are now local to the cluster
      • The /work filesystem is no longer shared
  • All data previously stored in /scratch has been lost and cannot be recovered.

We plan to gradually increase the number of available nodes as soon as we confirm system stability.

Why These Changes?

  • Helvetios is based on unsupported, obsolete hardware, and SCITAS can only provide best-effort support for its maintenance.
  • Recent network issues on Helvetios have caused disruptions and performance degradation across all production clusters by impacting the central storage.
  • To protect the integrity of the production environment, we had to isolate Helvetios from shared storage.

What You Need to Do

  • Manually copy your SSH keys
  • Data previously stored in /home or /work (when it was part of the central storage) will need to be restored manually.

We understand this situation may cause inconvenience, and we appreciate your patience as we continue to maintain access to this legacy system under challenging conditions.

Happy computing! 😊🚀

Annual SCITAS maintenance

This announcement is important and may affect your work. We strongly recommend taking the time to read it thoroughly.

We are approaching our forthcoming annual maintenance period, scheduled from February 5 to February 19, 2024. This maintenance is essential for enhancing our services and includes the following key upgrades: