
Summer 2025 Maintenance Complete - Full Infrastructure Restored

We are pleased to inform you that the annual site-wide maintenance has been successfully completed.
All SCITAS systems are now fully operational.

Due to the OS and software stack upgrades, you will likely need to recompile your applications.
If your job scripts load specific module versions, please update them to the versions available in the new software stack.
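
One way to locate job scripts that pin module versions is a simple text search. The sketch below is only an illustration: the helper name find_pinned_modules is hypothetical, and the pattern assumes versions are written as name/number on a "module load" or "module add" line.

```shell
#!/bin/bash
# Hypothetical helper: print lines that load a module with an explicit
# version (a "/" followed by a digit), prefixed with their line number.
# With no arguments it reads standard input; otherwise it scans the given files.
find_pinned_modules() {
    grep -nE 'module[[:space:]]+(load|add)[[:space:]].*/[0-9]' "$@"
}
```

For example, running find_pinned_modules on a set of job scripts lists each line that names a versioned module, which can then be compared against the output of module avail on the upgraded clusters.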

Please note that all jobs currently in Jed's queue are in the JobHeldUser state.
Each user may release their own jobs manually using the following command: scontrol release <jobid>
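
If you have many held jobs, the release can be scripted. The following is a minimal sketch, not an official SCITAS tool: the helper name held_job_ids is hypothetical, and it assumes squeue reports the reason JobHeldUser for the jobs concerned. Check that each job still makes sense on the new software stack before releasing it.

```shell
#!/bin/bash
# Hypothetical helper: read "jobid reason" pairs on standard input and
# print only the job IDs whose reason is JobHeldUser.
held_job_ids() {
    awk '$2 == "JobHeldUser" { print $1 }'
}

# Only attempt the release where the Slurm client tools are available.
if command -v squeue >/dev/null 2>&1; then
    # -u: your own jobs, -h: no header, -t PD: pending jobs,
    # -o "%i %r": print job ID and pending reason.
    squeue -u "$USER" -h -t PD -o "%i %r" | held_job_ids |
    while read -r jobid; do
        scontrol release "$jobid"
    done
fi
```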

Below is a summary of the key updates and improvements performed during the maintenance:

Base System Enhancements and Cluster Firmware Updates (532 compute nodes and admin servers)

  • OS upgrade: RHEL 9.0 → RHEL 9.4
  • Mellanox OFED: Updated to the latest drivers
  • Packages and tools: Refreshed to stable versions
  • System cleanup: Obsolete packages removed, general tidying performed
  • All login and admin nodes updated
  • UEFI, BMC and network card firmware updated

Jed InfiniBand Integration (44 compute nodes)

  • New rack installed with InfiniBand
  • IB cards and switches configured
  • Stack integration completed
    • To use the new InfiniBand-enabled nodes, please specify the infiniband Slurm partition (-p infiniband).
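
As an illustration, a batch script targeting the new nodes could look like the sketch below. Only the partition name comes from this announcement; the job name, node count, time limit and the application name my_app are placeholders to adapt.

```shell
#!/bin/bash
#SBATCH --job-name=ib-example      # placeholder job name
#SBATCH --partition=infiniband     # same as -p infiniband
#SBATCH --nodes=2                  # placeholder node count
#SBATCH --time=00:10:00            # placeholder time limit

# Slurm sets SLURM_JOB_PARTITION inside a job; fall back to "none" elsewhere.
partition="${SLURM_JOB_PARTITION:-none}"
echo "Running in partition: ${partition}"

# Launch your own application here, for example:
# srun ./my_app    (my_app is a hypothetical binary)
```

Submit the script with sbatch; the partition can also be given on the command line (sbatch -p infiniband) instead of in the script.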

Software Stack Update

  • Software stack on Kuma recompiled on the new OS
  • Software stack on Jed updated to match Kuma
  • Libraries and tools updated for performance and compatibility

Central Storage (Spectrum Scale)

  • NSD servers and disk enclosures upgraded
  • GPFS version updated
  • Core switches updated
  • I/O parameters fine-tuned based on IBM best practices

Network Infrastructure Improvements

  • 32 switches aligned to a unified version
  • Security rules reinforced
  • Monitoring tools upgraded for better diagnostics