
Kuma#

Kuma Cluster

Research Cluster

This cluster is for pay-per-use accounts only. Master's students and courses cannot use Kuma. For the educational GPU cluster, see Izar.

Useful info#

Connecting to the cluster#

To connect to the cluster, run:

ssh <username>@kuma.hpc.epfl.ch

When you connect, the host key fingerprint shown by your SSH client should match one of the following current fingerprints:

ECDSA
    MD5:c2:0d:da:2e:5a:29:fa:4e:b3:4c:b3:c1:9f:53:fb:d6
    SHA256:Y4DicuOpInMIFShZXHfSNoLX++Wx6M4jRXOX2H+8Ows
ED25519
    MD5:3e:d1:e9:4e:9b:67:13:bd:09:ca:29:3a:69:82:18:16
    SHA256:h24nEbxUtKIbXdtWgZLjo7YgRYmaSPOfPmLAHgnaF7E
RSA
    MD5:f7:db:13:73:15:d1:a8:5d:a3:7c:a2:56:b3:e9:35:5e
    SHA256:qybRVV995yOb3OlI+b5aUeuXc3AicftyTPQ0Tz4miTk
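
The comparison against the list above can also be scripted. The following is a minimal sketch, assuming OpenSSH's `ssh-keyscan` and `ssh-keygen` are installed locally; the helper names `is_kuma_fingerprint` and `check_kuma_host_keys` are hypothetical and not part of any cluster tooling.

```shell
# SHA256 fingerprints copied from the list above.
KUMA_FINGERPRINTS='SHA256:Y4DicuOpInMIFShZXHfSNoLX++Wx6M4jRXOX2H+8Ows
SHA256:h24nEbxUtKIbXdtWgZLjo7YgRYmaSPOfPmLAHgnaF7E
SHA256:qybRVV995yOb3OlI+b5aUeuXc3AicftyTPQ0Tz4miTk'

# Hypothetical helper: succeeds only if $1 exactly matches a listed fingerprint.
is_kuma_fingerprint() {
    [ -n "$1" ] && printf '%s\n' "$KUMA_FINGERPRINTS" | grep -Fxq -- "$1"
}

# Hypothetical helper: fetch the host keys the server currently offers and
# check each fingerprint against the list. Needs network access.
check_kuma_host_keys() {
    ssh-keyscan kuma.hpc.epfl.ch 2>/dev/null | ssh-keygen -lf - |
    while read -r _bits fp _rest; do
        if is_kuma_fingerprint "$fp"; then
            echo "OK       $fp"
        else
            echo "MISMATCH $fp"
        fi
    done
}
```

Running `check_kuma_host_keys` prints one line per key type offered by the server. Any `MISMATCH` line is a reason to stop and contact support before accepting the key.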

QOS#

The standard QOS (quality of service) levels are:

  • normal, for jobs using up to 8 nodes, with a time limit of 3 days. This is the default;
  • long, for jobs using up to 8 nodes, with a time limit of 7 days;
  • build, for compiling code, with up to 16 cores on 1 node, no GPUs, and a time limit of 4 hours;
  • debug, for debugging jobs on up to 2 nodes, with high priority and a time limit of 1 hour.

Choose one with -q <qos> or --qos <qos>.
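
To make the limits concrete, here is a small sketch that checks whether a planned job fits a given QOS before submission. The function name `qos_allows` is hypothetical, and it only encodes the node and time limits listed above (days converted to hours); the scheduler remains the authority and may enforce further limits not covered here, such as the core and GPU restrictions of `build`.

```shell
# Hypothetical helper: qos_allows QOS NODES HOURS
# Succeeds if NODES and HOURS are within the limits listed above for QOS.
# Returns 2 for a QOS name that is not in the list.
qos_allows() {
    case "$1" in
        normal) _max_nodes=8; _max_hours=72  ;;  # 3 days
        long)   _max_nodes=8; _max_hours=168 ;;  # 7 days
        build)  _max_nodes=1; _max_hours=4   ;;
        debug)  _max_nodes=2; _max_hours=1   ;;
        *) return 2 ;;
    esac
    [ "$2" -le "$_max_nodes" ] && [ "$3" -le "$_max_hours" ]
}

# Example: a 4-node, 100-hour job does not fit "normal" but fits "long".
if qos_allows normal 4 100; then echo "normal: ok"; else echo "normal: too large"; fi
if qos_allows long 4 100; then echo "long: ok"; else echo "long: too large"; fi
```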

Partitions#

Kuma has two partitions, which separate the nodes by GPU type:

  • h100, to use the NVIDIA H100 GPU nodes (with FP64 capabilities);
  • l40s, to use the NVIDIA L40S GPU nodes (with FP32 capabilities).

There is no default partition, so every job must specify one.

Choose one with -p <partition> or --partition <partition>.
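
As an illustration of how the QOS and partition options fit together, the sketch below writes a minimal job script to a file. The file name `kuma_job.sh`, the requested resources, and the use of `--gpus-per-node` are assumptions for the example; check the cluster's own submission guidelines for the GPU request syntax it expects.

```shell
# Write a minimal Slurm job script (hypothetical file name: kuma_job.sh).
cat > kuma_job.sh <<'EOF'
#!/bin/bash
# Partition is mandatory on Kuma: h100 or l40s.
#SBATCH --partition=h100
# Short test run, so use the debug QOS (up to 2 nodes, 1 hour).
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --time=00:30:00
# Assumption: GPUs are requested with the standard Slurm option below.
#SBATCH --gpus-per-node=1

# Show the GPU(s) allocated to the job.
nvidia-smi
EOF
```

The script can then be submitted with `sbatch kuma_job.sh`. The same options can be given on the command line instead, for example `sbatch -p l40s -q normal kuma_job.sh`, in which case they take precedence over the `#SBATCH` lines.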

Hardware characteristics#

This cluster has the following configuration:

Type | Count | Model | CPU | Memory | Storage | Naming | GPU # | GPU Model
Frontend | 2 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 384 GB | 6.4 TB (NVMe) | kuma[1-2] | NA | NA
Compute node H100 | 84 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 384 GB | 6.4 TB (NVMe) | kh[001-084] | 4 | NVIDIA H100 94GB
Compute node L40S | 20 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 384 GB | 7.6 TB x3 (NVMe) | kl[001-020] | 8 | NVIDIA L40S 48GB
Admin server | 2 | ThinkSystem SR630 V3 Version: 06 | Intel(R) Xeon(R) Silver 4416+ CPU @ 2.00 GHz | 256 GB | 1920 GB (SCSI) | kadmin[1-2] | NA | NA
Proxy server | 1 | ThinkSystem SR630 V3 Version: 06 | Intel(R) Xeon(R) Silver 4416+ CPU @ 2.00 GHz | 256 GB | 1920 GB (SCSI) | ksmartproxy1 | NA | NA
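
The bracketed ranges in the Naming column stand for zero-padded host names (kh001 to kh084, kl001 to kl020). The sketch below expands them and derives the total GPU count per partition from the node and GPU counts in the table; the helper name `expand_nodes` is hypothetical.

```shell
# Hypothetical helper: expand_nodes PREFIX FIRST LAST
# Prints one zero-padded host name per line, e.g. kh001 ... kh084.
expand_nodes() {
    i=$2
    while [ "$i" -le "$3" ]; do
        printf '%s%03d\n' "$1" "$i"
        i=$((i + 1))
    done
}

h100_nodes=$(expand_nodes kh 1 84 | wc -l)
l40s_nodes=$(expand_nodes kl 1 20 | wc -l)

# GPUs per node are taken from the table: 4 per H100 node, 8 per L40S node.
h100_gpus=$((h100_nodes * 4))
l40s_gpus=$((l40s_nodes * 8))
echo "H100 GPUs: $h100_gpus, L40S GPUs: $l40s_gpus"
```

On a login node, Slurm's own `scontrol show hostnames 'kh[001-084]'` performs the same expansion.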

H100#

Kuma H100 GPU node

L40S#

Kuma L40S GPU node

Admin servers#

Kuma admin servers

Frontend node#

Kuma frontend node