Kuma
Research Cluster
This cluster is for pay-per-use accounts. Master's students and courses cannot use Kuma. For the educational GPU cluster, see Izar.
Useful info
Connecting to the clusters
To connect to the cluster you should:
Here's the list of current fingerprints you should expect when connecting to this cluster:
ECDSA
MD5:94:cb:c8:73:22:30:70:ea:53:36:9e:4b:fd:33:e0:b6
SHA256:vpM/BzmJapiUU3o6hbm2zlKFN93D8QE3xObVdh8x4hM
ED25519
MD5:46:48:27:b0:b3:07:a8:68:ca:a5:4c:cf:1a:c2:c6:c4
SHA256:VU3simBjo2CoUePsABLhZ/HpW+anz231EU3rfurZDFo
RSA
MD5:14:41:97:e2:16:33:a9:cd:d9:2e:07:37:a6:39:31:ae
SHA256:u3v9urAmgx03w1xUZR6WOxyXAoDoyTcBbbiYbR4IeMc
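To compare a host key against the list above, it can help to know how the SHA256 form is derived: OpenSSH displays the SHA-256 digest of the decoded key blob, base64-encoded with the trailing `=` padding removed. The following is a minimal sketch of that check. The function names are hypothetical, and it assumes the key is given as a `.pub`-style line (`<type> <base64-blob> [comment]`).

```python
import base64
import hashlib

# SHA256 fingerprints published for Kuma (copied from the list above).
EXPECTED = {
    "ECDSA": "SHA256:vpM/BzmJapiUU3o6hbm2zlKFN93D8QE3xObVdh8x4hM",
    "ED25519": "SHA256:VU3simBjo2CoUePsABLhZ/HpW+anz231EU3rfurZDFo",
    "RSA": "SHA256:u3v9urAmgx03w1xUZR6WOxyXAoDoyTcBbbiYbR4IeMc",
}


def sha256_fingerprint(pubkey_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    Assumes the '.pub' layout: '<type> <base64-blob> [comment]'.
    """
    blob_b64 = pubkey_line.split()[1]
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    # OpenSSH prints the base64 digest without '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def is_expected(pubkey_line: str) -> bool:
    """True if the key's fingerprint matches one of the published values."""
    return sha256_fingerprint(pubkey_line) in EXPECTED.values()
```

If `is_expected` returns `False` for the key the server presents, the safe reaction is not to accept the connection.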
QOS
The standard QOS are:
- normal, for jobs using up to 8 nodes, with a time limit of 3 days. This is the default;
- long, for jobs using up to 8 nodes, with a time limit of 7 days;
- build, for compiling your codes, with up to 16 cores on 1 node, 0 GPU and a time limit of 4 hours;
- debug, for debugging jobs on up to 2 nodes, with a high priority and a time limit of 1 hour.
Choose one with -q <qos> or --qos <qos>.
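The QOS limits can be checked before submitting. The sketch below encodes the node and time limits listed above and builds the matching `--qos`, `--nodes` and `--time` arguments for `sbatch`. The helper names `fits` and `qos_args` are hypothetical, and the extra `build` restrictions (16 cores, 0 GPU) are not checked.

```python
from datetime import timedelta

# Node and time limits per QOS, transcribed from the list above.
# "build" is further restricted to 16 cores and 0 GPU (not checked here).
QOS_LIMITS = {
    "normal": (8, timedelta(days=3)),
    "long": (8, timedelta(days=7)),
    "build": (1, timedelta(hours=4)),
    "debug": (2, timedelta(hours=1)),
}


def fits(qos: str, nodes: int, walltime: timedelta) -> bool:
    """True if the request stays within the node and time limits of `qos`."""
    max_nodes, max_time = QOS_LIMITS[qos]
    return nodes <= max_nodes and walltime <= max_time


def qos_args(qos: str, nodes: int, walltime: timedelta) -> list[str]:
    """Build sbatch arguments, refusing requests that exceed the QOS limits."""
    if not fits(qos, nodes, walltime):
        raise ValueError(f"request exceeds the limits of QOS {qos!r}")
    total = int(walltime.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    # Slurm accepts the "days-hours:minutes:seconds" time format.
    time_str = f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"
    return ["--qos", qos, "--nodes", str(nodes), "--time", time_str]
```

For instance, a 5-day job on 4 nodes does not fit `normal` but does fit `long`, so `qos_args("long", 4, timedelta(days=5))` yields `--qos long --nodes 4 --time 5-00:00:00`.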
Partitions
There are 3 partitions on Kuma, to differentiate nodes based on GPU type and use case:
- h100, to use the NVIDIA H100 GPU nodes (with FP64 capabilities). You can request up to 16 cores per GPU;
- l40s, to use the NVIDIA L40S GPU nodes (with FP32 capabilities). You can request up to 8 cores per GPU;
- mig12gb or mig24gb, to use the MIG instances on the H100 GPU nodes. You can request up to 2 or 5 cores per MIG instance (for 12gb and 24gb respectively).
There is no default partition. You have to choose one.
Choose one with -p <partition> or --partition <partition>.
The MIG instances have reduced compute and memory capacity (roughly 10 GB of VRAM). They are ideal for debugging or coding sessions, as well as for any other job that is too small to make effective use of a full GPU.
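The per-GPU core limits translate directly into a maximum core count for a given request. The sketch below tabulates the limits stated above. The helper names are hypothetical, and it assumes the limit scales linearly with the number of GPUs (or MIG instances) requested.

```python
# Maximum CPU cores per GPU (or per MIG instance), from the partition list above.
MAX_CORES_PER_GPU = {"h100": 16, "l40s": 8, "mig12gb": 2, "mig24gb": 5}


def max_cores(partition: str, gpus: int) -> int:
    """Largest core count allowed for `gpus` GPUs in `partition`.

    Assumes the per-GPU limit scales linearly with the GPU count.
    """
    if partition not in MAX_CORES_PER_GPU:
        raise ValueError(f"unknown partition {partition!r}; there is no default")
    return MAX_CORES_PER_GPU[partition] * gpus


def request_ok(partition: str, gpus: int, cores: int) -> bool:
    """True if `cores` does not exceed the limit for this partition and GPU count."""
    return cores <= max_cores(partition, gpus)
```

With these numbers, a full H100 node (4 GPUs) and a full L40S node (8 GPUs) both come out at 64 cores.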
Automatic assignment of RAM per core
We automatically assign 5900 MB of RAM per CPU core associated with the job. You cannot ask for more RAM than this, even by specifying --mem.
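Since memory follows the core count, the practical way to get more RAM is to request more cores. The arithmetic can be sketched as follows; the function names are hypothetical, and only the 5900 MB figure comes from the text above.

```python
RAM_PER_CORE_MB = 5900  # assigned automatically per CPU core


def job_memory_mb(cores: int) -> int:
    """Total RAM (in MB) a job receives for the given number of cores."""
    return cores * RAM_PER_CORE_MB


def cores_needed(mem_mb: int) -> int:
    """Smallest core count whose automatic RAM allocation covers `mem_mb`."""
    return -(-mem_mb // RAM_PER_CORE_MB)  # ceiling division
```

For example, 16 cores give 94400 MB, while a job needing 100000 MB has to request 17 cores (100300 MB), which in turn must respect the per-GPU core limit of the chosen partition.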
Hardware characteristics
This cluster has the following configuration:
| Type | Count | Model | CPU | Memory | Storage | Naming | GPU # | GPU Model |
|---|---|---|---|---|---|---|---|---|
| Frontend | 2 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 384 GB | 6.4 TB (NVMe) | kuma[1-2] | NA | NA |
| Compute node H100 | 84 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 371 GB | 6.4 TB (NVMe) | kh[001-084] | 4 | NVIDIA H100 94GB |
| Compute node L40s | 20 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 371 GB | 7.6 TB x3 (NVMe) | kl[001-020] | 8 | NVIDIA L40S 48GB |
| Admin server | 2 | ThinkSystem SR630 V3 Version: 06 | Intel(R) Xeon(R) Silver 4416+ CPU @ 2.00 GHz | 256 GB | 1920 GB (SCSI) | kadmin[1-2] | NA | NA |
| Proxy server | 1 | ThinkSystem SR630 V3 Version: 06 | Intel(R) Xeon(R) Silver 4416+ CPU @ 2.00 GHz | 256 GB | 1920 GB (SCSI) | ksmartproxy1 | NA | NA |




