Kuma#
Research Cluster
This cluster is for pay-per-use accounts. Master's students and courses cannot use Kuma; for the educational GPU cluster, see Izar.
Useful info#
Connecting to the clusters#
To connect to the cluster you should:
Here's the list of current fingerprints you should expect when connecting to this cluster:
ECDSA
MD5:c2:0d:da:2e:5a:29:fa:4e:b3:4c:b3:c1:9f:53:fb:d6
SHA256:Y4DicuOpInMIFShZXHfSNoLX++Wx6M4jRXOX2H+8Ows
ED25519
MD5:3e:d1:e9:4e:9b:67:13:bd:09:ca:29:3a:69:82:18:16
SHA256:h24nEbxUtKIbXdtWgZLjo7YgRYmaSPOfPmLAHgnaF7E
RSA
MD5:f7:db:13:73:15:d1:a8:5d:a3:7c:a2:56:b3:e9:35:5e
SHA256:qybRVV995yOb3OlI+b5aUeuXc3AicftyTPQ0Tz4miTk
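
As a rough illustration of how a SHA256 fingerprint like those above relates to a host key, the sketch below hashes a base64-encoded public key blob and compares the result with the listed values. It assumes the blob is the second field of a public key line (for example from a `known_hosts` entry); the helper names `sha256_fingerprint` and `is_expected` are hypothetical and not part of any cluster tooling.

```python
import base64
import hashlib

# Expected SHA256 host-key fingerprints, copied from the list above.
EXPECTED_SHA256 = {
    "ECDSA": "SHA256:Y4DicuOpInMIFShZXHfSNoLX++Wx6M4jRXOX2H+8Ows",
    "ED25519": "SHA256:h24nEbxUtKIbXdtWgZLjo7YgRYmaSPOfPmLAHgnaF7E",
    "RSA": "SHA256:qybRVV995yOb3OlI+b5aUeuXc3AicftyTPQ0Tz4miTk",
}


def sha256_fingerprint(key_b64: str) -> str:
    """Fingerprint of a base64-encoded public key blob, in OpenSSH's SHA256 format."""
    digest = hashlib.sha256(base64.b64decode(key_b64)).digest()
    # OpenSSH prints the base64 digest without the trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def is_expected(fingerprint: str) -> bool:
    """True if the fingerprint matches one of the values listed for this cluster."""
    return fingerprint in EXPECTED_SHA256.values()
```

If the fingerprint shown by your SSH client at first connection is not in this list, the safest reaction is to refuse the connection and check with the cluster administrators.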
QOS#
The standard QOS are:
- normal: for jobs using up to 8 nodes, with a time limit of 3 days. This is the default;
- long: for jobs using up to 8 nodes, with a time limit of 7 days;
- build: for compiling your codes, with up to 16 cores on 1 node, 0 GPU and a time limit of 4 hours;
- debug: for debugging jobs on up to 2 nodes, with a high priority and a time limit of 1 hour.

Choose one with -q <qos> or --qos <qos>.
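
The node and time limits of the QOS list can be written down as a small lookup table, which makes it easy to check in advance whether a job fits. This is only a sketch: the helper names are hypothetical, and the core and GPU limits of the build QOS are not modelled.

```python
# Limits transcribed from the QOS list above: (max nodes, max hours).
QOS_LIMITS = {
    "normal": (8, 3 * 24),
    "long": (8, 7 * 24),
    "build": (1, 4),
    "debug": (2, 1),
}


def fits_qos(qos: str, nodes: int, hours: float) -> bool:
    """True if a job with this node count and run time is within the QOS limits."""
    max_nodes, max_hours = QOS_LIMITS[qos]
    return nodes <= max_nodes and hours <= max_hours


def pick_production_qos(nodes: int, hours: float) -> str:
    """Return 'normal' when the job fits it, otherwise 'long'; raise if neither fits."""
    for qos in ("normal", "long"):
        if fits_qos(qos, nodes, hours):
            return qos
    raise ValueError(f"no production QOS allows {nodes} nodes for {hours} hours")
```

For instance, a 4-node job of 48 hours fits normal, while the same job running for 100 hours needs long.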
Partitions#
There are 2 partitions on Kuma, to differentiate nodes based on GPU type:
- h100: to use the NVIDIA H100 GPU nodes (with FP64 capabilities);
- l40s: to use the NVIDIA L40S GPU nodes (with FP32 capabilities).

There is no default partition: you have to choose one with -p <partition> or --partition <partition>.
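
Because no partition is selected by default, a submission wrapper may want to reject a missing or misspelled partition before calling Slurm. The sketch below builds an `sbatch` argument list under that assumption; the function name `sbatch_command` and the script name `job.sh` are hypothetical.

```python
# Partition names from the list above.
PARTITIONS = ("h100", "l40s")


def sbatch_command(script: str, partition: str, qos: str = "normal") -> list[str]:
    """Build an sbatch argument list with an explicit partition and QOS."""
    if partition not in PARTITIONS:
        raise ValueError(f"unknown partition {partition!r}; choose one of {PARTITIONS}")
    # --partition and --qos are the long forms of -p and -q.
    return ["sbatch", f"--partition={partition}", f"--qos={qos}", script]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works off-cluster.
    print(" ".join(sbatch_command("job.sh", "h100")))
```

The returned list can be passed to `subprocess.run` on a login node; the same options can also be set inside the job script with `#SBATCH` directives.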
Hardware characteristics#
This cluster has the following configuration:
Type | Count | Model | CPU | Memory | Storage | Naming | GPU # | GPU Model |
---|---|---|---|---|---|---|---|---|
Frontend | 2 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 384 GB | 6.4 TB (NVMe) | kuma[1-2] | NA | NA |
Compute node H100 | 84 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 384 GB | 6.4 TB (NVMe) | kh[001-084] | 4 | NVIDIA H100 94GB |
Compute node L40s | 20 | ThinkSystem SR675 V3 Version: 03 | AMD EPYC 9334 @ 2.7 GHz | 384 GB | 7.6 TB x3 (NVMe) | kl[001-020] | 8 | NVIDIA L40S 48GB |
Admin server | 2 | ThinkSystem SR630 V3 Version: 06 | Intel(R) Xeon(R) Silver 4416+ CPU @ 2.00 GHz | 256 GB | 1920 GB (SCSI) | kadmin[1-2] | NA | NA |
Proxy server | 1 | ThinkSystem SR630 V3 Version: 06 | Intel(R) Xeon(R) Silver 4416+ CPU @ 2.00 GHz | 256 GB | 1920 GB (SCSI) | ksmartproxy1 | NA | NA |
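
To get a feel for the aggregate GPU capacity implied by the table, the compute-node rows can be summed up as follows. This is a back-of-the-envelope sketch that uses only the node counts, GPUs per node and GPU memory sizes from the table; the names are hypothetical.

```python
# (node count, GPUs per node, memory per GPU in GB), from the table above.
GPU_NODES = {
    "h100": (84, 4, 94),
    "l40s": (20, 8, 48),
}


def gpu_summary(partition: str) -> dict[str, int]:
    """Total GPUs, GPU memory per node and total GPU memory for one partition."""
    nodes, gpus_per_node, gb_per_gpu = GPU_NODES[partition]
    total_gpus = nodes * gpus_per_node
    return {
        "total_gpus": total_gpus,
        "gpu_memory_per_node_gb": gpus_per_node * gb_per_gpu,
        "total_gpu_memory_gb": total_gpus * gb_per_gpu,
    }


if __name__ == "__main__":
    for name in GPU_NODES:
        print(name, gpu_summary(name))
```

With the figures in the table this gives 336 H100 GPUs and 160 L40S GPUs, so 496 GPUs in total, with 376 GB of GPU memory per H100 node and 384 GB per L40S node.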