How to use TensorFlow and TensorBoard#

This short guide describes how to use TensorBoard on Izar or Kuma through SSH port forwarding.

Run TensorBoard on the cluster#

First, connect to the cluster and load the necessary modules:
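For example, to connect to Izar (the login address follows the [CLUSTER NAME].hpc.epfl.ch pattern used later in this guide; replace <USERNAME> with your own username, and use kuma.hpc.epfl.ch for Kuma):

$ ssh <USERNAME>@izar.hpc.epfl.ch

Once connected, load the modules: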

$ module load gcc python openmpi py-tensorflow
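You can quickly check that the modules were loaded correctly and that TensorBoard and TensorFlow are available (the versions reported on your system may differ):

$ tensorboard --version
$ python -c "import tensorflow as tf; print(tf.__version__)"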
The script below is a template that starts TensorBoard and launches a TensorFlow Python script on a compute node. You can copy it into a file called launch_tensorboard.sh, for example. It is meant to be placed in your /home directory (otherwise adjust it accordingly). Note that every module TensorFlow may need has to be loaded in the script; here we load the same modules as above:

  • On Izar:
    #!/bin/bash -l
    #SBATCH --job-name=tensorboard-trial
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:1
    #SBATCH --time=00:10:00
    #SBATCH --output tensorboard-log-%J.out
    
    module load gcc python openmpi py-tensorflow
    
    # Pick a random port between 8000 and 9999 for TensorBoard
    ipnport=$(shuf -i8000-9999 -n1)
    # Start TensorBoard in the background so that the training script below can run
    tensorboard --logdir logs --port=${ipnport} --bind_all &
    
    python example.py
    
  • On Kuma:
    #!/bin/bash -l
    #SBATCH --job-name=tensorboard-trial
    #SBATCH --nodes=1
    #SBATCH -p h100
    #SBATCH --gres=gpu:1
    #SBATCH --time=00:10:00
    #SBATCH --output tensorboard-log-%J.out
    
    module load gcc python openmpi py-tensorflow
    
    # Pick a random port between 8000 and 9999 for TensorBoard
    ipnport=$(shuf -i8000-9999 -n1)
    # Start TensorBoard in the background so that the training script below can run
    tensorboard --logdir logs --port=${ipnport} --bind_all &
    
    python example.py
    
    For testing, an example of a Python script, example.py, is given here:
    import tensorflow as tf
    import numpy as np
    
    # Model parameters
    W = tf.Variable([0.3], dtype=tf.float32, name="W")
    b = tf.Variable([-0.3], dtype=tf.float32, name="b")
    
    # Training data
    x_train = np.array([1, 2, 3, 4], dtype=np.float32)
    y_train = np.array([0, -1, -2, -3], dtype=np.float32)
    
    # Create a summary writer for TensorBoard
    log_dir = 'logs'
    writer = tf.summary.create_file_writer(log_dir)
    
    # Training loop
    for i in range(1000):
        with tf.GradientTape() as tape:
            linear_model = W * x_train + b
            loss = tf.reduce_sum(tf.square(linear_model - y_train))
    
        # Compute gradients
        gradients = tape.gradient(loss, [W, b])
    
        # Update weights
        W.assign_sub(0.01 * gradients[0])
        b.assign_sub(0.01 * gradients[1])
    
        # Log the loss to TensorBoard
        with writer.as_default():
            tf.summary.scalar('loss', loss, step=i)
    
    # Close the writer
    writer.close()
    
    # Print the final parameters and loss
    print("W: %s b: %s loss: %s" % (W.numpy(), b.numpy(), loss.numpy()))
    

Launch your job as usual:

$ sbatch launch_tensorboard.sh
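You can check that the job has started before looking for the TensorBoard address, for example with the standard Slurm command:

$ squeue -u $USER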

Once the job is running, inspect the output file named tensorboard-log-[SLURM_ID].out. Look for a line similar to the following (here on Kuma):

TensorBoard 2.16.2 at http://kh045.kuma.cluster:8263/ (Press CTRL+C to quit)

It has the form:

http://[NODE NAME].[CLUSTER NAME].cluster:<PORT NUMBER>
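Rather than scanning the file by hand, you can extract this line directly (assuming the output file name used in the script above):

$ grep "TensorBoard" tensorboard-log-*.out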

Use TensorBoard on a local machine#

On your local machine, execute the following command, filling in the node name, cluster name, and port number found in the previous step:

ssh -L <PORT NUMBER>:[NODE NAME].[CLUSTER NAME].cluster:<PORT NUMBER> -l <USERNAME> [CLUSTER NAME].hpc.epfl.ch -f -N

The -L option forwards the chosen local port to the compute node through the login node, -f sends ssh to the background, and -N tells it not to run a remote command.

In our example, this gives:

ssh -L 8263:kh045.kuma.cluster:8263 -l peyberne kuma.hpc.epfl.ch -f -N
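Since -f -N leaves the tunnel running in the background, you may want to close it once you are done. One way, assuming the port from our example, is to terminate the forwarding process:

$ pkill -f "ssh -L 8263"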

You should now be able to reach TensorBoard on the compute node from your web browser by opening the following address:

http://[NODE NAME].[CLUSTER NAME].cluster:<PORT NUMBER>/

For our example, this gives:

http://kh045.kuma.cluster:8263/