Transferring data#

Before you begin#

This page covers two cases of data transfer:

  1. Internal Transfers: If you need to transfer data between two computers on the EPFL network (e.g., between your laptop connected to EPFL WiFi or via VPN and one of the clusters), you can follow the simple procedure detailed in the Internal Data Transfer section.

  2. External Transfers: If the computer you are transferring data to/from is not on the EPFL network (e.g., a cluster at another university), you must use the /transfer file system and a dedicated data-transfer node, scitas-transfer.epfl.ch. You can use, for example, Globus Connect Personal to transfer data. Detailed instructions can be found in the External Data Transfer section.

Internal Data Transfer#

To transfer data from or to a SCITAS shared folder on the clusters, you may use either rsync or scp. While they both allow for a secure transfer of your data over the SSH protocol, we encourage you to use rsync.

Note

In both cases, the transfer command must be run from your own computer.

MS Windows and Apple macOS users who prefer a GUI application may have a look at:

Softwares#

Using rsync#

Remote synchronization (or rsync) is a utility for synchronizing files between a source and a destination. Data can be compressed during the transfer, and rsync uses an incremental protocol. This means that:

  • It will only synchronize data that has changed, thus reducing the network usage and transfer time
  • It can resume the synchronization from where it was in case of an interruption.

The basic usage of rsync is the following:

$ rsync [options] <src> <dest>

The typical options you can use are the following:

  • -a: archive mode. Among other things, it copies recursively, keeps symlinks, preserves file permissions and modification times.
  • -z: use data compression.
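  • -P: show progress and keep partially transferred files, so that an interrupted transfer can be resumed (shorthand for --partial --progress).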

To transfer data from your computer to a SCITAS machine, use the following command:

$ rsync -azP /path/to/src <GASPAR>@<machine>.hpc.epfl.ch:/path/to/dest
where <GASPAR> is your GASPAR username and <machine> is the machine name.

To do the inverse (from a SCITAS machine to your computer), do the following:

$ rsync -azP <GASPAR>@<machine>.hpc.epfl.ch:/path/to/src /path/to/dest

Trailing /

rsync behaves differently depending on whether the source path ends with a trailing /. For example, if you want to transfer a folder data located at /home/user/data, the following two commands will act differently:

  • rsync -azP /home/user/data <dest>: will transfer the folder data into <dest>, meaning that you will end up with <dest>/data
  • rsync -azP /home/user/data/ <dest>: will transfer the content of data into <dest>, meaning that you will end up with <dest>/<data_content>

rsync for local copy

rsync is not limited to synchronizing files over a network. You can also use it to synchronize files locally. This and its options make it very convenient as a backup tool.

Using scp#

Secure copy (or scp) is a tool used to copy files between hosts over a network. It behaves much like the usual cp command. Unlike rsync, scp always transfers the whole data and cannot resume after an interruption.

The syntax is the following:

$ scp [options] <src> <dest>
As with cp, the main option you may use is -r for a recursive copy. Other than that, the usage is similar to that of rsync presented above.

Problem that may occur with scp

Should you have to copy a directory from your local computer to the remote server, you may encounter the following error:
scp: realpath ./[Your_local_dir_name]: No such file
scp: upload "./[Your_local_dir_name]": path canonicalization failed
scp: failed to upload directory [Your_local_dir_name] to .

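This can happen with recent OpenSSH versions, in which scp uses the SFTP protocol by default. Try the -O option, which forces the legacy SCP protocol: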

$ scp -r -O My_dir username@jed.hpc.epfl.ch:

External Data Transfer#

The /transfer file system is a special file system optimized for data transfers. For security reasons it has a data retention policy of only 30 days (files older than 30 days will be automatically removed without notice). Furthermore, the /transfer file system is the only one that is mounted on all the SCITAS clusters and the scitas-transfer.epfl.ch transfer node.
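
Because files on /transfer are removed without notice, it can be useful to check which ones are getting old. The sketch below is an illustration only: it assumes that file age is measured by modification time (the policy above does not say which timestamp is used), picks an arbitrary 23-day threshold, and uses a temporary directory as a stand-in for /transfer/myfolder. The relative date passed to touch -d is GNU-specific syntax.

```shell
#!/usr/bin/env bash
set -eu

# Stand-in for /transfer/myfolder so the sketch runs anywhere.
folder="$(mktemp -d)"
touch "$folder/recent.dat"
touch -d '25 days ago' "$folder/aging.dat"   # GNU touch: backdate the mtime

# List files last modified more than 23 days ago, i.e. roughly one week
# before the 30-day limit (assumption: the limit applies to mtime).
expiring="$(find "$folder" -type f -mtime +23)"

echo "files approaching the retention limit:"
echo "$expiring"
```

On the cluster, the same find command would be pointed at your own folder under /transfer instead of the temporary directory.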

Globus Connect Personal#

Globus Connect Personal is a tool that allows you to easily and securely transfer and share large amounts of data from your personal computer or server.
It turns your device into a Globus endpoint, enabling seamless interaction with other endpoints for efficient data management.
Whether you're moving data between your computer and cloud storage or sharing files with colleagues, Globus Connect Personal ensures reliable and secure transfers.

Where to run the Globus Connect Personal client

Globus Connect Personal is meant to be run on the scitas-transfer.epfl.ch server.
Running this tool on the front-end or compute nodes is not allowed.

Setup#

Follow the official Linux documentation to download and set up the client.

Account

You can use your "Switch edu-ID" account to log in to the Globus portal.

Enabling the access to /transfer

By default, the Globus Connect Personal client can only access the /home folder.
To give it access to /transfer/myfolder, use the command:
echo "/transfer/myfolder,0,1" >> ~/.globusonline/lta/config-paths
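
Running the echo command above more than once appends the same line repeatedly. The following sketch shows one way to add the entry only when it is missing. The add_path function and the temporary file are hypothetical, introduced for illustration; on the transfer node the file would be ~/.globusonline/lta/config-paths. To the best of our understanding of the Globus documentation, the three comma-separated fields are the path, a sharing flag and a read/write flag, so "0,1" would mean no sharing with write access; treat this reading as an assumption and check the official documentation.

```shell
#!/usr/bin/env bash
set -eu

# Stand-in for ~/.globusonline/lta/config-paths, so a real setup is untouched.
config="$(mktemp)"
entry="/transfer/myfolder,0,1"

# Hypothetical helper: append a line to a file only if it is not already there.
add_path() {
    # -q: quiet, -x: match the whole line, -F: treat the entry as a fixed string
    grep -qxF "$1" "$2" || echo "$1" >> "$2"
}

add_path "$entry" "$config"
add_path "$entry" "$config"   # second call finds the line and does nothing

count="$(grep -cxF "$entry" "$config")"
echo "matching entries in config: $count"
```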