
Transferring data

Before you begin

We will discuss two cases of data transfer here:

  1. Internal Transfers: If you need to transfer data between two computers on the EPFL network (e.g., between your laptop connected to EPFL WiFi or via VPN and one of the clusters), you can follow the simple procedure detailed in the Internal Data Transfer section.

  2. External Transfers: If the computer you are transferring data to/from is not on the EPFL network (e.g., a cluster from another university), you must use the /transfer file system and a special data-transfer node, fdata1.epfl.ch, which is the only node exposed to the outside of the EPFL network. Detailed instructions can be found in the External Data Transfer section.

Internal Data Transfer

To transfer data from or to a SCITAS shared folder on the clusters, you may use either rsync or scp. While they both allow for a secure transfer of your data over the SSH protocol, we encourage you to use rsync.

Note

In both cases, the transfer command must be run on your own computer, whether you are sending data to the cluster or retrieving it from there.

MS Windows and Apple macOS users who wish to use a GUI application may have a look at:

Software

Using rsync

Remote synchronization (or rsync) is a utility for synchronizing files between a source and a destination. Data can be compressed during the transfer, and rsync uses an incremental protocol. This means that:

  • It only transfers data that has changed, which reduces network usage and transfer time.
  • It can resume an interrupted synchronization where it left off.

The basic usage of rsync is the following:

$ rsync [options] <src> <dest>

The typical options you can use are the following:

  • -a: archive mode. Among other things, it copies recursively, keeps symlinks, preserves file permissions and modification times.
  • -z: use data compression.
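  • -P: show progress and keep partially transferred files, so that an interrupted transfer can be resumed (shorthand for --partial --progress).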
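Before a large synchronization, it can help to preview what rsync would do. The sketch below wraps the options above in a small function; `rsync_preview` is a hypothetical name, and the `RSYNC` variable only exists so that the real command can be swapped for a stub when testing. The extra `-n` flag (`--dry-run`) makes rsync report the planned transfer without copying anything.

```shell
# Hypothetical helper: show what "rsync -azP" would transfer, without copying.
# RSYNC may be overridden (e.g. with a test stub); it defaults to the real rsync.
rsync_preview() {
    # -a archive, -z compress, -P partial+progress, -n dry run (nothing is modified)
    "${RSYNC:-rsync}" -azPn "$1" "$2"
}
```

For example, `rsync_preview /path/to/src <GASPAR>@<machine>.hpc.epfl.ch:/path/to/dest` lists what the matching `rsync -azP` command would send; drop the `n` to perform the actual transfer.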

To transfer data from your computer to a SCITAS machine, use the following command:

$ rsync -azP /path/to/src <GASPAR>@<machine>.hpc.epfl.ch:/path/to/dest
where <GASPAR> is your GASPAR username and <machine> is the machine name.

To transfer data in the opposite direction (from a SCITAS machine to your computer), use the following command:

$ rsync -azP <GASPAR>@<machine>.hpc.epfl.ch:/path/to/src /path/to/dest
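If you run these two commands often, they can be wrapped in small shell functions so that only the user name, machine and paths change. This is a minimal sketch: `scitas_push` and `scitas_pull` are hypothetical names, the example paths are illustrative, and `RSYNC` can be overridden with a stub for testing.

```shell
# Hypothetical helpers wrapping the two rsync commands above.
# Arguments: <GASPAR> <machine> <src> <dest>, where <machine> is e.g. "jed".
scitas_push() {
    # Your computer -> cluster: the destination is prefixed with user@host:
    "${RSYNC:-rsync}" -azP "$3" "$1@$2.hpc.epfl.ch:$4"
}

scitas_pull() {
    # Cluster -> your computer: the source is prefixed with user@host:
    "${RSYNC:-rsync}" -azP "$1@$2.hpc.epfl.ch:$3" "$4"
}
```

For instance, `scitas_push alice jed ./data /scratch/alice` runs `rsync -azP ./data alice@jed.hpc.epfl.ch:/scratch/alice` from your computer.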

Trailing /

rsync behaves differently depending on whether the source path ends with a trailing /. For example, if you want to transfer a folder named data located at /home/user/data, the following two commands act differently:

  • rsync -azP /home/user/data <dest>: will transfer the folder data into <dest>, meaning that you will end up with <dest>/data
  • rsync -azP /home/user/data/ <dest>: will transfer the content of data into <dest>, meaning that you will end up with <dest>/<data_content>
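One way to avoid surprises is to make the choice explicit instead of relying on how the path was typed. The sketch below normalizes the source path with shell parameter expansion (`${1%/}` removes one trailing `/`); `copy_folder` and `copy_contents` are hypothetical names, and `RSYNC` can be overridden with a stub for testing.

```shell
# Hypothetical helpers that make the trailing-slash choice explicit.
copy_folder() {
    # Strip one trailing "/": the folder itself is created as <dest>/<name>
    "${RSYNC:-rsync}" -azP "${1%/}" "$2"
}

copy_contents() {
    # Force exactly one trailing "/": only the content of the folder lands in <dest>
    "${RSYNC:-rsync}" -azP "${1%/}/" "$2"
}
```

With these, `copy_folder /home/user/data/ <dest>` still produces `<dest>/data`, and `copy_contents /home/user/data <dest>` still copies only the content.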

rsync for local copy

rsync is not limited to synchronizing files over a network; you can also use it to synchronize files locally. This, together with its options, makes it very convenient as a backup tool.
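As an illustration of local use, the sketch below mirrors a directory into a backup location. It uses rsync's `--delete` option, which also removes files from the backup that no longer exist in the source, so the backup becomes an exact copy; this is a deliberate choice for the example, not something the clusters require. `backup_dir` is a hypothetical name and `RSYNC` can be overridden with a stub for testing.

```shell
# Hypothetical backup helper: make <backup> an exact mirror of <src>.
# --delete removes files from <backup> that are gone from <src>, so check the
# arguments carefully (or add -n for a dry run) before using it on real data.
backup_dir() {
    # Trailing "/" on both paths: mirror the content, not the folder itself
    "${RSYNC:-rsync}" -a --delete "${1%/}/" "${2%/}/"
}
```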

Using scp

Secure copy (or scp) is a tool used to copy files between hosts over a network. It works very much like the usual cp command. Unlike rsync, scp always transfers all of the data and cannot resume after an interruption.

The syntax is the following:

$ scp [options] <src> <dest>
As with cp, the main option you may need is -r for a recursive copy. Otherwise, the usage is similar to that of rsync presented above.
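Since forgetting `-r` is a common mistake when copying a folder, a small wrapper can add it automatically when a local source is a directory. This is a sketch only: `scp_copy` is a hypothetical name, the check only makes sense for a local source path, and `SCP` can be overridden with a stub for testing.

```shell
# Hypothetical wrapper: add -r only when the (local) source is a directory,
# just as cp needs -r to copy folders.
scp_copy() {
    if [ -d "$1" ]; then
        "${SCP:-scp}" -r "$1" "$2"
    else
        "${SCP:-scp}" "$1" "$2"
    fi
}
```

For example, `scp_copy My_dir <GASPAR>@<machine>.hpc.epfl.ch:` copies the whole folder into your home directory on the cluster.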

Problem that may occur with scp

Should you have to copy a directory from your local computer to the remote server, you may encounter the following error:
scp: realpath ./[Your_local_dir_name]: No such file
scp: upload "./[Your_local_dir_name]": path canonicalization failed
scp: failed to upload directory [Your_local_dir_name] to .

Try with the -O option:

$ scp -r -O My_dir username@jed.hpc.epfl.ch:
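Since OpenSSH 9.0, scp uses the SFTP protocol by default, and `-O` switches back to the legacy SCP protocol; the error above typically comes from the SFTP mode. The sketch below tries a normal recursive copy first and retries once with `-O` if it fails. `scp_dir` is a hypothetical name, `SCP` can be overridden with a stub for testing, and note that any failure (including a network error) triggers the retry.

```shell
# Hypothetical wrapper: recursive scp, retried once with -O (legacy SCP
# protocol instead of SFTP) if the first attempt fails.
scp_dir() {
    "${SCP:-scp}" -r "$1" "$2" && return 0
    echo "scp failed, retrying with -O" >&2
    "${SCP:-scp}" -r -O "$1" "$2"
}
```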

External Data Transfer

The /transfer file system is a special file system optimized for data transfers. For security reasons, it has a data retention policy of only 15 days: files older than 15 days are automatically removed without notice. Furthermore, /transfer is the only file system that is mounted both on all the SCITAS clusters and on the fdata1.epfl.ch transfer node.

The procedure to transfer data to a destination outside EPFL is the following:

  1. From one of the clusters, place the data you want to send in the /transfer/<username> folder, where <username> is your GASPAR ID, to make it available on the transfer node.
  2. Log into fdata1.epfl.ch and transfer your data from /transfer/<username> to the external destination using your favorite protocol, e.g., scp or rsync.
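The two steps above can be sketched as two shell functions, one run on a cluster and one run on fdata1.epfl.ch. This assumes that `$USER` on the clusters and on fdata1 equals your GASPAR ID; `stage_for_export` and `export_from_fdata1` are hypothetical names, and `RSYNC` and `TRANSFER_ROOT` can be overridden for testing.

```shell
# Hypothetical helpers for the outbound procedure.
# Assumption: $USER is your GASPAR ID; TRANSFER_ROOT defaults to /transfer.

# Step 1, run on a cluster: copy data into /transfer/<username>.
# Plain -a is enough here since the copy never leaves the SCITAS storage.
stage_for_export() {
    "${RSYNC:-rsync}" -a "$1" "${TRANSFER_ROOT:-/transfer}/$USER/"
}

# Step 2, run on fdata1.epfl.ch: send the staged data to the external host.
# $1 = name of the staged file or folder, $2 = remote destination (user@host:path)
export_from_fdata1() {
    "${RSYNC:-rsync}" -azP "${TRANSFER_ROOT:-/transfer}/$USER/$1" "$2"
}
```

Keep the 15-day retention policy in mind: step 2 should happen well before the staged files expire.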

Access from outside the EPFL network

If you want to access fdata1.epfl.ch from outside the EPFL network (without VPN), you must first add your public SSH key to fdata1.epfl.ch while connected to the EPFL network or VPN. If you have not yet generated an SSH key pair, please follow this procedure. Once done, copy your SSH public key to fdata1 with the following command, run from inside the EPFL network or VPN:

$ ssh-copy-id -i ${HOME}/.ssh/<YOURKEYNAME>.pub <USERNAME>@fdata1.epfl.ch
where <USERNAME> is your GASPAR ID and <YOURKEYNAME> is the name of your key. After that, you will be able to use fdata1.epfl.ch from outside the EPFL network.
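For reference, key generation and the ssh-copy-id step can be combined in one function. This is a sketch under assumptions: it uses an ed25519 key, which is one common choice (the SCITAS key-generation procedure mentioned above takes precedence), `setup_fdata1_key` is a hypothetical name, and `SSH_KEYGEN` and `SSH_COPY_ID` can be overridden with stubs for testing. ssh-keygen will prompt for a passphrase.

```shell
# Hypothetical helper: create an ed25519 key pair if it does not exist yet,
# then install the public key on fdata1.epfl.ch.
# Run it from inside the EPFL network or VPN.
# $1 = key name (<YOURKEYNAME>), $2 = GASPAR ID (<USERNAME>)
setup_fdata1_key() {
    fdata1_key="$HOME/.ssh/$1"
    if [ ! -f "$fdata1_key" ]; then
        mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
        # -t: key type, -f: private key file (the public key is written to "$fdata1_key.pub")
        "${SSH_KEYGEN:-ssh-keygen}" -t ed25519 -f "$fdata1_key" || return 1
    fi
    "${SSH_COPY_ID:-ssh-copy-id}" -i "$fdata1_key.pub" "$2@fdata1.epfl.ch"
}
```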

You can also copy data to the SCITAS clusters using the same procedure in reverse order: first transfer the data to /transfer/<username> on fdata1.epfl.ch, then, from one of the clusters, copy it to where you would like to store it, before the 15-day retention period expires.
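The reverse (inbound) procedure can be sketched the same way. Here the first function runs on the external machine and the second on a cluster; this assumes `$USER` on the cluster equals your GASPAR ID, `push_to_fdata1` and `retrieve_from_transfer` are hypothetical names, and `RSYNC` and `TRANSFER_ROOT` can be overridden for testing.

```shell
# Hypothetical helpers for the inbound procedure (outside world -> cluster).

# Step 1, run on the external machine: push data to /transfer/<username> on fdata1.
# $1 = local file or folder, $2 = GASPAR ID
push_to_fdata1() {
    "${RSYNC:-rsync}" -azP "$1" "$2@fdata1.epfl.ch:/transfer/$2/"
}

# Step 2, run on a cluster: copy the data out of /transfer before it expires.
# Assumption: $USER is your GASPAR ID. $1 = name under /transfer/<username>,
# $2 = final destination on the cluster.
retrieve_from_transfer() {
    "${RSYNC:-rsync}" -a "${TRANSFER_ROOT:-/transfer}/$USER/$1" "$2"
}
```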