Transferring data
Before you begin
We will discuss two cases of data transfer here:
- Internal Transfers: if you need to transfer data between two computers on the EPFL network (e.g., between your laptop, connected to the EPFL WiFi or via VPN, and one of the clusters), you can follow the simple procedure detailed in the Internal Data Transfer section.
- External Transfers: if the computer you are transferring data to or from is not on the EPFL network (e.g., a cluster at another university), you must use the /transfer file system and a special data-transfer node, fdata1.epfl.ch, which is the only node exposed outside the EPFL network. Detailed instructions can be found in the External Data Transfer section.
Internal Data Transfer
To transfer data to or from a SCITAS shared folder on the clusters, you may use either rsync or scp. Both transfer your data securely over the SSH protocol, but we encourage you to use rsync.
Note
In both cases, whether you upload or download data, the transfer command must be executed from your own computer.
MS Windows and Apple macOS users who wish to use a GUI application may have a look at: Softwares
Using rsync
Remote synchronization (or rsync) is a utility for synchronizing files between a source and a destination. Data can be compressed during the transfer, and rsync works using an incremental protocol. This means that:
- It only synchronizes data that has changed, thus reducing network usage and transfer time.
- It can resume the synchronization where it left off in case of an interruption.
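The basic usage of rsync is the following:
rsync [options] <source> <destination>
The typical options you can use are the following:
-a: archive mode. Among other things, it copies recursively, keeps symlinks, and preserves file permissions and modification times.
-z: use data compression.
-P: show progress. It also keeps partially transferred files, so that an interrupted transfer can be resumed.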
To transfer data from your computer to a SCITAS machine, run rsync on your computer with the remote location as the destination, written as <GASPAR>@<machine>:<path>, where <GASPAR> is your GASPAR username and <machine> is the machine name.
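To do the inverse (from a SCITAS machine to your computer), swap the two arguments: the remote location <GASPAR>@<machine>:<path> becomes the source, and a local path becomes the destination.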
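A minimal sketch of both directions, written as two shell helper functions that you could add to your shell startup file. The function names to_scitas and from_scitas and the default values of GASPAR and MACHINE are hypothetical; replace MACHINE with the actual host name of the cluster you use. Both functions are meant to be run on your own computer.

```shell
# Hypothetical placeholders: set them to your GASPAR username and the cluster host name.
GASPAR="${GASPAR:-your_gaspar}"
MACHINE="${MACHINE:-cluster.example}"

# Upload, run on your computer: copy local path $1 to path $2 on the cluster.
# A relative remote path such as "data" is resolved from your home directory there.
to_scitas() {
    rsync -azP "$1" "${GASPAR}@${MACHINE}:$2"
}

# Download, run on your computer: copy path $1 on the cluster to local path $2.
from_scitas() {
    rsync -azP "${GASPAR}@${MACHINE}:$1" "$2"
}
```

For example, to_scitas ./data/ data/ copies the content of the local folder data into the folder data of your home directory on the cluster, and from_scitas data/ ./data_copy/ copies it back into a local folder data_copy.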
Trailing /
rsync behaves differently if you put a trailing / at the end of the source path. For example, if you want to transfer a folder data located at /home/user/data, the following two commands act differently:
- rsync -azP /home/user/data <dest>: transfers the folder data into <dest>, meaning that you end up with <dest>/data.
- rsync -azP /home/user/data/ <dest>: transfers the content of data into <dest>, meaning that you end up with <dest>/<data_content>.
rsync for local copy
rsync is not limited to synchronizing files over a network: you can also use it to synchronize files locally. Together with its options, this makes it very convenient as a backup tool.
Using scp
Secure copy (or scp) is a tool used to copy files between hosts over a network. It works very much like the usual cp command. Unlike rsync, scp always transfers all of the data and cannot resume after an interruption.
The syntax is the following:
scp [options] <source> <destination>
As for cp, the main option you may use is -r for a recursive copy. Other than that, the usage is similar to that of rsync presented above.
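A minimal sketch of scp in both directions, written as hypothetical helper functions run on your own computer. GASPAR and MACHINE are placeholders to replace with your GASPAR username and the cluster host name; -r is included so that the same functions work for directories as well as single files.

```shell
# Hypothetical placeholders: your GASPAR username and the cluster host name.
GASPAR="${GASPAR:-your_gaspar}"
MACHINE="${MACHINE:-cluster.example}"

# Copy a file or, recursively, a directory to the cluster (run on your computer).
scp_to_scitas() {
    scp -r "$1" "${GASPAR}@${MACHINE}:$2"
}

# Copy a file or directory from the cluster to your computer (run on your computer).
scp_from_scitas() {
    scp -r "${GASPAR}@${MACHINE}:$1" "$2"
}
```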
Problem that may occur with scp
Should you have to copy a directory from your local computer to the remote server, you may encounter the following error:
scp: realpath ./[Your_local_dir_name]: No such file
scp: upload "./[Your_local_dir_name]": path canonicalization failed
scp: failed to upload directory [Your_local_dir_name] to .
In that case, try again with the -O option, which makes scp use the legacy SCP protocol instead of the SFTP protocol.
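A sketch of a recursive upload with -O added. On recent OpenSSH versions, scp uses the SFTP protocol by default, and -O switches back to the legacy SCP protocol. The function name scp_dir_legacy and the GASPAR and MACHINE values are hypothetical placeholders.

```shell
# Hypothetical placeholders: your GASPAR username and the cluster host name.
GASPAR="${GASPAR:-your_gaspar}"
MACHINE="${MACHINE:-cluster.example}"

# Recursive upload of local directory $1 to remote path $2, forcing the legacy SCP protocol.
scp_dir_legacy() {
    scp -O -r "$1" "${GASPAR}@${MACHINE}:$2"
}
```

For example, scp_dir_legacy ./my_dir . copies my_dir into your home directory on the cluster.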
External Data Transfer
The /transfer file system is a special file system optimized for data transfers. For security reasons, it has a data retention policy of only 15 days: files older than 15 days are automatically removed without notice. Furthermore, /transfer is the only file system that is mounted both on all the SCITAS clusters and on the fdata1.epfl.ch transfer node.
The procedure to transfer data to a machine outside EPFL is the following:
- From one of the clusters, place the data you want to send in the /transfer/<username> folder, where <username> is your GASPAR ID, to make it available on the transfer node.
- Log into fdata1.epfl.ch and transfer your data from /transfer/<username> to the external machine using your favorite protocol, e.g. scp or rsync.
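The two steps above could look as follows, sketched as hypothetical helper functions. Each function must be defined in the shell of the host where it runs: stage_for_transfer on a cluster, and send_from_fdata1 on fdata1.epfl.ch, which you reach with ssh <GASPAR>@fdata1.epfl.ch. EXT_USER and EXT_HOST are placeholders for your account and host on the external machine, and the code assumes that $USER on SCITAS machines is your GASPAR username.

```shell
# Hypothetical placeholders for the external machine and your account on it.
EXT_USER="${EXT_USER:-ext_user}"
EXT_HOST="${EXT_HOST:-external.example.org}"

# Step 1, run on a SCITAS cluster: stage the data in /transfer/<username>.
# Assumption: $USER on the cluster is your GASPAR username.
stage_for_transfer() {
    rsync -a "$1" "/transfer/${USER}/"
}

# Step 2, run on fdata1.epfl.ch: send the staged data to the external machine.
send_from_fdata1() {
    rsync -azP "/transfer/${USER}/$1" "${EXT_USER}@${EXT_HOST}:$2"
}
```

Staged data is removed after 15 days, so run the second step soon after the first.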
Access from outside EPFL network
If you want to access fdata1.epfl.ch from outside the EPFL network without VPN, you first need to add your public SSH key to fdata1.epfl.ch from inside the EPFL network or via VPN. If you have not generated a pair of SSH keys yet, please follow this procedure. At the end of the procedure, copy your SSH public key to fdata1, running the copy command from inside the EPFL network or via VPN.
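One common way to copy a public key to a remote host is OpenSSH's ssh-copy-id tool. The sketch below assumes an Ed25519 key pair stored at the default location ~/.ssh/id_ed25519 (public part in ~/.ssh/id_ed25519.pub); adjust KEY if you generated a different key. GASPAR is a placeholder for your GASPAR username, and copy_key_to_fdata1 is a hypothetical helper name.

```shell
# Run from inside the EPFL network or via VPN.
# Assumptions: GASPAR holds your GASPAR username and KEY points to your public key.
GASPAR="${GASPAR:-your_gaspar}"
KEY="${KEY:-$HOME/.ssh/id_ed25519.pub}"

# Append the public key to the authorized keys of your account on fdata1.
copy_key_to_fdata1() {
    ssh-copy-id -i "$KEY" "${GASPAR}@fdata1.epfl.ch"
}
```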
You can also copy data to the SCITAS clusters using the same procedure in reverse order: first transfer the data to /transfer/<username> on fdata1.epfl.ch, then, from one of the clusters, copy it to where you would like to store it.
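The reverse direction can be sketched with two hypothetical helper functions: push_to_fdata1 runs on the external machine and fetch_from_transfer on a SCITAS cluster. GASPAR is a placeholder for your GASPAR username, and the code assumes that $USER on the cluster is also your GASPAR username.

```shell
# Hypothetical placeholder: your GASPAR username.
GASPAR="${GASPAR:-your_gaspar}"

# Step 1, run on the external machine: push the data to /transfer/<GASPAR> on fdata1.
# From outside the EPFL network, this needs your SSH key to be registered on fdata1.
push_to_fdata1() {
    rsync -azP "$1" "${GASPAR}@fdata1.epfl.ch:/transfer/${GASPAR}/"
}

# Step 2, run on a SCITAS cluster: copy the data out of /transfer before the
# 15-day retention policy removes it. Assumption: $USER is your GASPAR username.
fetch_from_transfer() {
    rsync -a "/transfer/${USER}/$1" "$2"
}
```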