Transferring data#
Before you begin#
We will discuss two cases of data transfer here:
- Internal Transfers: If you need to transfer data between two computers on the EPFL network (e.g., between your laptop connected to EPFL WiFi or via VPN and one of the clusters), you can follow the simple procedure detailed in the Internal Data Transfer section.
- External Transfers: If the computer you are transferring data to/from is not on the EPFL network (e.g., a cluster from another university), you must use the /transfer file system and a special data-transfer node, scitas-transfer.epfl.ch, which is the only node exposed to the outside of the EPFL network. Detailed instructions can be found in the External Data Transfer section.
Internal Data Transfer#
To transfer data from or to a SCITAS shared folder on the clusters, you may use either rsync or scp. While they both allow for a secure transfer of your data over the SSH protocol, we encourage you to use rsync.
Note
In both cases, the transfer command must be executed from your computer.
MS Windows and Apple macOS users wishing to use a GUI application may have a look at:
Software#
Using rsync#
Remote synchronization (or rsync) is a utility for synchronizing files between
a source and a destination. Data can be compressed before synchronization, and
rsync works using an incremental protocol. This means that:
- It will only synchronize data that has changed, thus reducing the network usage and transfer time
- It can resume the synchronization from where it was in case of an interruption.
The basic usage for rsync is the following:
rsync [options] <source> <destination>
The typical options you can use are the following:
- -a: archive mode. Among other things, it copies recursively, keeps symlinks, and preserves file permissions and modification times.
- -z: use data compression.
- -P: show progress and keep partially transferred files.
To transfer data from your computer to a SCITAS machine, use the following command:
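As a minimal sketch, the command can be wrapped in a small shell function. The function name push_to_scitas and the variables GASPAR and MACHINE are illustrative and not part of any SCITAS tooling; MACHINE is assumed to hold the host name you would normally give to ssh.

```shell
# Placeholders; replace them with real values.
GASPAR="<GASPAR>"      # your GASPAR username
MACHINE="<machine>"    # the machine name, as you would give it to ssh

# Copy a local file or folder ($1) to a path on the SCITAS machine ($2).
push_to_scitas() {
    rsync -azP "$1" "${GASPAR}@${MACHINE}:$2"
}

# Example call (both paths are hypothetical):
#   push_to_scitas ./results /path/on/the/cluster/
```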
where <GASPAR> is your GASPAR username and <machine> is the machine name.
To do the inverse (from a SCITAS machine to your computer), do the following:
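A possible form of the reverse transfer is sketched below: the remote location becomes the source and a local path becomes the destination. The function name pull_from_scitas and the variables GASPAR and MACHINE are illustrative assumptions, and the command is still run from your own computer.

```shell
# Placeholders; replace them with real values.
GASPAR="<GASPAR>"      # your GASPAR username
MACHINE="<machine>"    # the machine name, as you would give it to ssh

# Copy a remote file or folder ($1) from the SCITAS machine to a local path ($2).
pull_from_scitas() {
    rsync -azP "${GASPAR}@${MACHINE}:$1" "$2"
}

# Example call (both paths are hypothetical):
#   pull_from_scitas /path/on/the/cluster/results ./
```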
Trailing /
rsync behaves differently depending on whether the source path ends with a trailing /. For example, if you want to transfer a folder data located at /home/user/data, the following two commands will act differently:
- rsync -azP /home/user/data <dest>: will transfer the folder data into <dest>, meaning that you will end up with <dest>/data
- rsync -azP /home/user/data/ <dest>: will transfer the content of data into <dest>, meaning that you will end up with <dest>/<data_content>
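The difference can be checked safely on your own machine before touching real data. The sketch below works entirely inside a temporary directory; the folder and file names (data, dest_a, dest_b, file.txt) are made up for the demonstration.

```shell
# Scratch area so the demo does not touch real data.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/dest_a" "$demo/dest_b"
echo "hello" > "$demo/data/file.txt"

# No trailing slash: the folder itself is copied, giving dest_a/data/file.txt
rsync -a "$demo/data" "$demo/dest_a"

# Trailing slash: only the content is copied, giving dest_b/file.txt
rsync -a "$demo/data/" "$demo/dest_b"

# List what ended up where.
find "$demo/dest_a" "$demo/dest_b" -type f
```

Only the trailing slash on the source matters here; a trailing slash on the destination does not change the result.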
rsync for local copy
rsync is not limited to synchronizing files over a network. You can
also use it to synchronize files locally. This and its options make it very
convenient as a backup tool.
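A small local backup might look like the following sketch. It runs in a temporary directory, and the names project, backup, notes.txt and extra.txt are hypothetical; in practice the source and destination would be your own folders.

```shell
# Scratch area standing in for a real project folder and a backup location.
work=$(mktemp -d)
mkdir -p "$work/project" "$work/backup"
echo "v1" > "$work/project/notes.txt"

# First run copies everything; -a preserves permissions and timestamps.
rsync -a "$work/project/" "$work/backup/"

# Add a file and run again: unchanged files are skipped.
echo "new" > "$work/project/extra.txt"
rsync -a "$work/project/" "$work/backup/"

ls "$work/backup"
```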
Using scp#
Secure copy (or scp) is a tool used to copy files between hosts over a
network. It works much like the usual cp command. Unlike rsync, scp will always transfer the whole data and cannot resume in case of
an interruption.
The syntax is the following:
scp [options] <source> <destination>
As for cp, the main option you may use is -r for a recursive copy. Other
than that, the usage is similar to that of rsync presented above.
Problem that may occur with scp
Should you have to copy a directory from your local computer to the remote server, you may encounter the following error:
scp: realpath ./[Your_local_dir_name]: No such file
scp: upload "./[Your_local_dir_name]": path canonicalization failed
scp: failed to upload directory [Your_local_dir_name] to .
Try with the -O option:
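Recent OpenSSH releases make scp use the SFTP protocol by default, and -O tells scp to fall back to the legacy SCP protocol. A sketch of such a call is given below; the function name scp_upload_legacy and the variables GASPAR and MACHINE are illustrative assumptions.

```shell
# Placeholders; replace them with real values.
GASPAR="<GASPAR>"      # your GASPAR username
MACHINE="<machine>"    # the machine name, as you would give it to ssh

# Upload a local directory ($1) to a remote path ($2) using the legacy SCP protocol.
scp_upload_legacy() {
    scp -O -r "$1" "${GASPAR}@${MACHINE}:$2"
}

# Example call (both paths are hypothetical):
#   scp_upload_legacy ./my_local_dir /path/on/the/cluster/
```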
External Data Transfer#
The /transfer file system is a special file system optimized for data transfers.
For security reasons it has a data retention policy of only 15 days (files older
than 15 days will be automatically removed without notice). Furthermore, the
/transfer file system is the only one that is mounted on all the SCITAS
clusters and the scitas-transfer.epfl.ch transfer node.
rsync or scp#
The procedure to transfer data to a machine outside EPFL is the following:
- From one of the clusters, place the data you want to send in the /transfer/<username> folder, where <username> is your GASPAR ID, to make it available on the transfer node.
- Log into scitas-transfer.epfl.ch and transfer your data in /transfer/<username> using your favorite protocol, e.g. scp or rsync.
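The two steps above can be sketched as a pair of shell functions. The function names, the variables SCITAS_USER and TRANSFER_DIR, and the external destination in the example call are all hypothetical; cp and rsync are used here only as one possible choice of tools.

```shell
# Placeholders; replace them with real values.
SCITAS_USER="<username>"                  # your GASPAR ID
TRANSFER_DIR="/transfer/${SCITAS_USER}"   # your folder on the /transfer file system

# Step 1, run on one of the clusters: stage a file or folder ($1) on /transfer.
stage_on_transfer() {
    cp -r "$1" "${TRANSFER_DIR}/"
}

# Step 2, run on scitas-transfer.epfl.ch: send a staged item ($1)
# to an external destination ($2), given as user@host:path.
send_from_transfer() {
    rsync -azP "${TRANSFER_DIR}/$1" "$2"
}

# Example calls (names and host are hypothetical):
#   stage_on_transfer ./results
#   send_from_transfer results someone@remote.example.org:/data/
```

Keep the 15-day retention policy in mind: staged data should be sent on promptly rather than left on /transfer.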
Access from outside the EPFL network
If you want to access scitas-transfer.epfl.ch from outside the EPFL network or VPN, you first need to add your public SSH key to scitas-transfer.epfl.ch from inside the EPFL network or VPN. If you have not already generated a pair of SSH keys, please follow this procedure. At the end of the procedure, you can copy your SSH public key to scitas-transfer using this command from inside the EPFL network or VPN:
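One common way to install a public key on a remote host is the ssh-copy-id utility shipped with OpenSSH. The sketch below assumes that tool is available on your computer; the function name, the variable SCITAS_USER and the key file in the example call are hypothetical.

```shell
# Placeholder; replace it with your GASPAR username.
SCITAS_USER="<username>"

# Install a public key file ($1) on the transfer node.
install_key_on_transfer_node() {
    ssh-copy-id -i "$1" "${SCITAS_USER}@scitas-transfer.epfl.ch"
}

# Example call (the key file name depends on how the key pair was generated):
#   install_key_on_transfer_node ~/.ssh/id_ed25519.pub
```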
You can also copy data to the SCITAS clusters using the same procedure, but in
reverse order. First transfer the data to scitas-transfer.epfl.ch and then, from one of the clusters, copy it to where you would like to store it.
Globus Connect Personal#
Globus Connect Personal is a tool that allows you to easily and securely transfer and share large amounts of data from your personal computer or server.
It turns your device into a Globus endpoint, enabling seamless interaction with other endpoints for efficient data management.
Whether you're moving data between your computer and cloud storage or sharing files with colleagues, Globus Connect Personal ensures reliable and secure transfers.
Where to run the Globus Connect Personal client
Globus Connect Personal is meant to be run on the scitas-transfer.epfl.ch server.
It is not allowed to run this tool on the front-end or the compute nodes.
Setup#
Follow the official Linux documentation to download and set up the client.
Account
You can use your "Switch edu-ID" account to log into Globus Portal.
Enabling the access to /transfer
By default, the Globus Connect Personal client can only access the /home folder.
To give access to /transfer/myfolder, use the command:
echo "/transfer/myfolder,0,1" >> ~/.globusonline/lta/config-paths
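Each line of config-paths is understood here to have the form <path>,<sharing flag>,<read/write flag>, so the entry above would disable sharing and allow read/write access. Running the echo command twice would add the line twice; the sketch below appends an entry only if it is not already present. The function name allow_globus_path and the variable CONFIG_PATHS are illustrative assumptions.

```shell
# Location of the path list; can be overridden for testing.
CONFIG_PATHS="${CONFIG_PATHS:-$HOME/.globusonline/lta/config-paths}"

# Add a folder ($1) with sharing flag 0 and read/write flag 1,
# unless exactly that line is already in the file.
allow_globus_path() {
    entry="$1,0,1"
    grep -qxF -- "$entry" "$CONFIG_PATHS" 2>/dev/null \
        || printf '%s\n' "$entry" >> "$CONFIG_PATHS"
}

# Example call:
#   allow_globus_path /transfer/myfolder
```

The client may need to be restarted before a change to this file is taken into account.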