Archiving#
SCITAS provides a long-term magnetic tape archive system accessible from all the frontend nodes of the clusters.
Archiving space#
Each lab fileset present in /work has a similar fileset automatically created in /archive, located at /archive/<account_name>. It is a pay-per-use system, so you only pay for the space you occupy.
The path /archive corresponds to a disk-based file system independent of the others (/home, /work, /scratch and /export), which acts as a buffer for data until it is copied to tape.
You can simply transfer the data to be archived by copying the files onto the /archive filesystem; a process runs daily to transfer data from the /archive filesystem to the magnetic tapes. Before proceeding, please make sure to read the Prepare your data for archiving section.
Archiving process#
Data archiving is a multi-stage process that can be broken down as follows:
- You copy the data that you want to archive to /archive. It is initially in a "resident" state, meaning that the data and metadata (inodes) of the files are still on the disks of the archive filesystem, not on the tapes.
- A copying process runs daily to transfer the file data to the tapes, but leaves the metadata on the /archive filesystem. This corresponds to the "migrated" state. Since the metadata remains on the filesystem, you can easily work with it, e.g. to list the files using ls -l, without accessing the tapes. However, opening such a file, and thus accessing the actual data, is very time-consuming because a robot has to physically load a tape into a drive to read it. It is therefore strongly discouraged.
- Finally, when you recall a file from the tapes (e.g., by requesting a copy from /archive to your personal workspace), the file data is copied back to the archive filesystem. The data is then present both on /archive and on the tapes: this is called the "pre-migrated" state.
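As a hedged illustration of the recall step, the sketch below copies a tarball out of the archive once and extracts it in a working directory, rather than opening it in place. The paths are stand-ins: a temporary directory plays the role of both /archive/<account_name> and the workspace so that the commands run anywhere, and my_project.tar.gz is a hypothetical file name.

```shell
set -eu

# Stand-in directories so the sketch runs anywhere (assumption).
# On the cluster, archive_dir would be /archive/<account_name> and
# work_dir a directory in your own workspace.
demo_root="$(mktemp -d)"
archive_dir="$demo_root/archive"
work_dir="$demo_root/workspace"
mkdir -p "$archive_dir" "$work_dir" "$demo_root/src/my_project"

# Build a small hypothetical tarball to play the role of the archived file.
echo "sample data" > "$demo_root/src/my_project/results.txt"
tar -czf "$archive_dir/my_project.tar.gz" -C "$demo_root/src" my_project

# Recall: a single copy out of the archive.
cp "$archive_dir/my_project.tar.gz" "$work_dir/"

# Extract and read the copy, never the file under the archive.
tar -xzf "$work_dir/my_project.tar.gz" -C "$work_dir"
cat "$work_dir/my_project/results.txt"
```

Copying first and working on the copy keeps the tape access to one sequential read.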
As mentioned above, a script runs daily on the /archive/<account_name> directories to move file data onto the tapes. Once the data has been moved to tape, accessing archived files is very slow. For this reason, it is strongly advised not to open files on the archive system unless absolutely necessary. Only metadata stays on /archive, so only commands that rely on metadata, such as ls or stat, remain fast.
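To illustrate, the commands below query only metadata (name, size, modification time). A temporary directory stands in for /archive/<account_name> so that the sketch runs anywhere, the file name is hypothetical, and stat -c assumes GNU coreutils as found on Linux.

```shell
set -eu

# Stand-in for /archive/<account_name> (assumption, so this runs anywhere).
archive_dir="$(mktemp -d)"
printf 'placeholder' > "$archive_dir/my_project.tar.gz"   # hypothetical file

# Both commands read the inode only; neither opens the file contents.
ls -l "$archive_dir"
size_bytes="$(stat -c '%s' "$archive_dir/my_project.tar.gz")"
modified="$(stat -c '%y' "$archive_dir/my_project.tar.gz")"
echo "size: $size_bytes bytes, last modified: $modified"
```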
Prepare your data for archiving#
Warning
Never use the archive to read, write or execute files. This is not the purpose of this storage space. It would cause congestion in the tape robotics.
Pack your data into a single archive#
If you wish to archive multiple files at once, place them in a folder and create a tarball. It is much easier and more efficient for the archive system to handle one large file rather than many small ones. You can create it like this:
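A minimal sketch, assuming tar with gzip compression and a hypothetical folder named my_project; the folder is created in a temporary directory here only so that the commands can be run as-is.

```shell
set -eu

# Hypothetical sample folder, created in a temporary location.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/my_project"
echo "run 1" > "$demo_root/my_project/run1.log"
echo "run 2" > "$demo_root/my_project/run2.log"

# Pack the whole folder into one compressed tarball.
# -C makes the paths stored in the tarball relative to the parent directory.
tar -czf "$demo_root/my_project.tar.gz" -C "$demo_root" my_project
ls -l "$demo_root/my_project.tar.gz"
```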
Then, create a list of the files present in the archive you just made:
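A minimal sketch, assuming a hypothetical tarball my_project.tar.gz and a list file named my_project.tar.gz.list; the tarball is built in a temporary directory so that the snippet is self-contained. The list is produced from the local tarball, before anything is copied to /archive.

```shell
set -eu

# Hypothetical sample tarball, built in a temporary location.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/my_project"
echo "run 1" > "$demo_root/my_project/run1.log"
tar -czf "$demo_root/my_project.tar.gz" -C "$demo_root" my_project

# -t lists the members instead of extracting them; -v adds sizes and dates.
tar -tvzf "$demo_root/my_project.tar.gz" > "$demo_root/my_project.tar.gz.list"
cat "$demo_root/my_project.tar.gz.list"
```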
Warning
It is crucial to keep your tarball lists on your own safe storage outside of /archive, where you can retrieve and parse them easily and safely.
Indeed, while it is technically possible to list the contents of a tarball directly on /archive, this accesses the data rather than the metadata of the tarball. Doing so triggers the mounting of a tape in a drive, and you will get your file list only after several minutes, while blocking a drive for this task.
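With the list stored outside /archive, looking up a file becomes a plain text search that never touches the tapes. A sketch, using a hypothetical list file written to a temporary directory that stands in for your own storage:

```shell
set -eu

# Stand-in for your own storage outside /archive (assumption).
lists_dir="$(mktemp -d)"

# Hypothetical list content, in the format produced by tar -tf.
printf '%s\n' 'my_project/' 'my_project/run1.log' 'my_project/run2.log' \
    > "$lists_dir/my_project.tar.gz.list"

# Search the saved list, not the tarball under /archive.
matches="$(grep 'run2' "$lists_dir/my_project.tar.gz.list")"
echo "$matches"
```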
Transferring the tarball to /archive#
Just copy your tarball to the archive filesystem:
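A minimal sketch of the copy. A temporary directory stands in for /archive/<account_name> so that the commands run anywhere; on the cluster, archive_dir would be set to your real archive path. The tarball name is hypothetical. The final check compares file sizes with stat (GNU coreutils assumed), which reads only metadata.

```shell
set -eu

demo_root="$(mktemp -d)"
archive_dir="$demo_root/archive"      # stand-in for /archive/<account_name>
mkdir -p "$archive_dir"
tarball="$demo_root/my_project.tar.gz"
printf 'dummy tarball bytes' > "$tarball"   # placeholder content

# Copy the tarball, preserving timestamps and permissions.
cp -p "$tarball" "$archive_dir/"

# Compare sizes through metadata only.
src_size="$(stat -c '%s' "$tarball")"
dst_size="$(stat -c '%s' "$archive_dir/my_project.tar.gz")"
[ "$src_size" -eq "$dst_size" ] && echo "copy complete: $dst_size bytes"
```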
It will be copied to tape during the next nightly archiving run.
Best practices on the /archive system#
As mentioned previously, retrieving data from the /archive filesystem is a very slow process because the machine needs to physically move tapes around. In contrast, the metadata remains easily accessible on /archive because it is not put on tape. It is therefore very important to limit access to the archived data as much as possible. To help you, here are two non-exhaustive lists of UNIX commands: one of commands that use only the metadata and one of commands that touch the data.
List of commands accessing the metadata#
List of commands accessing the data#
- cat
- grep
- head, tail
- less, more
- touch
- file parsing: sed, cut, sort...
- tar (once inside /archive)
- editors: vi, vim, nano, emacs, etc.
Deleting archives#
A simple rm on the files you want to delete in /archive is enough.
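For example, with a temporary directory standing in for /archive/<account_name> and a hypothetical file name:

```shell
set -eu

archive_dir="$(mktemp -d)"            # stand-in for /archive/<account_name>
: > "$archive_dir/old_results.tar.gz" # hypothetical empty placeholder file

# Check the name with a metadata-only listing, then delete.
ls -l "$archive_dir/old_results.tar.gz"
rm "$archive_dir/old_results.tar.gz"
```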
If you delete files by mistake, you have 10 days to contact us before they are permanently erased from the tapes. We will do our best to locate and recover your files. To contact us, please send an email to 1234@epfl.ch with HPC in the subject.