Archiving#

SCITAS provides a long-term magnetic tape archive system accessible from all the frontend nodes of the clusters.

Archiving space#

For each lab fileset present in /work, a corresponding fileset is automatically created at /archive/<account_name>. It is a pay-per-use system, so you only pay for the space you occupy.

The path /archive corresponds to a disk-based file system, independent of the others (/home, /work, /scratch and /export), which acts as a buffer until the data is copied to tape.

To archive data, simply copy it onto the /archive filesystem. A process runs daily to transfer data from the /archive file system to the magnetic tapes. Before proceeding, please make sure to read the Prepare your data for archiving section.

Archiving process#

Data archiving is a multi-stage process that can be broken down as follows:

  • You copy the data you want to archive to /archive. It is initially in a "resident" state, meaning that both the data and the metadata (inodes) of the files are still on the disks of the archive filesystem, not on the tapes.
  • A copying process runs daily to transfer the file data to the tapes, but leaves the metadata on the /archive filesystem. This corresponds to the "migrated" state. Since the metadata remains on the filesystem, you can still work with it, e.g. to list the files using ls -l, without accessing the tapes. However, opening such a file, and thus accessing the actual data, is very time-consuming because a robot has to physically load a tape into a drive to read it. It is therefore strongly discouraged.
  • Finally, when you recall a file from the tapes (e.g., by requesting a copy from /archive to your personal workspace), the file data is copied back to the archive filesystem. The data is then present both on /archive and on the tapes: this is called the "pre-migrated" state.

As mentioned before, a script runs daily on the /archive/<account_name> directories to move file data onto the tapes. Once the data has been moved to the tapes, access to archived files is very slow. For this reason, it is strongly advised not to open files on the archive system unless absolutely necessary. Only metadata stays on /archive, so only commands that rely on metadata, such as ls or stat, will run quickly.
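When archived data is needed again, the recall described above can be limited to one sequential read by copying the tarball out of /archive and extracting the copy elsewhere. The following is a minimal sketch, not an official SCITAS procedure: the function name recall_tarball and the destination path in the example comment are hypothetical.

```shell
# Copy a tarball out of the archive once, then work on the copy.
# Arguments: <tarball on /archive> <working directory outside /archive>
recall_tarball() {
    tarball="$1"
    workdir="$2"
    mkdir -p "$workdir" || return 1
    # This single copy is the only step that reads data from /archive.
    cp "$tarball" "$workdir"/ || return 1
    # Extraction reads the local copy, not the archived file.
    tar -xf "$workdir/$(basename "$tarball")" -C "$workdir"
}

# Example (illustrative paths):
#   recall_tarball /archive/<account_name>/my_tarball.tar /scratch/<username>/restore
```

Expect the copy to take a while if the file is in the "migrated" state, since a tape has to be loaded first.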

Prepare your data for archiving#

Warning

Never use the archive to read, write or execute files. This is not the purpose of this storage space. It would cause congestion in the tape robotics.

Pack your data into a single archive#

If you wish to archive multiple files at once, place them in a folder and create a tarball. It is much easier and more efficient for the archive system to handle one large file rather than many small ones. You can create it like this:

tar -cvf my_tarball.tar my_folder

Then, create a list of the files present in the archive you just made:

tar -tvf my_tarball.tar > my_tarball.list

Warning

It is crucial to keep your tarball lists on your own safe storage outside of /archive, where you can retrieve and parse them easily and safely.

Indeed, while it is technically possible to consult the file list directly on /archive, this accesses the data rather than the metadata of the tarball. Doing so triggers the mounting of a tape in a drive, and you will only get your file list after several minutes, while blocking that drive for the task.
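With the lists stored locally, looking up an archived file never touches /archive. Here is a minimal sketch, assuming the lists are kept in a hypothetical directory ~/archive_lists; both this location and the function name find_in_lists are illustrative.

```shell
# Hypothetical location for the lists; any safe storage outside /archive works.
LIST_DIR="${LIST_DIR:-$HOME/archive_lists}"

# Print every list entry matching a pattern, prefixed by the list file name,
# which tells you which tarball contains the file.
find_in_lists() {
    pattern="$1"
    # -H forces the file name to be shown even when there is a single list.
    grep -H -- "$pattern" "$LIST_DIR"/*.list
}

# Example:
#   find_in_lists 'my_folder/results.dat'
```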

Transferring the tarball to /archive#

Just move your tarball to the archive filesystem:

mv <my_tarball>.tar /archive/<lab_folder>/

It will be copied to tape during the next nightly archiving run.
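The packing, listing and transfer steps can be chained in a small helper. This is a sketch under stated assumptions rather than an official tool: the function name archive_folder and its arguments are hypothetical, and the sha256sum checksum (GNU coreutils) is an optional extra that is not part of the procedure described here.

```shell
# Pack a folder, record its contents and checksum, then move the tarball.
# Arguments: <folder> <archive directory> <list directory outside /archive>
archive_folder() {
    folder="$1"
    archive_dir="$2"
    list_dir="$3"
    name="$(basename "$folder")"
    mkdir -p "$list_dir" || return 1
    # Create the tarball in the current directory, not inside /archive.
    tar -cf "$name.tar" "$folder" || return 1
    # Keep the file list and a checksum on safe storage outside /archive.
    tar -tvf "$name.tar" > "$list_dir/$name.list" || return 1
    sha256sum "$name.tar" > "$list_dir/$name.sha256" || return 1
    # Move the tarball; the nightly process then copies it to tape.
    mv "$name.tar" "$archive_dir"/
}

# Example (illustrative paths):
#   archive_folder my_folder /archive/<lab_folder> "$HOME/archive_lists"
```

Building the tarball and its list before the move means that nothing needs to be read back from /archive afterwards.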

Best practices on the /archive system#

As mentioned previously, retrieving data from the /archive filesystem is a very slow process because the machine needs to physically move tapes around. In contrast, the metadata remains easily accessible on /archive because it is not put on tape. It is therefore very important to limit access to the archived data as much as possible. To help you, here are two non-comprehensive lists of UNIX commands: one of commands that use only the metadata, and one of commands that touch the data.

List of commands accessing the metadata#

ls
stat
find
getfacl, setfacl
chmod, chown
rmdir, rm
mv (within /archive)

List of commands accessing the data#

cat
grep
head, tail
less, more
touch
file parsing: sed, cut, sort...
tar (once inside the /archive)
editors: vi, vim, nano, emacs etc...
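
As an illustration of staying on the metadata side, the sketch below builds an inventory of the tarballs in a directory using find only. It assumes GNU find, whose -printf action is not available in every find implementation; the function name list_tarballs is hypothetical.

```shell
# List tarballs under a directory with their size in bytes, modification
# date and path. Only file metadata is consulted; file contents are never opened.
list_tarballs() {
    find "$1" -type f -name '*.tar' -printf '%s\t%TY-%Tm-%Td\t%p\n'
}

# Example:
#   list_tarballs /archive/<account_name>
```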

Deleting archives#

A simple rm on files you want to delete in /archive is enough.

If you delete files by mistake, you have 10 days to contact us before the files are permanently erased from the tapes. We will do our best to locate and recover your files. To contact us, please send an email to 1234@epfl.ch with HPC in the subject.