Archiving#
SCITAS provides a long-term magnetic tape archive system accessible from all the frontend nodes of the clusters.
Archiving space#
Each lab fileset present in /work has a similar fileset automatically created
in /archive and located in /archive/<account_name>. It is a pay-per-use
system, so you only pay for space you occupy.
The path /archive corresponds to a disk-based file system independent of
the others (/home, /work, /scratch and /export), which acts as a buffer for
data until it is copied to tape.
You can simply transfer your data to be archived via a file copy onto the
/archive filesystem and a process will trigger daily to transfer data from the
/archive file system to the magnetic tapes. Before proceeding, please make
sure to read the Prepare your data for
archiving section.
To encourage the use of archive files and limit the impact of mistakes, such as copying many small files without first packing them into a single archive file, each fileset is limited to 2000 inodes. This means you cannot have more than 2000 items (i.e. files and directories) in your lab folder.
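As a rough way to see how close you are to the limit, you can count the entries in your folder. This is a minimal sketch, not an official SCITAS tool: `ARCHIVE_DIR` is a hypothetical variable that you would set to /archive/<account_name>, and the count is approximate because it includes the top-level directory itself. When `ARCHIVE_DIR` is not set, the sketch falls back to a temporary directory so it can be tried safely.

```shell
# ARCHIVE_DIR is a hypothetical variable; on the cluster, set it to /archive/<account_name>.
if [ -z "${ARCHIVE_DIR:-}" ]; then
    # Demo only: use an empty temporary directory
    ARCHIVE_DIR=$(mktemp -d)
fi
INODE_LIMIT=2000   # limit stated in this section

# find only walks directory entries (metadata); it does not open file contents
used=$(find "$ARCHIVE_DIR" | wc -l)
echo "$used of $INODE_LIMIT inodes used in $ARCHIVE_DIR"

if [ "$used" -ge "$INODE_LIMIT" ]; then
    echo "Inode limit reached: pack small files into a tarball" >&2
fi
```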
Archiving process#
The data archiving is a multi-stage process that we can decompose as follows:
- You copy the data that you want to archive to /archive. It is initially in a "resident" state, meaning that the data and metadata (inodes) of the files are still on the disks of the archive filesystem, not on the tapes.
- A copying process runs daily to transfer the file data to the tapes, but leaves the metadata on the /archive filesystem. This corresponds to the "migrated" state. Since the metadata remains on the filesystem, you can easily work with it, e.g. to list the files using ls -l, without accessing the tapes. However, trying to open such a file, and thus access the actual data, will be very time-consuming because a robot has to physically load a tape into a reader to access it. It is therefore strongly discouraged.
- Finally, when you recall a file from the tapes (e.g., by requesting a copy from /archive to your personal workspace), the file data will be copied back to the archive filesystem. The data will then be present both on /archive and on the tapes: this is called the "pre-migrated" state.
As mentioned before, a script runs daily on the /archive/<account_name>
directories to move file data onto the tapes. Once data has been moved to
tape, accessing archived files is very slow. For this reason, it is
strongly advised not to open files on the archive system unless absolutely
necessary. Only metadata stays on /archive, so only commands that rely on
metadata, such as ls or stat, will run quickly.
Prepare your data for archiving#
Warning
Never use the archive to read, write or execute files. This is not the purpose of this storage space. It would cause congestion in the tape robotics.
Pack your data into a single archive#
If you wish to archive multiple files at once, place them in a folder and create a tarball. It is much easier and more efficient for the archive system to handle one large file rather than many small ones. That is the reason why we limited the number of inodes on each fileset to 2000.
Danger
If you copy many files without preparing them first, you may quickly reach your 2000 inode limit and lose the ability to add any more files to /archive/<lab>. Simply delete the files that were accidentally copied to free up inodes.
You can create the tarball like this:
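A minimal sketch of this step, assuming GNU tar with gzip compression. The names `my_results` and `my_results.tar.gz` are placeholders chosen for illustration; set `SRC_DIR` and `TARBALL` to match your own data. When `SRC_DIR` is not set, a tiny demo folder is created so the commands can be tried as-is.

```shell
# SRC_DIR and TARBALL are hypothetical names; set them to match your own data.
if [ -z "${SRC_DIR:-}" ]; then
    # Demo only: build a tiny folder so the sketch runs as-is
    SRC_DIR="my_results"
    mkdir -p "$SRC_DIR"
    echo "sample data" > "$SRC_DIR/sample.txt"
fi
TARBALL="${TARBALL:-${SRC_DIR}.tar.gz}"

# -c create an archive, -z compress it with gzip, -f name of the output file
tar -czf "$TARBALL" "$SRC_DIR"
```

Create the tarball in your own workspace rather than directly in /archive, so that only one finished file is copied there.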
Then, create a list of the files present in the archive you just made:
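One possible way to produce such a list, again assuming GNU tar. `TARBALL` and the `.list` suffix are illustrative choices, not a SCITAS convention. When `TARBALL` is not set, a small demo tarball is built first so the sketch runs on its own.

```shell
# TARBALL is a hypothetical name; set it to the tarball you created.
if [ -z "${TARBALL:-}" ]; then
    # Demo only: build a tiny tarball so the sketch runs as-is
    mkdir -p my_results
    echo "sample data" > my_results/sample.txt
    TARBALL="my_results.tar.gz"
    tar -czf "$TARBALL" my_results
fi
LIST="${TARBALL}.list"   # illustrative name for the file list

# -t list the contents, -v include sizes and dates, -f archive to read
tar -tvf "$TARBALL" > "$LIST"
echo "File list written to $LIST"
```

Run this on the local copy of the tarball, before it is transferred, so that the list never has to be generated from /archive.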
Warning
It is crucial to keep your tarball lists on your own safe storage
outside of /archive, where you can retrieve and parse them easily and
safely.
Indeed, while it is technically possible to consult the file list directly on
/archive, this will access the data rather than the metadata of the tarball.
Doing so will trigger the mounting of a tape in a reader, and you will get your
file list after several minutes, while blocking a drive for this task.
Transferring the tarball to /archive#
Just copy your tarball to the archive filesystem:
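A minimal sketch of the copy, followed by a size comparison that relies on metadata only. `TARBALL` and `ARCHIVE_DIR` are hypothetical variables: on the cluster, `ARCHIVE_DIR` would be /archive/<account_name>. The `stat -c` syntax assumes GNU coreutils. When the variables are not set, the sketch builds a demo tarball and copies it to a temporary directory instead.

```shell
# TARBALL and ARCHIVE_DIR are hypothetical variables.
# On the cluster, ARCHIVE_DIR would be /archive/<account_name>.
if [ -z "${TARBALL:-}" ] || [ -z "${ARCHIVE_DIR:-}" ]; then
    # Demo only: build a tiny tarball and copy it to a temporary directory
    mkdir -p my_results
    echo "sample data" > my_results/sample.txt
    TARBALL="my_results.tar.gz"
    tar -czf "$TARBALL" my_results
    ARCHIVE_DIR=$(mktemp -d)
fi

cp "$TARBALL" "$ARCHIVE_DIR/"

# Compare sizes with stat (metadata only), without reading the archived copy
src_size=$(stat -c %s "$TARBALL")
dst_size=$(stat -c %s "$ARCHIVE_DIR/$(basename "$TARBALL")")
if [ "$src_size" -eq "$dst_size" ]; then
    echo "Copy complete: $dst_size bytes"
else
    echo "Size mismatch: $src_size vs $dst_size bytes" >&2
fi
```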
It will be copied to tape during the next nightly archiving run.
Best practices on the /archive system#
As mentioned previously, retrieving data from the /archive filesystem is a very
slow process because the machine needs to physically move tapes around. In
contrast, the metadata remains easily accessible on /archive because
it is not put on tape. It is therefore very important to limit access to the
archived data as much as possible.
To help you, here are some examples of UNIX commands that only affect the metadata and some that will access the actual data.
List of commands that only access the metadata#
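The sketch below uses only ls and stat, the two commands this page names as fast because they rely on metadata. `ARCHIVE_DIR` is a hypothetical variable that you would set to /archive/<account_name>, and the `stat -c` format assumes GNU coreutils. When `ARCHIVE_DIR` is not set, a temporary directory with a placeholder file is used instead.

```shell
# ARCHIVE_DIR is a hypothetical variable; on the cluster, set it to /archive/<account_name>.
if [ -z "${ARCHIVE_DIR:-}" ]; then
    # Demo only: temporary directory with one placeholder file
    ARCHIVE_DIR=$(mktemp -d)
    echo "placeholder" > "$ARCHIVE_DIR/demo.tar"
fi

# List names, sizes and dates from metadata
ls -l "$ARCHIVE_DIR"

# Show size and modification time of each entry without opening it
for f in "$ARCHIVE_DIR"/*; do
    stat -c '%n: %s bytes, modified %y' "$f"
done
```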
List of commands that access the data#
- cat
- grep
- head, tail
- less, more
- touch
- file parsing: sed, cut, sort...
- tar (once inside the /archive)
- editors: vi, vim, nano, emacs etc...
Deleting archives#
A simple rm on files you want to delete in /archive is enough.
If you delete files by mistake, you have 10 days to contact us before the files are permanently erased from the tapes. We will do our best to locate and recover your files. To contact us, please send an email to 1234@epfl.ch with HPC in the subject.
