S3 Buckets#
Before you start#
S3 is an object storage service originally introduced by Amazon. Numerous datasets are accessible through S3, and you can also use it to store your own objects. EPFL provides an official S3 endpoint, and some faculties at EPFL provide their own.
Buckets @EPFL#
At EPFL, you can choose between two S3 resources:
Accessing S3 Buckets from the SCITAS clusters#
You can use one of these tools to access an S3 bucket:
s3fs#
s3fs allows you to use an S3 bucket as a filesystem. It is installed on the cluster login nodes; the bucket must be mounted from a login node before you run your jobs.
Configure access to your existing S3 bucket#
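The credential setup is not spelled out here; as a minimal sketch, s3fs reads credentials from a password file containing ACCESS_KEY:SECRET_KEY, which must be readable only by you. The path ${HOME}/.passwd-s3fs below matches the passwd_file option used in the mount command in the next section.

# store your bucket credentials (replace ACCESS_KEY and SECRET_KEY with your own)
echo ACCESS_KEY:SECRET_KEY > ${HOME}/.passwd-s3fs
# s3fs refuses password files that other users can read
chmod 600 ${HOME}/.passwd-s3fs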
Mount the bucket as a filesystem in your home directory#
The command below uses the URL of the EPFL S3 service; if you use another endpoint, adjust the url option accordingly (or remove it entirely to use Amazon S3).
mkdir ${HOME}/mybucket
s3fs BUCKET_ID ${HOME}/mybucket -o url=https://s3.epfl.ch/ -o passwd_file=${HOME}/.passwd-s3fs
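Once mounted, the bucket contents appear as regular files, so you can, for example, list them:

ls ${HOME}/mybucket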
Unmount the filesystem
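The unmount command itself is not shown here; since s3fs is a FUSE filesystem, it is normally released with fusermount:

# release the FUSE mount created by s3fs
fusermount -u ${HOME}/mybucket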
boto (Python)#
Install the library:
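The install command is missing here; a minimal sketch, assuming a user-level installation with pip:

# install boto in your user environment
pip install --user boto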
Using the library (source: https://icitdocs.epfl.ch/display/clusterdocs/Accessing+Datasets)
#!/usr/bin/env python3
import boto
import boto.s3.connection

endpoint = 's3.epfl.ch'
access_key = 'put your access key here!'
secret_key = 'put your secret key here!'
bucket_id = 'put your bucket id here!'

# open a connection to the S3 endpoint
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host=endpoint,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.get_bucket(bucket_id)

# listing objects in the bucket
for key in bucket.list():
    print("{name}\t{size}\t{modified}".format(
        name=key.name,
        size=key.size,
        modified=key.last_modified,
    ))
More information in the boto documentation.
rclone#
More information in the rclone documentation.
To configure rclone, edit ~/.rclone.conf:
[private]
type = s3
access_key_id = put_your_access_key_here
secret_access_key = put_your_secret_key_here
region = other-v2-signature
endpoint = https://s3.epfl.ch/
List the contents of a bucket:
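The example command is missing here; with the [private] remote configured above, it would look like this (BUCKET_ID stands for your bucket name):

# list all objects in the bucket
rclone ls private:BUCKET_ID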
Copy a file from your bucket, for example:
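The example is missing here; a minimal sketch, where myfile.txt is a hypothetical object name:

# copy a single object into your home directory
rclone copy private:BUCKET_ID/myfile.txt ${HOME}/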
Copy multiple files:
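The example is missing here; a minimal sketch using rclone's --include filter (the pattern and destination directory are placeholders):

# copy all .txt objects from the bucket into a local directory
rclone copy --include "*.txt" private:BUCKET_ID ${HOME}/mydir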