Effective data management is critical when working on the Euler Cluster, particularly for machine learning workflows that involve large datasets and model outputs. This section explains the available storage options and their proper usage.
- `/cluster/home/$USER` (Home)
- `/cluster/scratch/$USER` or `$SCRATCH` (Scratch)
- `/cluster/project/rsl/$USER` (Project)
- `/cluster/work/rsl/$USER` (Work)
  - In exceptional cases we can approve more storage space. For this, ask your supervisor to contact patelm@ethz.ch.
- `$TMPDIR` (local scratch, allocated per job)
Run `lquota` in the terminal to check your used storage space for the Home and Scratch directories. For the Project and Work directories, you can run:

(head -n 5 && grep -w $USER) < /cluster/work/rsl/.rsl_user_data_usage.txt
(head -n 5 && grep -w $USER) < /cluster/project/rsl/.rsl_user_data_usage.txt
Note: This won't show the per-user quota limit enforced by RSL! Refer to the table below for the quota limits.
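If you want to estimate your own usage directly, standard shell tools are sufficient; a small example (assuming you only care about your own subfolders):

```bash
# Approximate inode usage (files, directories, symlinks) in your Project and Work folders.
find /cluster/project/rsl/$USER | wc -l
find /cluster/work/rsl/$USER | wc -l

# Total size of each folder.
du -sh /cluster/project/rsl/$USER /cluster/work/rsl/$USER
```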
What is the difference between the Project and Work directories, and why is it necessary to use both?

Basically, both Project and Work are persistent storage (meaning the data is not deleted automatically); however, the use cases differ. When you have lots of small files, for example conda environments, you should store them in the Project directory, as it allows a higher number of inodes. On the other hand, when you have larger files such as model checkpoints, Singularity containers, and results, you should store them in the Work directory, as its storage capacity is higher.
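For example, conda can be configured so that environments and the package cache live under the Project directory. This is only an illustrative setup; the `conda` subfolder name is an arbitrary choice:

```bash
# Keep conda environments and the package cache in the Project directory
# (the "conda" subfolder is an arbitrary, illustrative layout).
conda config --add envs_dirs /cluster/project/rsl/$USER/conda/envs
conda config --add pkgs_dirs /cluster/project/rsl/$USER/conda/pkgs

# New environments are now created under the Project directory.
conda create -n my_env python=3.10
```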
What is local scratch (`$TMPDIR`)?

Whenever you run a compute job, you can also request a certain amount of local scratch space (`$TMPDIR`), which allocates space on a local hard drive. The main advantage of local scratch is that it is located directly inside the compute node rather than attached via the network. It is therefore highly recommended to copy your Singularity container and datasets to `$TMPDIR` and use them from there during training. Detailed training workflows are provided later in this guide.
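As an illustration of this workflow, the job script below requests local scratch and stages the data there before training. It is only a sketch: the resource requests, container name, dataset archive, and training command are placeholders you would replace with your own.

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gpus=1
#SBATCH --time=04:00:00
#SBATCH --tmp=100G   # node-local scratch; exposed to the job as $TMPDIR

# Stage the container and dataset on local scratch first (placeholder paths).
cp /cluster/work/rsl/$USER/containers/my_image.sif "$TMPDIR"/
tar -xf /cluster/scratch/$USER/datasets/my_dataset.tar -C "$TMPDIR"

# Run the training from local scratch instead of the network file systems.
singularity exec --nv "$TMPDIR"/my_image.sif \
    python train.py --data_dir "$TMPDIR"/my_dataset
```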
| Storage Location | Max Inodes | Max Size per User | Purged | Recommended Use Case |
|---|---|---|---|---|
| /cluster/home/$USER | ~450,000 | 45 GB | No | Code, config, small files |
| /cluster/scratch/$USER | 1 M | 2.5 TB | Yes (older than 15 days) | Datasets, training data, temporary usage |
| /cluster/project/rsl/$USER | 2.5 M | 75 GB | No | Conda envs, software packages |
| /cluster/work/rsl/$USER | 50,000 | 200 GB | No | Large result files, model checkpoints, Singularity containers |
| $TMPDIR | Very high | Up to 800 GB | Yes (at end of job) | Training datasets, Singularity images |
To verify your storage setup and check quotas:

- Submit a job to test `$TMPDIR` (a possible sketch of this script is shown below): `sbatch test_storage_quotas.sh`
- Run `lquota` to check your Home and Scratch usage.
- `$TMPDIR` only exists for active jobs; copy data to local scratch there for faster I/O during computation.
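The contents of `test_storage_quotas.sh` are not included in this guide; the following is only a minimal sketch of what such a job script could contain, with arbitrary example resource requests:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --tmp=10G   # request some local scratch so that $TMPDIR is allocated

# Quota usage for the Home and Scratch directories.
lquota

# Per-user usage on the RSL Project and Work shares.
(head -n 5 && grep -w $USER) < /cluster/project/rsl/.rsl_user_data_usage.txt
(head -n 5 && grep -w $USER) < /cluster/work/rsl/.rsl_user_data_usage.txt

# Confirm that local scratch is available and writable.
echo "TMPDIR: $TMPDIR"
df -h "$TMPDIR"
touch "$TMPDIR/write_test" && echo "local scratch is writable"
```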