Effective data management is critical when working on the Euler Cluster, particularly for machine learning workflows that involve large datasets and model outputs. This section explains the available storage options and their proper usage.
The following storage locations are available:

- Home (`/cluster/home/$USER`)
- Scratch (`/cluster/scratch/$USER` or `$SCRATCH`)
- Project (`/cluster/project/rsl/$USER`)
- Work (`/cluster/work/rsl/$USER`)
- Local scratch (`$TMPDIR`)

In exceptional cases we can approve more storage space. For this, ask your supervisor to contact patelm@ethz.ch.

Run `lquota` in the terminal to check your used storage space for the Home and Scratch directories. For the Project and Work directories you can run:
```bash
(head -n 5 && grep -w $USER) < /cluster/work/rsl/.rsl_user_data_usage.txt
(head -n 5 && grep -w $USER) < /cluster/project/rsl/.rsl_user_data_usage.txt
```
Note: this won't show the per-user quota limits enforced by RSL! Refer to the table below for those limits.
### What is the difference between the Project and Work directories, and why is it necessary to use both?

Both Project and Work are persistent storage (data there is not deleted automatically), but their use cases differ. Store collections of many small files, for example conda environments, in the Project directory, which has the higher inode quota. Store larger files, such as model checkpoints, Singularity containers, and results, in the Work directory, which has the higher storage capacity.
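The exact layout is up to you; the following is a minimal sketch of how conda can be pointed at the Project directory (the `conda/envs` and `conda/pkgs` subdirectory names are just examples, not a prescribed convention):

```bash
# Store conda environments and the package cache in Project,
# which tolerates many small files (high inode quota).
conda config --add envs_dirs /cluster/project/rsl/$USER/conda/envs
conda config --add pkgs_dirs /cluster/project/rsl/$USER/conda/pkgs

# Keep large artifacts (checkpoints, containers) in Work instead.
mkdir -p /cluster/work/rsl/$USER/checkpoints
```

After this, a command such as `conda create -n myenv python=3.10` should place the new environment under the Project directory automatically.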
### What is the local scratch ($TMPDIR)?

Whenever you run a compute job, you can also request a certain amount of local scratch space (`$TMPDIR`), which allocates space on a local hard drive. The main advantage of the local scratch is that it is located directly inside the compute node rather than attached via the network. It is therefore highly recommended to copy your Singularity container and datasets to `$TMPDIR` and run your training from there; an example is sketched below. Detailed training workflows are provided later in this guide.
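As an illustration, here is a minimal sketch of a Slurm batch script that requests local scratch with `--tmp` and stages data before training. All file names, paths, and resource numbers below are placeholder assumptions, not values prescribed by this guide:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gpus=1
#SBATCH --time=04:00:00
#SBATCH --tmp=100G   # request 100 GB of node-local scratch, exposed as $TMPDIR

# Stage the container and dataset on the node-local disk; reading from
# $TMPDIR during training avoids repeated traffic over the network filesystem.
cp /cluster/work/rsl/$USER/containers/train.sif "$TMPDIR/"
tar -xf /cluster/scratch/$USER/datasets/my_dataset.tar -C "$TMPDIR"

# Run the training from the local copies.
singularity exec --nv "$TMPDIR/train.sif" \
    python train.py --data-dir "$TMPDIR/my_dataset"
```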
| Storage Location | Max Inodes | Max Size per User | Purged | Recommended Use Case |
|---|---|---|---|---|
| `/cluster/home/$USER` | ~450,000 | 45 GB | No | Code, config, small files |
| `/cluster/scratch/$USER` | 1 M | 2.5 TB | Yes (files older than 15 days) | Datasets, training data, temporary usage |
| `/cluster/project/rsl/$USER` | 2.5 M | 75 GB | No | Conda envs, software packages |
| `/cluster/work/rsl/$USER` | 50,000 | 200 GB | No | Large result files, model checkpoints, Singularity containers |
| `$TMPDIR` | Very high | Up to 800 GB | Yes (at end of job) | Training datasets, Singularity images |
To verify your storage setup and check quotas:

- Run `lquota` to check your Home and Scratch usage.
- Submit a job to test `$TMPDIR`: `sbatch test_storage_quotas.sh` (a sketch of such a script is shown after this list).
- For active jobs, copy data to local scratch (`$TMPDIR`) for faster I/O during computation.
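The contents of `test_storage_quotas.sh` are not reproduced in this guide; a minimal sketch of what such a script might look like (assuming the Slurm `--tmp` option is used to request local scratch) is:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --tmp=1G   # request a small local scratch allocation for the test

# Show where local scratch lives and how much space was allocated.
echo "TMPDIR: $TMPDIR"
df -h "$TMPDIR"

# Verify that the job can actually write to it.
touch "$TMPDIR/write_test" && echo "write to TMPDIR: OK"
```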