Filesystems overview#

Panthera provides access to three different file systems, each with distinct storage characteristics. These file systems are shared with other users and are subject to quota limits and, for some of them, purge policies (time-residency limits). The main filesystem at the heart of Panthera is BeeGFS, which is made up of two pools: the fast pool, for jobs that are heavily I/O bound, and the bulk pool, which is suitable for slower jobs. Our storage is currently sized at around 30 terabytes; it houses both user working directories and group research directories and is designed to scale as the need arises.

Info

Historically, NHPCC has gone through several generations of filesystems. At the very beginning of NHPCC (in 2009), we ran Lustre as well as GFS for central storage; as time went by, we added a newer file system, Gluster, which remained in use until 2018. Since then, we have rebuilt our storage infrastructure on BeeGFS, taking advantage of its features to get the most out of our hardware.

Please consider

NHPCC's storage resources are limited and are shared among many users. They are meant to store data and code associated with projects for which you are using Panthera's computational resources. This space is for work actively being computed on with Panthera, and should not be used as a target for backups from other systems.

Home directories#

All users have a home directory mounted under /home/<your-username> (e.g. /home/u111111). This is networked storage, accessible from all nodes in the cluster and is backed up nightly. This space is deliberately small and limited to 10GB per user. Avoid using this space as a destination for data generated by cluster jobs, as home directories fill up quickly and exceeding your quota will cause your jobs to fail.

Working directory space - $WRKDIR#

Working directory space is used for temporary storage of files. We recommend using it for data generated by cluster jobs; jobs will likely run faster due to the higher performance of this filesystem. The working directory is not backed up by the HPC team, so it is advisable to place only temporary files in it. All users have a 200 GB quota in $WRKDIR by default; however, files are automatically deleted if you meet one of these conditions:

File deletion rules

  • If your credit has not expired and 10 days have passed since your last login and you do not have any jobs in the cluster.
  • If your credit has expired and 5 days have passed since your last login.

Research Group storage space - $GROUP_HOME#

Research Groups, Projects, or Labs can have storage allocated to them; that is, we have defined a namespace for users who plan to work on a shared project as a group. Like home, this is networked storage, mounted under /group_home, and it is only accessible to the members of that group.

Each Research Group can have 10 GB of shared space free of charge. Research Groups that have multiple projects or groups can have multiple shares; however, the free 10 GB is only allocated once per Research Group.
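
As a quick way to confirm your group membership and locate your group's share, something like the following should work; the exact directory name under /group_home depends on how your group's share was set up:

# List the groups your account belongs to
$ id -Gn

# Inspect your group's shared directory
$ ls -ld /group_home/<your-group-name>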

Local scratch on nodes - $TMP#

There is temporary space available on the nodes that can be used when you submit a job to the cluster. Each node provides almost 600 GB of this local storage.

As this storage is physically located on the nodes, it cannot be shared between them, but it might provide better performance for I/O intensive tasks than networked storage.

Tip

We typically recommend using the working directory system where possible; however, there are occasional edge cases that perform badly on anything except local storage.

Backups and data retention#

The HPC team makes nightly backups of /home, and the backups are kept for 3 days. Files in /home and /project will be kept as long as your account is active. If your account remains inactive for 1 year, it will be locked, and after a 60-day grace period your files in /home and /project will be deleted.

The HPC team does not make backups of /scratch. Files in /scratch that have not been accessed for 120 days will be automatically deleted. Users receive notifications when their files have not been accessed for 92 days, 106 days, and 120 days (at which point they are purged).
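
To get a feel for which of your files are approaching the purge threshold, one possible check is a standard find over last access time, assuming your scratch data lives under $WRKDIR:

# List files not accessed for more than 92 days (candidates for purging)
$ find $WRKDIR -type f -atime +92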

Quotas and limits#

Quotas are applied on both volume (the amount of data stored in bytes) and inodes: an inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. In practice, each filesystem entry (file, directory, link) counts as an inode.
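
For a rough sense of how many inodes a directory tree consumes, counting its entries with standard tools is a reasonable approximation, since every file, directory, and link found corresponds to roughly one inode of quota:

# Count the entries under your working directory
$ find $WRKDIR | wc -l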

Entry limits#

Name      Quota type   Volume quota   Inode quota   Data retention
$HOME     directory    10 GB                        time limited
$WRKDIR   directory    200 GB                       time limited
$TMP      directory    600 GB                       job lifetime
$BULK     directory    200 TB                       time limited

Retention types#

  • time limited: files are kept for a fixed length of time after they've been last modified. Once the limit is reached, files expire and are automatically deleted.
  • job lifetime: files are only kept for the duration of the job and are automatically purged when the job ends.

Checking quotas#
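
The commands below are a sketch using standard utilities; on the BeeGFS pools, the beegfs-ctl client tool may also report per-user quota usage, if it is installed and available to regular users.

# Show how much space your home and working directories currently use
$ du -sh $HOME
$ du -sh $WRKDIR

# Show free space and inode usage on the filesystem backing $WRKDIR
$ df -h $WRKDIR
$ df -i $WRKDIR

# On BeeGFS, per-user quota usage can often be queried with beegfs-ctl
$ beegfs-ctl --getquota --uid $(id -u)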

Where should I store my files?#

Tip

Choosing the appropriate storage location for your files is an essential step towards using the cluster as efficiently as possible. It will make your own experience much smoother, yield better performance for your jobs and simulations, and help keep Panthera a useful and well-functioning resource for everyone.

Here is where we recommend storing different types of files and data on Panthera:

  • personal scripts, configuration files and software installations → $HOME
  • group-shared scripts, software installations and medium-sized datasets → $GROUP_HOME
  • temporary output of jobs, large checkpoint files → $WRKDIR

Accessing filesystems#

Each of the filesystems described above is exposed through an environment variable ($HOME, $GROUP_HOME, $WRKDIR, $TMP). We strongly recommend using these variables in your scripts rather than explicit paths, for instance to ease the transition to new systems: by using the environment variables, your scripts will continue to work even if the underlying filesystem paths change.

To see the contents of these variables, you can use the echo command. For instance, to see the absolute path of your $WRKDIR directory:

$ echo $WRKDIR
/home/u111111/wrkdir

Or for instance, to move to your group-shared home directory:

$ cd $GROUP_HOME
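
As a sketch of how this looks inside a job script, output locations can be built from the variables instead of hard-coded paths (the project and run names below are purely illustrative):

# Build an output location under $WRKDIR rather than an absolute path
OUTDIR=$WRKDIR/myproject/run42

# Create it and copy results there at the end of the job
mkdir -p $OUTDIR
cp results.data $OUTDIR/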

Using $TMP#

There is temporary space available on the nodes that can be used when you submit a job to the cluster.

As this storage is physically located on the nodes, it is not shared between nodes, but it will provide better performance for read/write (I/O) intensive tasks on a single node than networked storage. However, to use the temporary scratch space, you will need to copy files from networked storage to the temporary scratch space. In addition, if a job fails then any intermediate files created may be lost.

If your job does a lot of I/O operations to large files, it may therefore improve performance to:

  • copy files from your home directory into the temporary folder
  • run your job in the temporary folder
  • copy files back from the temporary folder to your home directory if needed
  • delete them from the temporary folder as soon as they're no longer needed

Basic example#

The following job runs a shell-script ./runcode.sh in a data folder beneath a user's home directory. The data is held on networked storage at this point.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

cd $HOME/project
./runcode.sh

On any node, the temporary scratch directory is accessed using the variable $TMP. If specific, known files are needed in your processing, you can copy your data to that space before working on it.

The following job:

  • copies data.file from the project directory to the temporary area
  • sets the current working directory to the temporary area
  • runs the appropriate code
  • copies the output file results.data back to the project directory

This is the equivalent of the previous example, but using the temporary storage:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Copy data.file from the project directory to the temporary scratch space
cp $HOME/project/data.file $TMP

# Move into the temporary scratch space where your data now is
cd $TMP

# Do processing - as this is a small shell script, it is run from the network storage
$HOME/project/runcode.sh

# Copy results.data back to the project directory from the temporary scratch space
cp $TMP/results.data $HOME/project/

If you do not know, or cannot list, all the possible output files that you would like to move back to your home directory, you can use rsync to copy only changed and new files back at the end of the job. This will save time and avoid unnecessary copying.

The following job:

  • copies files to the temporary scratch area
  • runs the shell-script ./runcode.sh on the local copy
  • copies the results back to networked storage

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Source folder for data
DATADIR=$HOME/project

# Copy data (inc. subfolders) to temporary storage
rsync -rltv $DATADIR/ $TMP/

# Run job from temporary folder
cd $TMP
./runcode.sh

# Copy changed files back
rsync -rltv $TMP/ $DATADIR/

Viewing temporary files#

To view temporary files while the job is running (to check that the job is behaving correctly), you can ssh to the node it is running on.

SSH Connections

As per the Usage Policy SSH sessions on nodes should be limited to monitoring jobs.
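
One possible workflow, assuming an SGE-style scheduler as used in the job script examples above (the node name is a placeholder, and the easiest way to learn the job-specific $TMP path is to echo it from within your job script so it appears in the job output):

# See which node(s) your jobs are running on
$ qstat -u $USER

# Connect to the node reported by qstat
$ ssh <node-name>

# List the job's temporary directory, using the path echoed by your job
$ ls -lh <job-tmp-path>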

Advanced example#

This advanced example uses rsync for speed and will ensure cleanup happens at the end of a job or when the job hits the soft limit.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_vmem=2G
#$ -l h_rt=1:0:0   # Request 1 hour runtime
#$ -l s_rt=0:55:0  # Clean up after 55 minutes

function Cleanup ()
{
    trap "" SIGUSR1 EXIT # Disable trap now we're in it
    # Clean up task
    rsync -rltv $TMP/ $DATADIR/
    exit 0
}

DATADIR=$(pwd)
trap Cleanup SIGUSR1 EXIT # Enable trap

cd $TMP
rsync -rltv $DATADIR/ $TMP/

# Job
./runcode.sh
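
Under SGE-style schedulers, exceeding the soft runtime limit (s_rt) typically sends the job SIGUSR1 before the hard limit (h_rt) kills it, which is what gives the Cleanup trap time to copy results back; check your scheduler's documentation to confirm this behaviour. The script is then submitted in the usual way (the filename is illustrative):

$ qsub advanced_job.sh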

From other systems#

External filesystems cannot be mounted on Panthera

For a variety of security, manageability, and technical reasons, we cannot mount external filesystems or data storage systems on Panthera. The recommended approach is to make Panthera's data available on external systems.
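
For example, data can be pulled from Panthera onto an external machine over SSH using standard tools such as rsync or scp; the hostname and paths below are placeholders, as the actual login address is not given here.

# Run this on the external system, not on Panthera
$ rsync -avz <your-username>@<panthera-login-host>:/home/<your-username>/project/ ./project/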