Disk Storage

From Research Computing Center Wiki

Revision as of 22:19, 20 November 2018


Storage Overview

Network-attached storage systems at the GACRC are tiered in three levels based on speed and capacity. Ranked in order of decreasing speed, the tiers are "scratch" and "work" (fastest), then "home", then "offline" storage.

The home filesystem is the "landing zone" when users log in, and the scratch filesystem is where jobs should be run. Scratch is considered temporary, and files are not to be left on it long-term. The work file system is a group-shared space that can be used to store common files needed by jobs. The offline storage filesystem is where data for active projects should be stored when it is not needed on scratch for current jobs.

Each compute node has local physical hard drives that the user can utilize as temporary storage, aka lscratch. The lscratch device is a very fast storage device compared to the network attached storage systems. The drawback is that the capacity is low and it cannot be accessed from outside the compute node. The data in lscratch is not backed up and it can be deleted anytime after the job on the compute node is finished.


Home file system

When you log in to a system (e.g. sapelo2 or the xfer nodes), you will land in your home directory. Home directories are "auto mounted" on the login nodes and xfer nodes when you log in. Your home directory on the xfer nodes is the same as your home directory on sapelo2. Sapelo2 interactive ("qlogin") nodes will mount a user's home directory when the qlogin happens, and compute nodes will mount a user's home directory when a job submitted by this user is dispatched to those compute nodes. Users of the teaching cluster have a separate home directory, which is not the same as on Sapelo2.

Home directories have a per-user quota and have snapshots. Snapshots are like backups in that they are read-only moment-in-time captures of files and directories which can be used to restore files that may have been accidentally deleted or overwritten. A user's snapshots are stored within his/her home file system, so snapshots consume the user's home directory quota. If files are created and deleted frequently, the snapshots will grow and might end up using a large fraction (or all) of the space available within a user's home file system.

The recommended data workflow is to have files in the home directory *change* as little as possible. These should be databases, applications that you use frequently but do not need to modify that often and other things that you, primarily, *read from*. Think of snapshots as the memory of the files that were stored there - no matter if you add, change or delete the files, the total sum of that activity will build up over time and may exceed your quota.

Summary of the home directory characteristics for a sample user 'jsmith' in 'abclab':

sapelo2
home dir quota = 100GB
home dir path = /home/jsmith
snapshots = yes
subject to 30-day purge = no


Scratch file system

The scratch file system resides on a high-speed storage device and it should be used to store temporary files needed for current jobs. Files that are not needed for current jobs should not be left on the scratch file system.

The scratch file system is mounted on the login nodes, xfer nodes, and compute nodes.

The recommended data workflow is to have jobs write output files, including intermediate data such as checkpoint files and final results, to the scratch file system. Final results, intermediate files, and other data should then be transferred out of the scratch file system and immediately deleted from it if they are not needed for other jobs that will be submitted soon.

Because the scratch file system stores large amounts of data that change frequently, it does not have snapshots turned on and it is not backed up in any way. Files deleted from a scratch directory cannot be recovered.

There is no per user quota in the scratch file system, but a file retention policy is implemented to help prevent this file system from filling up.


Scratch file system retention policy

Any file that has not been accessed or modified by a compute job within the past 30 days will be automatically deleted from the /scratch file system. Measures circumventing this policy will be monitored and actively discouraged.
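
As a rough way to see which files might fall under this policy, a user could list files whose access and modification times are both older than a cutoff. The sketch below uses the standard find tests -atime and -mtime. The function name list_purge_candidates is illustrative, and the purge mechanism actually used by the GACRC may apply different criteria.

```shell
# List regular files under a directory whose last access AND last
# modification are both older than a given number of days (default: 30).
# Hypothetical helper; not an official GACRC tool.
list_purge_candidates() {
    local dir="$1"
    local days="${2:-30}"
    find "$dir" -type f -atime +"$days" -mtime +"$days" -print
}

# Example, using the sample user's scratch path:
#   list_purge_candidates /scratch/jsmith 30
```

Running this before a long break from the cluster gives a list of files worth transferring elsewhere or deleting.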

There is no storage size quota for /scratch usage. Space is limited only by the physical size of the scratch space being used. If usage across the entire file system exceeds 80% of total capacity, the GACRC will take additional measures to reduce usage to a more suitable level. Possible actions include requesting or forcing users to clean up their /scratch directories, or temporarily reducing the 30-day limit.
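
To get a sense of how full a file system is relative to the 80% threshold, the POSIX df -P output can be parsed as in the sketch below. The function names and the warning text are illustrative assumptions, and the figures the GACRC monitors may be collected differently.

```shell
# Print the used-capacity percentage (an integer, without the % sign) of
# the file system that contains the given path, from POSIX "df -P" output.
fs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether usage is above a threshold (default: 80).
# Hypothetical helper; returns 0 if at or below the limit, 1 if above.
check_fs_usage() {
    local path="$1"
    local limit="${2:-80}"
    local used
    used="$(fs_usage_percent "$path")"
    [ -n "$used" ] || return 2
    if [ "$used" -gt "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}

# Example:
#   check_fs_usage /scratch 80
```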


Summary of the scratch directory characteristics for a sample user 'jsmith' in 'abclab':

sapelo2
scratch dir quota = Currently no per user quota
scratch dir path = /scratch/jsmith
snapshots = no
subject to 30-day purge = yes

Work file system

The work file system resides on a high-speed storage device and it should be used to store files needed for jobs. Each group has a directory in the work file system and this space can be used to store files needed by multiple users within a group. The work file system has a per group quota and files stored there are not subject to the auto-purge policy that is applied to the scratch file system.

The work file system is mounted on the login nodes, xfer nodes, and compute nodes.

The recommended data workflow is to have files needed for jobs, possibly by multiple users within a group, such as reference data and model data, be stored in the group work directory.

The work file system does not have snapshots turned on and it is not backed up in any way. Files deleted from a work directory cannot be recovered.

Summary of the work directory characteristics for a sample user 'jsmith' in 'abclab':

sapelo2
work dir group quota = (to be added)
work dir path = /work/abclab
snapshots = no
subject to 30-day purge = no


lscratch file system

Each compute node has local physical hard drives that the user can utilize as temporary storage. The file system defined on the hard drives is called /lscratch. The lscratch device is a very fast storage device compared to the network attached storage systems. The drawback is that the capacity is low and it cannot be accessed from outside the compute node. This file system can be used for single-core jobs and for multi-thread jobs that run within a single node. In general, parallel jobs that use more than one node (e.g. MPI jobs) cannot use the /lscratch file system.

The data in lscratch is not backed up and it needs to be deleted when the job on the compute node is finished.

Jobs that do not need to write large output files, but that need to access files often (for example, to write small amounts of data to disk), can benefit from using /lscratch. Jobs that use /lscratch should request the amount of space they need in /lscratch. For information on how to request lscratch space for jobs, please refer to "How to run a job from lscratch".
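
The usage pattern described here (stage input onto the node-local disk, work there, copy results back, then clean up) can be sketched as follows. This is only an illustration: the function name, its arguments, and the "wc -l" placeholder workload are assumptions, and the sketch does not show how to request lscratch space from the scheduler.

```shell
# Run a placeholder workload inside a private directory on node-local
# scratch, then copy the result back and remove the temporary directory.
# All names are hypothetical; replace the "wc -l" line with real work.
run_in_lscratch() {
    local lscratch_root="$1"   # e.g. /lscratch
    local input_file="$2"      # input file on network storage
    local results_dir="$3"     # where results should end up
    local workdir status

    # Private per-job directory so jobs sharing a node do not collide.
    workdir="$(mktemp -d "${lscratch_root}/job.XXXXXX")" || return 1

    cp "$input_file" "$workdir/input.dat" \
        && ( cd "$workdir" && wc -l < input.dat > line_count.txt )
    status=$?

    # Copy results back only if the workload succeeded.
    if [ "$status" -eq 0 ]; then
        mkdir -p "$results_dir" && cp "$workdir/line_count.txt" "$results_dir/"
        status=$?
    fi

    # Data in lscratch must be deleted when the job finishes.
    rm -rf "$workdir"
    return "$status"
}
```

Cleaning up on both the success and failure paths keeps the small local disk free for the next job on the node.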

Summary of the lscratch directory characteristics for a sample user 'jsmith' in 'abclab':

sapelo2
quota = Limited by device size (Approx. 210GB on the AMD nodes and 800GB on the Intel nodes)
path = /lscratch
snapshots = no
subject to purge = yes (files to be deleted when job exits the node) 


Project file system

The offline storage filesystem is named "project" and is configured for use by lab groups. By default, each lab group has a 1TB quota. Individual members of a lab group can create subdirectories under their lab's project directory. PIs of lab groups can request additional storage on project as needed. Please note that this storage is not meant for long-term (e.g., archive) storage of data. That type of storage is the responsibility of the user.

The project filesystem is not mounted on the compute nodes and cannot be accessed by running jobs. It is mounted on the "xfer" nodes when it is first accessed using its full path.

The project filesystem has snapshots turned on.

The recommended data workflow is to have data not needed for current jobs, but that are still needed for future jobs on the cluster, be transferred into the project file system and deleted from the scratch area.
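
A minimal sketch of this workflow is shown below: copy a directory out of scratch, verify the copy, and delete the source only if the two match. The function name and the example paths in the comments are hypothetical, and it would have to be run where both file systems are mounted (the project file system is mounted on the xfer nodes, not on the compute nodes).

```shell
# Copy a directory into a destination (e.g. the group's project
# directory), verify the copy, and only then delete the source.
# Hypothetical helper; not an official GACRC tool.
move_to_project() {
    local src="$1"    # e.g. a run directory under /scratch/jsmith
    local dest="$2"   # e.g. a subdirectory of /project/abclab
    local name
    name="$(basename "$src")"

    mkdir -p "$dest" || return 1
    cp -a "$src" "$dest/" || return 1

    # Compare source and copy recursively; keep the source on any mismatch.
    if diff -r "$src" "$dest/$name" > /dev/null; then
        rm -rf "$src"
    else
        echo "Copy of $src differs from the source; not deleting" >&2
        return 1
    fi
}
```

Verifying before deleting matters here because files removed from scratch cannot be recovered.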

Summary of the project directory characteristics for a sample group 'abclab':

sapelo2
quota = default of 1TB per group
path = /project/abclab
snapshots = yes
subject to 30-day purge = no



Storage Architecture Summary

Mount path for home, scratch, work, and lscratch filesystems using an example user 'jsmith' in a lab group 'abclab':

sapelo2

home= /home/jsmith
scratch= /scratch/jsmith
work= /work/abclab 
lscratch= /lscratch


Quota for home, scratch, work, and lscratch filesystems:

sapelo2

home= 100GB
scratch= Currently no quota
work= (to be added)
lscratch= Limited by device size (Approx. 210GB on the AMD nodes and 800GB on the Intel nodes)


Auto Mounting Filesystems

Some filesystems are "auto mounted" when they are first accessed on a server. For the xfer nodes, this includes Sapelo2 home directories and the project filesystems. Sapelo2 interactive ("qlogin") nodes will mount a user's home directory when the qlogin happens.


Snapshots

Home directories are snapshotted. Snapshots are like backups in that they are read-only moment-in-time captures of files and directories which can be used to restore files that may have been accidentally deleted or overwritten.

Home directories on sapelo2 have snapshots taken once a day and maintained for 4 days, giving the user the ability to retrieve old files for up to 4 days after they have deleted them. Weekly and monthly snapshots are also made and a few recent snapshots are stored.

Each /home filesystem contains a completely invisible directory named ".zfs". This directory cannot be listed with ls or viewed by any program at all. Only the "cd" command can be used to enter this directory. Users of /home directories may retrieve files from these snapshots by using the "cd" command to navigate from the top level of their home directory into an appropriate snapshot and copying files from that snapshot to any location they would like.

Note: ANY user can access the snapshots of his/her home directory from the top level of that home directory in order to restore files.

Here is an example for sapelo2:

[jsmith@sapelo2-sub1 ]$ pwd
/home/jsmith

[jsmith@sapelo2-sub1 ]$ cd .zfs

[jsmith@sapelo2-sub1 .zfs]$ ls
shares  snapshot

[jsmith@sapelo2-sub1 .zfs]$ cd snapshot

[jsmith@sapelo2-sub1 snapshot]$ ls    
zfs-auto-snap_daily-2018-11-09-0812      zfs-auto-snap_monthly-2018-10-26-0800      
zfs-auto-snap_daily-2018-11-10-0808      zfs-auto-snap_weekly-2018-10-17-0742    
zfs-auto-snap_daily-2018-11-11-0838      zfs-auto-snap_weekly-2018-10-31-0738
zfs-auto-snap_daily-2018-11-12-0806      zfs-auto-snap_weekly-2018-11-07-0805

[jsmith@sapelo2-sub1 snapshot]$ cd zfs-auto-snap_daily-2018-11-11-0838

[jsmith@sapelo2-sub1  zfs-auto-snap_daily-2018-11-11-0838]$ cp my-to-restore-file /home/jsmith/test
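
When it is not obvious which snapshot still holds a deleted file, a small loop over the snapshot directories can locate it. The sketch below assumes the .zfs/snapshot layout shown in the session above, with each snapshot mirroring the top level of the home directory. The function name list_snapshot_copies is hypothetical.

```shell
# Print every snapshot copy of a file, given the top level of a home
# directory and the file's path relative to it.
# Hypothetical helper built on the .zfs/snapshot layout shown above.
list_snapshot_copies() {
    local home_dir="$1"    # e.g. /home/jsmith
    local rel_path="$2"    # e.g. my-to-restore-file
    local snap
    for snap in "$home_dir"/.zfs/snapshot/*/; do
        if [ -e "${snap}${rel_path}" ]; then
            printf '%s\n' "${snap}${rel_path}"
        fi
    done
}

# Example:
#   list_snapshot_copies /home/jsmith my-to-restore-file
```

Any path it prints can then be copied back with cp, as in the last step of the session above.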


Current Storage Systems

(1) Seagate (Xyratex) ClusterStor1500 Lustre appliance (480TB) - $SCRATCH on Sapelo2

(2) DDN SFA14KX Lustre appliance (1.26PB) - $SCRATCH & $WORK on Sapelo2

(3) Penguin IceBreakers ZFS storage chains (84TB usable capacity) - $HOME on Sapelo2

(4) Penguin IceBreakers ZFS storage chains (374TB usable capacity) - $PROJECT research groups' long-term space - only for active projects requiring Sapelo2 access

(5) Panasas ActiveStor 100H (1PB) - $PROJECT research groups' long-term space - only for active projects requiring Sapelo2 access

(6) ZFS storage chains (720TB) - backup environment for $HOME and $PROJECT.