Difference between revisions of "Disk Storage"

From Research Computing Center Wiki
  
 
== Storage Overview ==
 
Network attached storage systems at the GACRC are tiered in three collections of systems based on speed and capacity. Our fastest network attached storage system is a Panasas ActiveStor 12, which exports 156TB of storage, is mounted on every node at /panfs, and is divided into two categories: home and scratch.
 
  
== Home Directories ==
  
'''Home directories are for highly static datasets. This volume cannot support a high degree of variability with the features (like snapshots) it provides. Please make sure that any data stored in your home directory is limited to applications and inputs that you use frequently.'''
  
All users have a default 300GB quota (i.e., maximum limit) on their home directory; however, justifiable requests for quotas of up to 2TB can be made through the [http://help.gacrc.uga.edu/ GACRC website contact form]. '''Storing data in your home directory to avoid archive storage fees is not a justifiable request.''' Requests for home quotas greater than 2TB must be submitted by the PI of a lab group and approved by the GACRC advisory committee (via the IT Manager). Users may create lab directories for data that is shared by a lab group, but those directories count against the quota of the creating user. An example of this, for the "abclab" users, would be: /home/abclab/labdata. Home directories are snapshotted.
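As a rough illustration (the lab name "abclab" comes from the example above; the matching Unix group name is an assumption), a shared lab directory could be created and opened up to the group like this. The space it consumes still counts against the quota of the user who created it:

<pre class="gcommand">
mkdir /home/abclab/labdata
chgrp abclab /home/abclab/labdata   # assumes the lab has a Unix group named "abclab"
chmod 2770 /home/abclab/labdata     # group read/write; setgid keeps new files group-owned
</pre>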
  
=== Snapshots ===
Lab home directories are snapshotted. Snapshots are like backups in that they are read-only, moment-in-time captures of files and directories, from which files that may have been accidentally deleted or overwritten can be copied back.
  
Home directories have snapshots taken once a day and maintained for 4 days, giving users the ability to retrieve old files for up to 4 days after deleting them.
  
Any directory on the /home filesystem contains a hidden directory named ".snapshot". This directory does not show up in ls listings or to any other program; the only way to enter it is to "cd" into it by name. Users may retrieve files from these snapshots by changing into the appropriate snapshot directory and copying files from it to any location they would like.
  
'''Note: ANY user, from any HOME directory, can access the snapshots ''from that directory'' to restore files.'''
  
For example:
  
<pre>
[cecombs@sites ~]$ cd test
[cecombs@sites test]$ pwd
/home/rccstaff/cecombs/test
[cecombs@sites test]$ ls
Main.java
[cecombs@sites test]$ rm -rf Main.java
[cecombs@sites test]$ cd .snapshot
[cecombs@sites .snapshot]$ ls
2013.04.16.00.00.01.daily 2013.04.17.00.00.01.daily 2013.04.18.00.00.01.daily
[cecombs@sites .snapshot]$ cd 2013.04.18.00.00.01.daily/
[cecombs@sites 2013.04.18.00.00.01.daily]$ ls
Main.java
[cecombs@sites 2013.04.18.00.00.01.daily]$ cp Main.java /home/rccstaff/cecombs/test
[cecombs@sites 2013.04.18.00.00.01.daily]$ cd /home/rccstaff/cecombs/test
[cecombs@sites test]$ ls
Main.java
</pre>
 
  
== Scratch ==
  
Scratch directories are for dynamic data. Scratch space is specifically designed to handle datasets that grow and shrink on demand. The GACRC does not and cannot snapshot scratch directories, because the amount of data that changes is too great and snapshots would only serve to slow the filesystems down.
  
=== eScratch ===
 
eScratch directories are for ephemeral datasets; most commonly, the output of large calculations that need to be stored in a temporary place for a short period of time. Any user can make escratch directories for their work. Ephemeral scratch directories on GACRC clusters reside on a Panasas ActiveStor 12 storage cluster.
 
  
==== Making an eScratch Directory ====
Researchers who need to use scratch space can type the following command '''(''on the login node zcluster.rcc ONLY'')''':
 
  
<pre class="gcommand">
make_escratch
 
</pre>
 
  
A sub-directory will be created, and the user will be told the path to the sub-directory, e.g. /panfs/pstor.storage/escratch1/jsmith_Oct_22. The life span of the directory is one week longer than the longest-duration queue, which is currently 30 days (i.e., life span = 37 days). '''At that time, the directory and its contents will be systematically deleted.''' Users can create one escratch directory per day if needed. The total space a user can use on escratch (all escratch directories '''combined''') is 4TB. The escratch directories are not backed up.
 
  
If a user needs to retain a self-created escratch directory for more than the 37 days they were allocated, they may contact the GACRC staff for an extension through the [http://help.gacrc.uga.edu/ support form]. We will grant almost all requests, but escratch directories do use precious HPC storage space, and we must ask that a new request be submitted for every 37-day period that the directory is needed, to ensure that the space is freed as soon as possible.
  
=== lscratch ===
  
lscratch stands for local scratch and is available on every node in the zcluster.
  
==== lscratch information ====
All /lscratch filesystems on every node have these properties:
 
* lscratch is by far the fastest filesystem at the GACRC; however, a job's lscratch directory is only available on the node the job is scheduled to.
 
* The lscratch filesystem resides on the local hard drive of the node.

* It represents the disk space remaining after the operating system is installed.

* /lscratch sizes vary from node to node because nodes have different sized disks.
 
* Not accessible from other nodes
 
* Every user has a directory on every node, /lscratch/<username>
 
  
==== lscratch Guidelines ====
 
This is a list of guidelines for /lscratch usage:
 
* Do not count on any lscratch sizes above 10G unless you know the size of the local hard drive and target that node specifically (e.g.: qsub -l h=compute-15-36)
 
* You will be responsible for migrating your data off the node after your job finishes. The job itself can transfer the data (see the example script below).
 
* Make sure that your output goes to: /lscratch/<username> (e.g: /lscratch/cecombs)
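As a minimal sketch (assuming the zcluster's SGE batch system; the program name "myprog", its input file, and the escratch path are placeholders taken from or modeled on the examples on this page), a job script that follows these guidelines might look like:

<pre class="gcommand">
#!/bin/bash
#$ -N lscratch_example
#$ -cwd

# Placeholder paths -- substitute your own escratch directory and program.
SCRATCH=/panfs/pstor.storage/escratch1/jsmith_Oct_22
WORK=/lscratch/$USER/$JOB_ID

mkdir -p "$WORK"
cp "$SCRATCH/input.dat" "$WORK/"            # stage input onto the node-local disk
cd "$WORK"

"$HOME/bin/myprog" input.dat > output.dat   # write output under /lscratch/<username>

cp output.dat "$SCRATCH/"                   # the job migrates its own results off the node
cd /
rm -rf "$WORK"                              # clean up lscratch before the job exits
</pre>

Submit the script with qsub, adding "-l h=<nodename>" only if you must target a node with a known large local disk.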
 
  
== Quotas ==
 
  
To see how much space you are consuming on the home and scratch file systems, please use the command
  
<pre class="gcommand">
quota_rep
 
</pre>
 
  
== Overflow/Archival Storage ==
  
Some labs also have a subscription archival storage space, which is mounted on the zcluster login node and on the [[Transferring Files | copy nodes]] as /oflow (note that /oflow is not mounted on the compute nodes). The archival storage system is for long-term storage of large, static datasets.
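For example (the lab and directory names here are purely illustrative, reusing the examples elsewhere on this page), a member of the "jlmlab" group could archive a finished dataset from a copy node like this:

<pre class="gcommand">
cp -a /panfs/pstor.storage/escratch1/jsmith_Oct_22/finished_project /oflow/jlmlab/
</pre>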
  
This filesystem is snapshotted. The snapshots are available only from the mount point, under the hidden ".zfs" directory (e.g.: /oflow/jlmlab/.zfs). Overflow devices are snapshotted every hour, day, week, and month; 24 hourly, 7 daily, 4 weekly, and 4 monthly snapshots are kept.
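For example, to pull a file back out of a snapshot (on a typical ZFS setup the snapshots sit under a "snapshot" subdirectory of ".zfs"; the snapshot and file names below are made up for illustration):

<pre class="gcommand">
cd /oflow/jlmlab/.zfs/snapshot
ls                                  # lists the available snapshots
cd zfs-auto-snap_daily-2016-04-20   # hypothetical snapshot name
cp lost_file.dat /oflow/jlmlab/
</pre>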
  
Please contact the GACRC staff to request Overflow storage.

Revision as of 09:18, 27 April 2016


Storage Overview

Network attached storage systems at the GACRC are tiered in three levels based on speed and capacity. Ranked in order of decreasing speed, the filesystems are "scratch", "home", and "offline" storage. The home filesystem is the "landing zone" when users log in, and the scratch filesystem is where jobs should be run. Scratch is considered temporary, and files are not to be left on it long-term. The offline storage filesystem is where data belonging to current projects should be stored when it is not actively being used on scratch.

For home and scratch directories, users are assigned the following quotas (maximum space allowed):

zcluster: home = 100GB, scratch = 4TB

sapelo: home = 100GB, scratch = currently none

The offline storage filesystem is named "project" and is configured for use by lab groups; by default, each lab group has a 1TB quota. Individual members of a lab group can create subdirectories under their lab's project directory. PIs of lab groups can request additional storage on project as needed. Please note that this storage is not meant for long-term (e.g., archive) storage of data; that type of storage is the responsibility of the user.


Storage Architecture

The home and scratch filesystems are mounted on the zcluster and the sapelo cluster as follows, using an example user 'jsmith' in a lab group 'abclab':

zcluster:
home = /home/abclab/jsmith
scratch = /escratch4/jsmith/jsmith_Month_Day

sapelo:
home = /home/jsmith
scratch = /lustre1/jsmith

Note that sapelo users already have a scratch directory. Users of the zcluster need to type 'make_escratch' to create a scratch directory; the command will return the name of the directory.

The project filesystem is not mounted on the compute nodes and cannot be accessed by running jobs. It is mounted on the zcluster login node and on the file "copy" and "xfer" nodes. The copy and xfer nodes (discussed under copy nodes) are the preferred servers to use for copying and moving files between all of the filesystems, and to and from the outside world.

The project filesystem has a consistent mount point of:

/project/abclab
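For example, from an xfer or copy node, a member of the example 'abclab' group could move finished sapelo scratch data into the lab's project space (the subdirectory and data names are only illustrative):

mkdir -p /project/abclab/jsmith
cp -r /lustre1/jsmith/finished_run /project/abclab/jsmith/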


Auto Mounting Filesystems

Some filesystems are "auto mounted" when they are first accessed on a server. For the xfer nodes, this includes Sapelo home directories and the project filesystems. For the zcluster copy nodes, this includes the project filesystems. Sapelo interactive ("qlogin") nodes will mount a user's home directory when the qlogin happens.


Snapshots

Home directories are snapshotted. Snapshots are like backups in that they are read-only moment-in-time captures of files and directories which can be used to restore files that may have been accidentally deleted or overwritten.

Home directories on sapelo have snapshots taken once a day and maintained for 4 days, giving users the ability to retrieve old files for up to 4 days after deleting them. On the zcluster, some home directories have snapshots taken once a day and some once every 2 days; these are maintained for 4 days.

Contact the GACRC staff if you need to recover data from a snapshot.


Current Storage Systems

(1) Panasas ActiveStor 12 storage cluster with 133TB usable capacity, running the PanFS parallel file system. Currently supporting the home filesystem on the zcluster.

(1) Seagate (Xyratex) Lustre appliance with 240TB usable capacity. Currently supporting the scratch filesystem on sapelo.

(3) Penguin IceBreaker storage chains running ZFS, mounted through NFS, for a total of 84TB usable capacity. Currently supporting home directories on sapelo.

(2) Penguin IceBreaker storage chains running ZFS, mounted through NFS, for a total of 374TB usable capacity. This storage is used as an active project repository.

(1) Penguin IceBreaker storage chain running ZFS, mounted through NFS, for a total of 142TB usable capacity. This storage is used as a backup resource for the home and project filesystems.