Disk Storage

[[Category:Zcluster]][[Category:Storage]]

== Storage Overview ==
Network-attached storage systems at the GACRC are tiered into three collections of systems based on speed and capacity. Our fastest network-attached storage system is a Panasas ActiveStor 12, which exports 156TB of storage mounted on every node at /panfs and divided into three categories: /home, /scratch, and /escratch.
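As a quick check (a minimal sketch; the exact mount and device names may differ from node to node), you can confirm the Panasas mount and its capacity from a login or compute node:

<pre class="gcommand">
# report the size and free space of the Panasas filesystem mounted at /panfs
df -h /panfs
</pre>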


== Home Directories ==
All users have a default 100GB quota (i.e., maximum limit) on their home directory; however, justifiable requests for quotas up to 2TB can be made by contacting the GACRC IT Manager (currently Greg Derda: derda@uga.edu). Using home directory storage to avoid archive storage fees is not a justifiable request. Requests for home quotas greater than 2TB must be submitted by the PI of a lab group and approved by the GACRC advisory committee (via the IT Manager). Users may create lab directories for data that is shared by a lab group, but those directories count against the quota of the creating user. An example of this, for the “abclab” users, would be: /home/abclab/labdata. Home directories are backed up.
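For example, a user in the “abclab” lab could create a shared lab directory and open it to other lab members like this (a minimal sketch; the Unix group name and permission policy shown are assumptions, so check with your PI or the GACRC before changing permissions):

<pre class="gcommand">
# create the shared directory under the lab's home area
mkdir /home/abclab/labdata

# assumed group name "abclab"; grant the group read/write access and set the
# setgid bit so new files stay group-owned
chgrp abclab /home/abclab/labdata
chmod 2775 /home/abclab/labdata
</pre>

Remember that everything placed in /home/abclab/labdata still counts against the quota of the user who created it.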


=== Snapshots ===
Lab home directories are snapshotted. Snapshots are like backups in that they are read-only, moment-in-time captures of files and directories from which you can copy files to restore anything that was accidentally deleted or overwritten.
 
Home directories have snapshots taken once a day and maintained for 4 days, giving the user the ability to retrieve old files for up to 4 days after they have deleted them.
 
Every directory on the /home filesystem contains a hidden directory named ".snapshot". This directory does not show up in "ls" output or in any other directory listing; the only way to enter it is to "cd" into it by name. Users of /home directories may retrieve files by changing into the appropriate snapshot directory and copying files from it to any location they like.
 
'''Note: ANY user, from ANY home directory, can access that directory's snapshots to restore files.'''
 
For example:
 
<pre>
[cecombs@sites ~]$ cd test
[cecombs@sites test]$ pwd
/home/rccstaff/cecombs/test
[cecombs@sites test]$ ls
Main.java
[cecombs@sites test]$ rm -rf Main.java
[cecombs@sites test]$ cd .snapshot
[cecombs@sites .snapshot]$ ls
2013.04.16.00.00.01.daily  2013.04.17.00.00.01.daily  2013.04.18.00.00.01.daily
[cecombs@sites .snapshot]$ cd 2013.04.18.00.00.01.daily/
[cecombs@sites 2013.04.18.00.00.01.daily]$ ls
Main.java
[cecombs@sites 2013.04.18.00.00.01.daily]$ cp Main.java /home/rccstaff/cecombs/test
[cecombs@sites 2013.04.18.00.00.01.daily]$ cd /home/rccstaff/cecombs/test
[cecombs@sites test]$ ls
Main.java
</pre>
 
== Scratch ==
 
Scratch directories are for dynamic data. Scratch space is specifically designed to handle datasets that grow and shrink on demand. The GACRC does not and cannot snapshot scratch directories, because the amount of data that changes is too great and snapshots would only slow the file systems down.
 
=== eScratch ===
 
eScratch directories are for ephemeral datasets, commonly the output of large calculations that needs to be stored in a temporary place for a short period of time. Any user can make escratch directories for their work. Ephemeral scratch directories on the GACRC clusters reside on a Panasas ActiveStor 12 storage cluster.
 
==== Making an eScratch Directory ====
 
Researchers who need to use scratch space can type  
 
<pre class="gcommand">
<pre class="gcommand">
make_escratch
make_escratch
</pre>
</pre>
and a sub-directory will be created, and the user will be told the path to the sub-directory, e.g., /panfs/pstor.storage/escratch1/jsmith_Oct_22. The life span of the directory will be one week longer than the longest-duration queue, which is currently 30 days (i.e., life span = 37 days). At that time, the directory and its contents will be deleted. Users can create one escratch directory per day if needed. The total space a user can use on scratch (all scratch directories combined) is 4TB. The scratch directories are not backed up.
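As a usage sketch (the escratch path shown is hypothetical; use whatever path make_escratch prints for you, and the input/result filenames are placeholders), a typical workflow is to create the directory, work in it, and copy anything worth keeping back to your home directory before the 37-day lifetime expires:

<pre class="gcommand">
# create today's ephemeral scratch directory and note the path it prints
make_escratch

# work in the reported path (hypothetical example path and filenames)
cd /panfs/pstor.storage/escratch1/jsmith_Oct_22
cp ~/project/input.tar .

# before the directory expires, copy results you want to keep back to /home
cp -r results ~/project/results
</pre>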
 
=== lscratch ===
 
lscratch stands for local scratch and is available on every node in the zcluster.
 
==== lscratch information ====
All /lscratch filesystems on every node have these properties:
* lscratch is by far the fastest filesystem at the GACRC; however, the lscratch directory is only available on the node that a job gets scheduled to.
* The lscratch filesystem resides on the local hard drive of the node.
* It represents the disk space left over after the OS is installed.
* /lscratch sizes vary because nodes have different-sized disks.
* It is not accessible from other nodes.
* Every user has a directory on every node: /lscratch/<username>
 
==== lscratch Guidelines ====
This is a list of guidelines for /lscratch usage:
* Do not count on more than 10GB of lscratch space unless you know the size of the local hard drive and target that node specifically (e.g., qsub -l h=compute-15-36).
* You are responsible for migrating your data off the node after your job finishes; the job itself can transfer the data (see the sketch after this list).
* Make sure that your output goes to /lscratch/<username> (e.g., /lscratch/cecombs).
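The following is a minimal job script sketch of that pattern, assuming an SGE-style batch environment like the zcluster's; the script name, application command, and file paths are hypothetical:

<pre>
#!/bin/bash
# lscratch_job.sh -- hypothetical sketch: run in local scratch, copy results back

WORKDIR=/lscratch/$USER/myjob_$$      # per-job directory under your lscratch area
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# stage input from home and run the application (hypothetical names and paths)
cp ~/project/input.dat .
~/bin/my_app input.dat > output.dat

# migrate the results off the node before the job ends, then clean up
mkdir -p ~/project/results
cp output.dat ~/project/results/
cd /
rm -rf "$WORKDIR"
</pre>

You would then submit the script with qsub, optionally pinning it to a specific node as described above (e.g., qsub -l h=compute-15-36 lscratch_job.sh).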
 
== Quotas ==


To see how much space you are consuming on the home and scratch file systems, please use the command
<pre class="gcommand">
quota_rep
</pre>


== Overflow/Archival Storage ==
 
Some labs also have a subscription archival storage space, which is mounted on the zcluster login node and on the [[Transferring Files | copy nodes]] as /oflow (note that /oflow is not mounted on the compute nodes). The archival storage system is for long-term storage of large, static datasets.
 
This filesystem is snapshotted. The snapshots are available only from the mount point, under the hidden ".zfs" directory (e.g., /oflow/jlmlab/.zfs).
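For example, to restore a file from an archival snapshot (a sketch; on ZFS the snapshots normally appear under .zfs/snapshot, and the lab path, snapshot name, and filename shown here are hypothetical):

<pre class="gcommand">
# list the available snapshots for the lab's overflow area
ls /oflow/jlmlab/.zfs/snapshot

# copy a file out of a chosen snapshot back into the live filesystem
cp /oflow/jlmlab/.zfs/snapshot/SNAPSHOT_NAME/data/lost_file.dat /oflow/jlmlab/data/
</pre>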
 
Please contact the GACRC staff to request Overflow storage.
