Systems: Difference between revisions

From Research Computing Center Wiki
No edit summary
 
(28 intermediate revisions by 4 users not shown)
Line 30: Line 30:
===  Sapelo2 ===
===  Sapelo2 ===


Sapelo2 is a Linux cluster that runs a 64-bit CentOS 7.5 operating system and it is managed using Foreman and Puppet. Two physical login nodes are available, with Intel Xeon E5-2680 v3 (Haswell) processors and 128GB of RAM and 24 cores per node.  
Sapelo2 is a Linux cluster that runs a 64-bit Rocky 8.8 operating system and it is managed using Warewulf. Several virtual login nodes are available, with Intel Xeon Gold 6230 processors, 32GB of RAM, and 16 cores per node. The queueing system on Sapelo2 is Slurm.


For a subset of compute nodes, internodal communication among them and between these nodes and the storage systems serving the home directories and the scratch directories is provided by a QDR Infiniband network(40Gbps). For another subset of compute nodes, these communications are provided by an EDR Infiniband network.
Internodal communication among the compute nodes and between these nodes and the storage systems serving the home directories and the scratch directories is provided by an EDR Infiniband network (100Gbps).




Line 39: Line 39:
'''Regular nodes'''
'''Regular nodes'''


*106 compute nodes with AMD Opteron processors (48 cores and 128GB of RAM per node)  
* 14 compute nodes with AMD EPYC (Genoa 4th gen) processors (128 cores and 745GB of RAM per node)
* 22 compute nodes with AMD EPYC (Rome) processors (64 cores and 128GB of RAM per node)
* 120 compute nodes with AMD EPYC (Milan 3rd gen) processors (128 cores and 512GB of RAM per node)
* 16 compute nodes with AMD EPYC processors (32 cores and 128GB of RAM per node)
* 4 compute nodes with AMD EPYC (Milan 3rd gen) processors (64 cores and 256GB of RAM per node)
* 2 compute nodes with AMD EPYC (Milan 3rd gen) processors (64 cores and 128GB of RAM per node)
* 123 compute nodes with AMD EPYC (Rome 2nd gen) processors (64 cores and 128GB of RAM per node)
* 50 compute nodes with AMD EPYC (Naples 1st gen) processors (32 cores and 128GB of RAM per node)
* 42 compute nodes with Intel Xeon Skylake processors (32 cores and 192GB of RAM per node)
* 42 compute nodes with Intel Xeon Skylake processors (32 cores and 192GB of RAM per node)
* 32 compute nodes with Intel Xeon Broadwell processors (28 cores and 64GB of RAM per node)
 
4 compute nodes with AMD Opteron processors (48 cores and 256GB of RAM per node)
 
'''High memory nodes (3TB/node)'''
 
* 3 compute nodes with AMD EPYC (Genoa 4th gen) processors (48 cores and 3TB of RAM per node)
 
 
'''High memory nodes (2TB/node)'''
 
2 compute nodes with AMD EPYC (Rome 2nd gen) processors (32 cores and 2TB of RAM per node)




'''High memory nodes (1TB/node)'''
'''High memory nodes (1TB/node)'''


* 4 compute nodes with AMD EPYC processors (64 cores and 1TB of RAM per node)
* 2 compute nodes with AMD EPYC (Milan 3rd gen) processors (128 cores and 1TB of RAM per node)
* 4 compute nodes with Intel Xeon Broadwell processors (28 cores and 1TB of RAM per node)
* 12 compute nodes with AMD EPYC (Milan 3rd gen) processors (32 cores and 1TB of RAM per node)
* 1 compute node with AMD Opteron processors (48 cores and 1TB of RAM per node)
* 2 compute nodes with AMD EPYC (Naples 1st gen) processors (64 cores and 1TB of RAM per node)
* 1 compute node with Intel Xeon Broadwell processors (28 cores and 1TB of RAM per node)




'''High memory nodes (512GB/node)'''
'''High memory nodes (512GB/node)'''


* 16 compute nodes with AMD EPYC processors (32 cores and 512GB of RAM per node)
* 18 compute nodes with AMD EPYC (Naples 1st gen) processors (32 cores and 512GB of RAM per node)
*  6 compute nodes with AMD Opteron processors (48 cores and 512GB of RAM per node)
<!-- *  1 compute node with Intel Xeon Nehalem processors (32 cores and 512GB of RAM per node) -->
<!-- *  1 compute node with Intel Xeon Nehalem processors (32 cores and 512GB of RAM per node) -->


Line 63: Line 74:
'''GPU nodes'''
'''GPU nodes'''


* 4 compute nodes with Intel Xeon Skylake processors (32 cores and 187GB of RAM) and 1 NVIDIA P100 GPU card per node
* 12 compute nodes with Intel Xeon SapphireRapids processors (64 cores and 1TB of RAM) and 4x NVIDIA H100 GPU cards.
* 2 compute nodes with Intel Xeon processors (16 cores and 128GB of RAM) and 8 NVIDIA K40m GPU cards per node  
* 12 compute nodes with AMD EPYC (Genoa 4th gen) processors (128 cores and 745GB of RAM) and 4x NVIDIA L4 GPU cards.
* 4 compute nodes with Intel Xeon processors (12 cores and 96GB of RAM) and 7 NVIDIA K20Xm GPU cards per node  
* 14 compute nodes with AMD EPYC (Milan 3rd gen) processors (64 cores and 1TB of RAM) and 4x NVIDIA A100 GPU cards.
* 2 compute nodes with Intel Xeon Skylake processors (32 cores and 187GB of RAM) and 1x NVIDIA P100 GPU card per node
<!-- * 2 compute nodes with Intel Xeon processors (16 cores and 128GB of RAM) and 8x NVIDIA K40m GPU cards per node -->




Line 72: Line 85:
* Various configurations
* Various configurations


The queueing system on Sapelo2 is Torque/Moab.


<!--
<!--
Line 95: Line 106:
====[[Disk Storage]]====
====[[Disk Storage]]====


====[[Software Installed on Sapelo2]]====
====[[Software on Sapelo2]]====
 
====[[Available Toolchains and Toolchain Compatibility]]====


====[[Code Compilation on Sapelo2]]====
====[[Code Compilation on Sapelo2]]====
Line 102: Line 115:


====[[Monitoring Jobs on Sapelo2]]====
====[[Monitoring Jobs on Sapelo2]]====
====[[Migrating from Torque to Slurm]]====
'''Training material'''
To help users become familiar with Slurm and the test cluster environment, we have prepared training videos, available from the GACRC's Kaltura channel at https://kaltura.uga.edu/channel/GACRC/176125031 (MyID login required). Training sessions and slides are available at https://wiki.gacrc.uga.edu/wiki/Training




Line 107: Line 128:
[[#top|Back to Top]]
[[#top|Back to Top]]


 
<!--
===  Slurm Test Cluster (Sap2test) ===
===  Slurm Test Cluster (Sap2test) ===


Line 159: Line 180:
====[[Software on sap2test | Software Installed on the Slurm test cluster]]====
====[[Software on sap2test | Software Installed on the Slurm test cluster]]====


====[[Code Compilation on the Slurm test cluster]]====
====[[Code Compilation on Sap2test]]====


====[[Available Toolchains and Toolchain Compatibility]]====
====[[Available Toolchains and Toolchain Compatibility]]====
Line 175: Line 196:
----
----
[[#top|Back to Top]]
[[#top|Back to Top]]
-->


===  Teaching cluster ===
===  Teaching cluster ===


The teaching cluster is a Linux cluster that runs a 64-bit Linux, with Centos 7.8. The login node is a VM that has 4 cores (Intel Xeon Gold 6230 processor) and 16GB of RAM. An Ethernet network (1Gbps) provides internodal communication among compute nodes, and between the compute nodes and the storage systems serving the home directories and the work directories.
The teaching cluster is a Linux cluster that runs a 64-bit Linux, with Rocky 8.8. The login node is a VM that has 4 cores (Intel Xeon Gold 6230 processor) and 16GB of RAM. An EDR Infiniband network (100Gbps) provides internodal communication among compute nodes, and between the compute nodes and the storage systems serving the home directories and the work directories.


The cluster is currently comprised of the following resources:  
The cluster is currently comprised of the following resources:  


'''Regular nodes:'''
* 10 compute nodes with AMD EPYC (Naples 1st gen) processors (32 cores and 128GB of RAM per node)
'''High-memory nodes:'''
* 2 compute nodes with AMD EPYC (Naples 1st gen) processors (64 cores and 1TB of RAM per node)
'''GPU nodes:'''
* 1 compute node with Intel Skylake processors (32 cores, 192GB RAM per node) and a P100 GPU card
<!--
*30 compute nodes with Intel Xeon X5650 2.67GHz processors (12 cores and 48GB of RAM per node)  
*30 compute nodes with Intel Xeon X5650 2.67GHz processors (12 cores and 48GB of RAM per node)  
* 2 compute nodes with Intel Xeon L7555 1.87GHz processors (32 cores and 512GB of RAM per node)
* 2 compute nodes with Intel Xeon L7555 1.87GHz processors (32 cores and 512GB of RAM per node)
* 4 NVIDIA Tesla (Kepler) K20Xm GPU cards. These cards are installed on one host that has dual 6-core Intel Xeon CPUs and 48GB of RAM
* 4 NVIDIA Tesla (Kepler) K20Xm GPU cards. These cards are installed on one host that has dual 6-core Intel Xeon CPUs and 48GB of RAM
-->


The queueing system on the teaching cluster is Slurm.
The queueing system on the teaching cluster is Slurm.
Line 190: Line 226:
====[[Connecting#Connecting_to_the_teaching_cluster |Connecting to the teaching cluster]]====
====[[Connecting#Connecting_to_the_teaching_cluster |Connecting to the teaching cluster]]====


====[[Transferring Files]]====


====[[Transferring Files]]==== 
<!--
====[[Disk Storage]]====
====[[Disk Storage]]====
 
-->
====Software Installed on the teaching cluster====
====Software Installed on the teaching cluster====


The list of installed application is available at [[Software]] page.
The teaching cluster has access to the same software stack installed on Sapelo2.


====[[Code Compilation on the teaching cluster]]====
====[[Code Compilation on the teaching cluster]]====

Latest revision as of 09:26, 12 September 2024



Sapelo2

Sapelo2 is a Linux cluster that runs the 64-bit Rocky Linux 8.8 operating system and is managed using Warewulf. Several virtual login nodes are available, each with Intel Xeon Gold 6230 processors, 16 cores, and 32GB of RAM. The queueing system on Sapelo2 is Slurm.

Internodal communication among the compute nodes and between these nodes and the storage systems serving the home directories and the scratch directories is provided by an EDR Infiniband network (100Gbps).


The cluster currently comprises the following resources:

Regular nodes

  • 14 compute nodes with AMD EPYC (Genoa 4th gen) processors (128 cores and 745GB of RAM per node)
  • 120 compute nodes with AMD EPYC (Milan 3rd gen) processors (128 cores and 512GB of RAM per node)
  • 4 compute nodes with AMD EPYC (Milan 3rd gen) processors (64 cores and 256GB of RAM per node)
  • 2 compute nodes with AMD EPYC (Milan 3rd gen) processors (64 cores and 128GB of RAM per node)
  • 123 compute nodes with AMD EPYC (Rome 2nd gen) processors (64 cores and 128GB of RAM per node)
  • 50 compute nodes with AMD EPYC (Naples 1st gen) processors (32 cores and 128GB of RAM per node)
  • 42 compute nodes with Intel Xeon Skylake processors (32 cores and 192GB of RAM per node)
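As a quick sanity check on the scale of the regular partition, the list above can be tallied programmatically. The following is only an illustrative sketch: the tuple layout and the helper name `totals` are assumptions made for this example, and the figures are installed hardware as listed, not the memory actually schedulable by Slurm.

```python
# Each entry is (node_count, cores_per_node, ram_gb_per_node),
# copied from the "Regular nodes" list above.
REGULAR_NODES = [
    (14, 128, 745),   # AMD EPYC Genoa
    (120, 128, 512),  # AMD EPYC Milan
    (4, 64, 256),     # AMD EPYC Milan
    (2, 64, 128),     # AMD EPYC Milan
    (123, 64, 128),   # AMD EPYC Rome
    (50, 32, 128),    # AMD EPYC Naples
    (42, 32, 192),    # Intel Xeon Skylake
]


def totals(groups):
    """Return (total nodes, total cores, total RAM in GB) for a list of node groups."""
    nodes = sum(count for count, _, _ in groups)
    cores = sum(count * cores_per_node for count, cores_per_node, _ in groups)
    ram_gb = sum(count * ram for count, _, ram in groups)
    return nodes, cores, ram_gb


if __name__ == "__main__":
    nodes, cores, ram_gb = totals(REGULAR_NODES)
    print(f"{nodes} nodes, {cores} cores, {ram_gb} GB of RAM")
```

The same helper can be pointed at any of the other node lists by swapping in the corresponding tuples.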


High memory nodes (3TB/node)

  • 3 compute nodes with AMD EPYC (Genoa 4th gen) processors (48 cores and 3TB of RAM per node)


High memory nodes (2TB/node)

  • 2 compute nodes with AMD EPYC (Rome 2nd gen) processors (32 cores and 2TB of RAM per node)


High memory nodes (1TB/node)

  • 2 compute nodes with AMD EPYC (Milan 3rd gen) processors (128 cores and 1TB of RAM per node)
  • 12 compute nodes with AMD EPYC (Milan 3rd gen) processors (32 cores and 1TB of RAM per node)
  • 2 compute nodes with AMD EPYC (Naples 1st gen) processors (64 cores and 1TB of RAM per node)
  • 1 compute node with Intel Xeon Broadwell processors (28 cores and 1TB of RAM per node)


High memory nodes (512GB/node)

  • 18 compute nodes with AMD EPYC (Naples 1st gen) processors (32 cores and 512GB of RAM per node)


GPU nodes

  • 12 compute nodes with Intel Xeon Sapphire Rapids processors (64 cores and 1TB of RAM) and 4x NVIDIA H100 GPU cards per node
  • 12 compute nodes with AMD EPYC (Genoa 4th gen) processors (128 cores and 745GB of RAM) and 4x NVIDIA L4 GPU cards per node
  • 14 compute nodes with AMD EPYC (Milan 3rd gen) processors (64 cores and 1TB of RAM) and 4x NVIDIA A100 GPU cards per node
  • 2 compute nodes with Intel Xeon Skylake processors (32 cores and 187GB of RAM) and 1x NVIDIA P100 GPU card per node


Buy-in nodes

  • Various configurations


Connecting to Sapelo2

Transferring Files

Disk Storage

Software on Sapelo2

Available Toolchains and Toolchain Compatibility

Code Compilation on Sapelo2

Running Jobs on Sapelo2

Monitoring Jobs on Sapelo2

Migrating from Torque to Slurm

Training material

To help users become familiar with Slurm and the test cluster environment, we have prepared training videos, available from the GACRC's Kaltura channel at https://kaltura.uga.edu/channel/GACRC/176125031 (MyID login required). Training sessions and slides are available at https://wiki.gacrc.uga.edu/wiki/Training



Back to Top


Teaching cluster

The teaching cluster is a Linux cluster that runs 64-bit Rocky Linux 8.8. The login node is a VM with 4 cores (Intel Xeon Gold 6230 processor) and 16GB of RAM. An EDR Infiniband network (100Gbps) provides internodal communication among the compute nodes, and between the compute nodes and the storage systems serving the home and work directories.

The cluster currently comprises the following resources:

Regular nodes:

  • 10 compute nodes with AMD EPYC (Naples 1st gen) processors (32 cores and 128GB of RAM per node)

High-memory nodes:

  • 2 compute nodes with AMD EPYC (Naples 1st gen) processors (64 cores and 1TB of RAM per node)

GPU nodes:

  • 1 compute node with Intel Xeon Skylake processors (32 cores and 192GB of RAM per node) and 1x NVIDIA P100 GPU card

The queueing system on the teaching cluster is Slurm.
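Before sizing a Slurm job, it can help to check which node type on the teaching cluster could hold the request on a single node. The sketch below encodes the three node classes listed above. The class labels, the `NodeClass` type and the `classes_that_fit` helper are hypothetical names introduced for illustration, 1TB is taken as 1024GB, and the memory a job can actually request is normally somewhat lower than the installed RAM.

```python
from typing import List, NamedTuple


class NodeClass(NamedTuple):
    label: str    # illustrative label, not a Slurm partition name
    count: int    # number of nodes of this type
    cores: int    # cores per node
    ram_gb: int   # installed RAM per node, in GB
    gpus: int     # GPU cards per node


# Values copied from the teaching cluster resource list above.
TEACHING_NODES = [
    NodeClass("regular", 10, 32, 128, 0),
    NodeClass("high-memory", 2, 64, 1024, 0),  # 1TB assumed to be 1024GB
    NodeClass("gpu", 1, 32, 192, 1),
]


def classes_that_fit(cores: int, ram_gb: int, gpus: int = 0) -> List[str]:
    """Return labels of node classes whose per-node hardware covers the request."""
    return [
        node.label
        for node in TEACHING_NODES
        if node.cores >= cores and node.ram_gb >= ram_gb and node.gpus >= gpus
    ]


if __name__ == "__main__":
    # A 48-core, 200GB single-node request only fits the high-memory nodes.
    print(classes_that_fit(cores=48, ram_gb=200))
```

A request that returns an empty list cannot run on a single node of this cluster and would have to be split across nodes or reduced.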

Connecting to the teaching cluster

Transferring Files

Software Installed on the teaching cluster

The teaching cluster has access to the same software stack installed on Sapelo2.

Code Compilation on the teaching cluster

Running Jobs on the teaching cluster

Monitoring Jobs on the teaching cluster