Georgia Advanced Computing Resource Center: Difference between revisions

From Research Computing Center Wiki
(315 intermediate revisions by 11 users not shown)
==Clusters==
__NOTOC__
===Overview===
Welcome to the Georgia Advanced Computing Resource Center wiki. The information provided here is a supplement to the GACRC webpage.  The GACRC online information resources include:
===[[rCluster]]===
===[[zCluster]]===
====[[ToDo List]]====
===[[sCluster]]===
====[[scluster todo List]]====
==Paul's List==


* (DONE) root GECOS field edit
*[http://gacrc.uga.edu/ Web Site] – General overview
* (DONE) make /etc/resolv.conf like rcluster (otherwise has bogus 192.168 in there)
*[https://wiki.gacrc.uga.edu/ Wiki] – Software docs and how-to’s - "You Are Here"
* (DONE) "mkdir -p /export/rocks/install/contrib/5.4.3/x86_64/RPMS.all"
*[https://kaltura.uga.edu/channel/GACRC/176125031 Kaltura] – Linux and HPC training videos
* (DONE) add "plugins=1" to "main" section of /etc/yum.conf
<!-- *[https://blog.gacrc.uga.edu/ Blog] – announcements -->
* (DONE) register shead with UGA RHEL satellite
<!-- *[https://forums.gacrc.uga.edu/ Forums] – user discussion area -->
* (DONE) "yum install yum-downloadonly"
* (DONE) "yum update -y --downloadonly"
* (DONE) "yum -y update"; reboot
* (DONE) "mv /var/cache/yum/rhel-x86_64-server-5/packages/*rpm /export/rocks/install/contrib/5.4.3/x86_64/RPMS.all"
* (DONE) copy /etc/cron.daily/rpm.noversion from rcluster; run it by hand once
* (DONE) install these RHEL RPMs: nmap, dejagnu, gpm-devel, screen
* (DONE) "groupadd -g 1001 rccstaff"
* (DONE) "mkdir -p /usr/tools/bin /usr/tools/lib /usr/tools/sbin"
* (DONE) "chmod -R 0750 /usr/tools; chown -R root:rccstaff /usr/tools"
* (DONE) give shead a 172.16 IP address for storage
* (DONE) mount 3070 /usr/local on shead
* (DONE) "cd /usr/local; mkdir -p bin etc games include lib man sbin share src"
* (DONE) "mkdir -p /usr/local/share/info /usr/local/share/doc"
* (DONE) install novi rpm on zhead (make it first, if needed)
* (DONE) add 'storage' network to Rocks
* (DONE) change /etc/profile.d/ssh-key.sh to make it silent
* (DONE) add gx01-04 and all gxdNN entries to /etc/hosts.local
* (DONE) add "-s local" to /etc/sysconfig/syslog SYSLOGD_OPTIONS
* (DONE) "rocks report host > /etc/hosts"
* (DONE) decide: choose GE distro which has NUMA "hwloc" support.  Chose Son of GE for other reasons.
* (DONE) turn off NFS automounter and disable it
* (DONE) make a root email alias in /etc/aliases for appropriate staff; run "newaliases"
* (DONE) decide: do we want rsh turned on for nodes? no.
* (DONE) decide: install Oracle Java 6 or 7?  6.
* (DONE) decide: make newest kernel for PXE and node installs? apparently not.
* (DONE) what "newaliases" or "postalias" command does postfix need? "newaliases".
* (DONE) remove 'nisplus' from 'automount' and 'alias' /etc/nsswitch.conf lines
* (DONE) service sec-channel stop; chkconfig sec-channel off
* (DONE) service 411 stop; chkconfig 411 off
* (DONE) chkconfig rocks-dmesg off
* (DONE) yum install perl-IO-Zlib tclx-devel
* (DONE) yum remove bluez-libs
* (DONE) add WCOLL, FANOUT and PDSH_RCMD_TYPE variables to pdsh.sh, pdsh.csh
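
The directory-layout steps in the list above (/usr/tools and /usr/local) can be sketched as a small script. This is only an illustration: <code>PREFIX</code> is a hypothetical staging root introduced here so the commands can be tried outside the real head node, and the <code>chown</code> is skipped unless the script runs as root and the rccstaff group already exists.

```shell
#!/bin/sh
# Sketch only. PREFIX is a hypothetical staging root (not in the original notes);
# set PREFIX to an empty string on a real head node to act on /usr directly.
PREFIX="${PREFIX-$(mktemp -d)}"

make_tool_dirs() {
    # mirrors: mkdir -p /usr/tools/bin /usr/tools/lib /usr/tools/sbin
    mkdir -p "$1/usr/tools/bin" "$1/usr/tools/lib" "$1/usr/tools/sbin"
    chmod -R 0750 "$1/usr/tools"
    # chown needs root and the rccstaff group (gid 1001 in the list above)
    if [ "$(id -u)" -eq 0 ] && getent group rccstaff >/dev/null 2>&1; then
        chown -R root:rccstaff "$1/usr/tools"
    fi
}

make_local_dirs() {
    # mirrors the two mkdir -p steps for /usr/local
    for d in bin etc games include lib man sbin share src share/info share/doc; do
        mkdir -p "$1/usr/local/$d"
    done
}

make_tool_dirs "$PREFIX"
make_local_dirs "$PREFIX"
```

Running it with the default <code>PREFIX</code> creates the tree under a temporary directory, which makes it easy to inspect the result before repeating the steps for real.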


* use novi to populate /export/rocks/install/contrib/5.4.3/x86_64/RPMS from ..../RPMS.all dir
<!--Comments on color for the below -->
* "cd /export/rocks/install; rocks create distro"
<!-- green background = #00CC33 -->
* build and install that GE (turns out it's Son of Grid Engine)
<!-- light orange background = #FF9F40 -->
* tweak /etc/profile, /etc/csh.login, /etc/csh.cshrc à la rcluster
<!-- red background = red -->
* new dot files for skel
<!-- default text, at end of line, is: Online -->
* decide: use GNU stow?  Yes, when it helps.
* build new pdsh which lacks the pdcp bug that version 2.24 has
* make 32 and 64 bit libs match on hn, other nodes
* get most hn-only libs onto nodes and Rocks XML
* get GE init script onto nodes and into Rocks XML
* decide: what partitions, swap for compute nodes? (login too) (2 hrs to implement)
* decide: run memtest86+ on new nodes pre-go-live? (30 mins)
* decide: "modules" package? how otherwise to handle /etc/profile.d and PATH?  ( 4 hrs )
* find or write an "interkill" (6 hrs)
* document how to use /usr/local/src, and how to build and install things (1 hr )
* learn how to extend lifetime of running GE jobs (4 hrs)
* configure BMC/DRACs ( mostly done; 2 hrs? )
* need to put 3070rep somewhere ( .5 hrs )
* mpi-selector in /etc/profile.d ( .5 hrs)
* set up PEs for GE
* decide: do we want the gxdNN IPs in DNS round-robin on headnode? ( 1 hr )
* decide: how to handle e.g. push_users, passwd command on zhead vs. zcluster.rcc (1 hr )
* implement OS backups by cron (.5 hrs)
* check for string "rcluster" in /usr/local/bin/*, /usr/tools/* ( .5 hrs )
* request firewall rules for shead like rcluster has ( 2 hrs )
* compute node: yum install yum-utils
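
The "check for string rcluster" item above could be done with a recursive <code>grep</code>. The function name <code>find_rcluster_refs</code> is hypothetical; the sketch just lists the files under the given directories that still mention the old cluster name.

```shell
#!/bin/sh
# Sketch only: find_rcluster_refs is a hypothetical helper name.
# Prints the path of every file under the given directories that contains "rcluster".
find_rcluster_refs() {
    grep -rl -- 'rcluster' "$@" 2>/dev/null
}

# Example call for the directories named in the list above:
#   find_rcluster_refs /usr/local/bin /usr/tools
```

Each reported file then needs a manual look, since some matches may be harmless comments rather than hard-coded host names.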


* (POST) decide: how to handle nodes' PXE boot settings and reinstallations (see Rocks User Guide section 6).
<div style="width:100%; margin:0; background:#00CC33; font-size:120%; font-weight:bold; border:1px solid #00CC33; text-align:left; color:white; padding:0.2em 0.4em;"> Current Status: <span style="color:black"> Online </span></div>
* (POST) decide: make a separate Rocks network for IB IP addresses (rack 11)?
* (POST) compute node cron job so node can take self out of queue if problem
* (POST) configure IB on rack11 nodes under RHEL 5, maybe new Rocks appliance type
* (POST) learn how to submit rootly jobs into GE (updates, reboots)
* (POST) cd /export; hg clone http://fyp.rocksclusters.org/hg/rocks-5.4.3
* (much POST) get licenses for PGI compilers and Matlab for zhead
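
One possible shape for the "node can take self out of queue" cron job is sketched below. The health check (free space in /tmp), the <code>MIN_FREE_KB</code> threshold, and the <code>DRY_RUN</code> switch are all assumptions made for illustration; <code>qmod -d</code> is the Grid Engine command for disabling queue instances, and the <code>*@host</code> pattern is assumed to match every queue instance on the node.

```shell
#!/bin/sh
# Sketch only: check_node, MIN_FREE_KB and DRY_RUN are hypothetical.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the qmod command
MIN_FREE_KB="${MIN_FREE_KB:-1048576}"   # assumed threshold: 1 GiB free in /tmp

check_node() {
    # df -Pk prints the available KB in column 4 of the second line
    free_kb=$(df -Pk /tmp | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$MIN_FREE_KB" ]
}

disable_node() {
    host=$1
    if [ "$DRY_RUN" = "1" ]; then
        echo "qmod -d *@$host"
    else
        # disable all queue instances on this host
        qmod -d "*@$host"
    fi
}

if ! check_node; then
    disable_node "$(hostname)"
fi
```

With <code>DRY_RUN</code> left at its default the script only prints what it would do, so it can be tested from cron before it is allowed to change queue state.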


* (NA) decide: continue with pdsh, or use tentakel or "rocks run host"?  It has them all!
<!--
* (NA) try to reduce used space in rcluster:/usr/local (and/or pcluster)
<div style="width:100%; margin:0; background:#FF9F40; font-size:120%; font-weight:bold; border:1px solid #FF9F40; text-align:left; color:white; padding:0.2em 0.4em;"> Current Status: <span style="color:black"> Scheduled maintenance underway - Sapelo2, xfer nodes, GACRC storage systems, and Open OnDemand unavailable </span></div>
* (NA) decide: RHEL 5 ships w/gcc 4.1.2 as default. 4.4 is avail. as tech preview. Do we want it? Yes.
-->
* (NA) build "checkinstall", install into /usr/tools, then make RPM of it and install RPM
<!--
* (NA) "mv /usr/local /usr/local.dist; mkdir /usr/local, chmod 0755 /usr/local"
<div style="width:100%; margin:0; background:#FF9F40; font-size:120%; font-weight:bold; border:1px solid #FF9F40; text-align:left; color:white; padding:0.2em 0.4em;"> Current Status: <span style="color:black"> Teaching cluster inaccessible while the scheduled UGA network maintenance is ongoing</span></div>
* (NA) decide: which versions of perl, gcc, python to have as default?
-->
* (NA) installed perl 5.14.1 and did "install Bundle::CPAN" and Bundle::LWP into /usr/local


===[[VMWare]]===
<!--
====[[Virtual Machines]]====
<div style="width:100%; margin:0; background:#00CC33; font-size:120%; font-weight:bold; border:1px solid #00CC33; text-align:left; color:white; padding:0.2em 0.4em;"> Current Status: <span style="color:black"> Sapelo2 Cluster Online </span></div>


==Storage==
<div style="width:100%; margin:0; background:#FF9F40; font-size:120%; font-weight:bold; border:1px solid #FF9F40; text-align:left; color:white; padding:0.2em 0.4em;"> Current Status: <span style="color:black"> Sapelo decommissioned</span></div>
===Overview===
-->
===[[NAS]]===
 
===[[SAN]]===
<div style="width:100%; margin:0; background:#333333; font-size:120%; font-weight:bold; border:1px solid #f9f9f9; text-align:left; color:#eeeeee; padding:0.2em 0.4em;"> IMPORTANT NEWS </div>
==Networking==
The following is an important notice for all of our current users:
===Overview===
 
===[[VLANs]]===
* GACRC is offering in-person drop-in '''[[Office Hours]]'''.
===[[IP Networks]]===
 
==Physical Hosts==
<blockquote style="background-color: lightyellow; border: solid thin grey;">
'''October Office Hours:'''
*'''Wednesday, October 9th, 3:00-4:30 pm''' at the McBay Science Library, main floor
</blockquote>
 
<div style="width:100%; margin:0; background:#333333; font-size:120%; font-weight:bold; border:1px solid #f9f9f9; text-align:left; color:#eeeeee; padding:0.2em 0.4em;"> Getting Started </div>
Welcome to the Georgia Advanced Computing Resource Center at the University of Georgia. If you're new to the GACRC, start with these links to get acquainted with our resources.
*[[User Accounts]]
*[[Instructional Accounts]]
*[[Connecting]]
*[[Transferring Files]]
*[[Password | Changing your Password]]
*[[Frequently Asked Questions | FAQ]]
*[https://wiki.gacrc.uga.edu/wiki/Quick_Reference_Guide Command List]
*[[Getting Help]]
*[[Policies]]
*[[Consulting]]
*[[Training]]
 
 
<div style="width:100%; margin:0; background:#333333; font-size:120%; font-weight:bold; border:1px solid #f9f9f9; text-align:left; color:#eeeeee; padding:0.2em 0.4em;"> System Information </div>
Hardware information and operational procedures are described below.
*[[Systems]]
*[[Disk Storage]]
<!-- * [[Sapelo2 and Sapelo2 (old) comparison]] -->
 
 
<div style="width:100%; margin:0; background:#333333; font-size:120%; font-weight:bold; border:1px solid #f9f9f9; text-align:left; color:#eeeeee; padding:0.2em 0.4em;"> Job and Data Management </div>
Information on how to run jobs and manage data.
*[[Running Jobs]]
*[[Monitoring Jobs]]
*[[Job Submission Partitions]]
*[[Sample Scripts | Sample Job Submission Scripts]]
*[[Migrating from Torque to Slurm]]
*[[Troubleshooting on Sapelo2]]
*[[Best Practices]]
*[[Globus]]
*[[OnDemand | Open OnDemand]]
 
 
<div style="width:100%; margin:0; background:#333333; font-size:120%; font-weight:bold; border:1px solid #f9f9f9; text-align:left; color:#eeeeee; padding:0.2em 0.4em;"> Software and Libraries </div>
Documentation for software applications, programming tools, and usage.
*[[Software]]
*[[Available Toolchains and Toolchain Compatibility]]
*[[Bioinformatics Databases]]
*[[OpenMP]]
*[[MPI | Message Passing Interface (MPI)]]
*[[Compilers]]
*[[GPU|GPU and CUDA Programming]]
*[[Installing Applications]]
 
 
<!--
* [[Galaxy]]
* [[Zaney]]
 
<div style="width:100%; margin:0; background:#eeeeee; font-size:120%; font-weight:bold; border:1px solid #f9f9f9; text-align:left; color:#eeeeee; padding:0.2em 0.4em;">
[[GACRC Knowledge Base]]</div>
<br />
<div style="width:100%; margin:0; background:#eeeeee; font-size:120%; font-weight:bold; border:1px solid #f9f9f9; text-align:left; color:#eeeeee; padding:0.2em 0.4em;">
[[GACRC Advisory Committee]]</div>
-->

Latest revision as of 12:55, 3 October 2024

Welcome to the Georgia Advanced Computing Resource Center wiki. The information provided here is a supplement to the GACRC webpage. The GACRC online information resources include:

  • Web Site – General overview
  • Wiki – Software docs and how-to’s - "You Are Here"
  • Kaltura – Linux and HPC training videos


Current Status: Online


IMPORTANT NEWS

The following is an important notice for all of our current users:

October Office Hours:

  • Wednesday, October 9th, 3:00-4:30 pm at the McBay Science Library, main floor

Getting Started

Welcome to the Georgia Advanced Computing Resource Center at the University of Georgia. If you're new to the GACRC, start with these links to get acquainted with our resources.


System Information

Hardware information and operational procedures are described below.


Job and Data Management

Information on how to run jobs and manage data.


Software and Libraries

Documentation for software applications, programming tools, and usage.