Difference between revisions of "Georgia Advanced Computing Resource Center"

From Research Computing Center Wiki

Revision as of 12:16, 9 September 2011

Clusters

Overview

rCluster

zCluster

ToDo List

sCluster

scluster todo List

Paul's List

  • (DONE) root GECOS field edit
  • (DONE) make /etc/resolv.conf like rcluster (otherwise has bogus 192.168 in there)
  • (DONE) "mkdir -p /export/rocks/install/contrib/5.4.3/x86_64/RPMS.all"
  • (DONE) add "plugins=1" to "main" section of /etc/yum.conf
  • (DONE) register shead with UGA RHEL satellite
  • (DONE) "yum install yum-downloadonly"
  • (DONE) "yum update -y --downloadonly"
  • (DONE) "yum -y update"; reboot
  • (DONE) "mv /var/cache/yum/rhel-x86_64-server-5/packages/*rpm /export/rocks/install/contrib/5.4.3/x86_64/RPMS.all"
  • (DONE) copy /etc/cron.daily/rpm.noversion from rcluster; run it by hand once
  • (DONE) install these RHEL RPMs: nmap, dejagnu, gpm-devel, screen
  • (DONE) "groupadd -g 1001 rccstaff"
  • (DONE) "mkdir -p /usr/tools/bin /usr/tools/lib /usr/tools/sbin"
  • (DONE) "chmod -R 0750 /usr/tools; chown -R root:rccstaff /usr/tools"
  • (DONE) give shead a 172.16 IP address for storage
  • (DONE) mount 3070 /usr/local on shead
  • (DONE) "cd /usr/local; mkdir -p bin etc games include lib man sbin share src"
  • (DONE) "mkdir -p /usr/local/share/info /usr/local/share/doc"
  • (DONE) install novi rpm on zhead (make it first, if needed)
  • (DONE) add 'storage' network to Rocks
  • (DONE) change /etc/profile.d/ssh-key.sh to make it silent
  • (DONE) add gx01-04 and all gxdNN entries to /etc/hosts.local
  • (DONE) add "-s local" to SYSLOGD_OPTIONS in /etc/sysconfig/syslog
  • (DONE) "rocks report host > /etc/hosts"
  • (DONE) decide: choose GE distro which has NUMA "hwloc" support. Chose Son of GE for other reasons.
  • (DONE) stop the NFS automounter and disable it at boot
  • (DONE) make a root email alias in /etc/aliases for appropriate staff; run "newaliases"
  • (DONE) decide: do we want rsh turned on for nodes? no.
  • (DONE) decide: install Oracle Java 6 or 7? 6.
  • (DONE) decide: make newest kernel for PXE and node installs? apparently not.
  • (DONE) decide: does postfix need "newaliases" or "postalias"? "newaliases".
  • (DONE) remove 'nisplus' from 'automount' and 'alias' /etc/nsswitch.conf lines
  • (DONE) service sec-channel stop; chkconfig sec-channel off
  • (DONE) service 411 stop; chkconfig 411 off
  • (DONE) chkconfig rocks-dmesg off
  • (DONE) yum install perl-IO-Zlib tclx-devel
  • (DONE) yum remove bluez-libs
  • (DONE) add WCOLL, FANOUT and PDSH_RCMD_TYPE variables to pdsh.sh, pdsh.csh
  • use novi to populate /export/rocks/install/contrib/5.4.3/x86_64/RPMS from ..../RPMS.all dir
  • "cd /export/rocks/install; rocks create distro"
  • build and install that GE (turns out it's Son of Grid Engine)
  • tweak /etc/profile, /etc/csh.login, /etc/csh.cshrc to match rcluster
  • new dot files for skel
  • decide: use GNU stow? Yes, when it helps.
  • build new pdsh which lacks the pdcp bug that version 2.24 has
  • make 32 and 64 bit libs match on hn, other nodes
  • get most hn-only libs onto nodes and Rocks XML
  • get GE init script onto nodes and into Rocks XML
  • decide: what partitions, swap for compute nodes? (login too) (2 hrs to implement)
  • decide: run memtest86+ on new nodes pre-go-live? (30 mins)
  • decide: "modules" package? how otherwise to handle /etc/profile.d and PATH? (4 hrs)
  • find or write an "interkill" (6 hrs)
  • document how to use /usr/local/src, and how to build and install things (1 hr)
  • learn how to extend lifetime of running GE jobs (4 hrs)
  • configure BMC/DRACs (mostly done; 2 hrs?)
  • need to put 3070rep somewhere (0.5 hrs)
  • mpi-selector in /etc/profile.d (0.5 hrs)
  • set up PEs for GE
  • decide: do we want the gxdNN IPs in DNS round-robin on headnode? (1 hr)
  • decide: how to handle e.g. push_users, passwd command on zhead vs. zcluster.rcc (1 hr)
  • implement OS backups by cron (0.5 hrs)
  • check for string "rcluster" in /usr/local/bin/*, /usr/tools/* (0.5 hrs)
  • request firewall rules for shead like rcluster has (2 hrs)
  • compute node: yum install yum-utils
  • (POST) decide: how to handle nodes' PXE boot settings and reinstallations (see Rocks User Guide section 6).
  • (POST) decide: make a separate Rocks network for IB IP addresses (rack 11)?
  • (POST) compute node cron job so node can take self out of queue if problem
  • (POST) configure IB on rack11 nodes under RHEL 5, maybe new Rocks appliance type
  • (POST) learn how to submit rootly jobs into GE (updates, reboots)
  • (POST) cd /export; hg clone http://fyp.rocksclusters.org/hg/rocks-5.4.3
  • (much POST) get licenses for PGI compilers and Matlab for zhead
  • (NA) decide: continue with pdsh, or use tentakel or "rocks run host"? It has them all!
  • (NA) try to reduce used space in rcluster:/usr/local (and/or pcluster)
  • (NA) decide: RHEL 5 ships w/gcc 4.1.2 as default. 4.4 is avail. as tech preview. Do we want it? Yes.
  • (NA) build "checkinstall", install into /usr/tools, then make RPM of it and install RPM
  • (NA) "mv /usr/local /usr/local.dist; mkdir /usr/local; chmod 0755 /usr/local"
  • (NA) decide: which versions of perl, gcc, python to have as default?
  • (NA) installed perl 5.14.1 and did "install Bundle::CPAN" and Bundle::LWP into /usr/local
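
The yum download-and-stage items in the list above can be collected into one script. This is only a rough sketch, not the script actually used on shead: the helper names run and stage_updates and the DRY_RUN switch are made up for illustration, while the paths and yum commands are copied from the list, in the order the list gives them. By default it only prints the commands. Whether the downloaded RPMs are still in the yum cache after "yum -y update" probably depends on the keepcache setting in /etc/yum.conf, which the list does not mention.

```shell
#!/bin/sh
# Sketch of the RPM staging steps from Paul's List.
# Paths are taken from the list; run(), stage_updates() and DRY_RUN
# are hypothetical names introduced for this example.

RPMS_ALL=/export/rocks/install/contrib/5.4.3/x86_64/RPMS.all
YUM_CACHE=/var/cache/yum/rhel-x86_64-server-5/packages

# Print the command by default; set DRY_RUN=0 to really execute it (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

stage_updates() {
    run "mkdir -p $RPMS_ALL"
    run "yum install yum-downloadonly"
    run "yum update -y --downloadonly"
    run "yum -y update"
    # The glob is left unexpanded here and only expanded by sh -c in a real run.
    run "mv $YUM_CACHE/*rpm $RPMS_ALL"
}

stage_updates
```

Running it as is prints the five commands prefixed with "+ ", so the sequence can be reviewed before anything is changed on the head node. The reboot and the "plugins=1" edit to /etc/yum.conf are left as manual steps.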

VMWare

Virtual Machines

Storage

Overview

NAS

SAN

Networking

Overview

VLANs

IP Networks

Physical Hosts