Georgia Advanced Computing Resource Center: Difference between revisions
Revision as of 11:36, 9 September 2011
Clusters
Overview
rCluster
zCluster
ToDo List
sCluster
scluster todo List
Paul's List
- (DONE) root GECOS field edit
- (DONE) make /etc/resolv.conf like rcluster (otherwise has bogus 192.168 in there)
- (DONE) "mkdir -p /export/rocks/install/contrib/5.4.3/x86_64/RPMS.all"
- (DONE) add "plugins=1" to "main" section of /etc/yum.conf
- (DONE) register zhead with UGA RHEL satellite
- (DONE) "yum install yum-downloadonly"
- (DONE) "yum update -y --downloadonly"
- (DONE) "yum -y update"
- (DONE) reboot
- (DONE) "mv /var/cache/yum/rhel-x86_64-server-5/packages/*rpm /export/rocks/install/contrib/5.4.3/x86_64/RPMS.all"
- (DONE) copy /etc/cron.daily/rpm.noversion from rcluster; run it by hand once
- (DONE) install these RHEL RPMs: nmap, dejagnu, gpm-devel, screen
- (DONE) cd /export; hg clone http://fyp.rocksclusters.org/hg/rocks-5.4.3
- (DONE) build "checkinstall", install into /usr/tools, then make RPM of it and install RPM
- (DONE) "groupadd -g 1001 rccstaff"
- (DONE) "mkdir -p /usr/tools/bin /usr/tools/lib /usr/tools/sbin"
- (DONE) "chmod -R 0750 /usr/tools; chown -R root:rccstaff /usr/tools"
- (DONE) "mv /usr/local /usr/local.dist; mkdir /usr/local; chmod 0755 /usr/local"
- (DONE) give zhead a 172.16 IP address for storage
- (DONE) mount 3070 /usr/local on zhead
- (DONE) "cd /usr/local; mkdir -p bin etc games include lib man sbin share src"
- (DONE) "mkdir -p /usr/local/share/info /usr/local/share/doc"
- (DONE) install novi rpm on zhead (make it first, if needed)
- (DONE) use novi to populate /export/rocks/install/contrib/5.4.3/x86_64/RPMS from ..../RPMS.all dir
- (DONE) "cd /export/rocks/install; rocks create distro"
- (DONE) add 'storage' network to Rocks
- (DONE) change /etc/profile.d/ssh-key.sh to make it silent
- (DONE) add gx01-04, rcluster.local, thumpers, and all gxdNN entries to /etc/hosts.local
- (DONE) add "-s local" to SYSLOGD_OPTIONS in /etc/sysconfig/syslog
- (DONE) decide: RHEL 5 ships w/gcc 4.1.2 as default. 4.4 is avail. as tech preview. Do we want it? Yes.
- (DONE) "rocks report host > /etc/hosts"
- (DONE) decide: which versions of perl, gcc, python to have as default?
- (DONE) installed perl 5.14.1 and did "install Bundle::CPAN" and Bundle::LWP into /usr/local
- (DONE) decide: choose GE distro which has NUMA "hwloc" support. Chose Son of GE for other reasons.
- (DONE) build and install that GE (turns out it's Son of Grid Engine)
- (DONE) turn off NFS automounter and disable it
- (DONE) make a root email alias in /etc/aliases for appropriate staff; run "newaliases"
- (DONE) decide: do we want rsh turned on for nodes? no.
- (DONE) decide: install Oracle Java 6 or 7? 6.
- (DONE) decide: make newest kernel for PXE and node installs? apparently not.
- (DONE) what "newaliases" or "postalias" command does postfix need? "newaliases".
- (DONE) tweak /etc/profile, /etc/csh.login, /etc/csh.cshrc à la rcluster
- (DONE) new dot files for skel
- (DONE) decide: use GNU stow? Yes, when it helps.
- (DONE) decide: continue with pdsh, or use tentakel or "rocks run host"? It has them all!
- (DONE) try to reduce used space in rcluster:/usr/local (and/or pcluster)
- (DONE) remove 'nisplus' from 'automount' and 'alias' /etc/nsswitch.conf lines
- (DONE) service sec-channel stop; chkconfig sec-channel off
- (DONE) service 411 stop; chkconfig 411 off
- (DONE) chkconfig rocks-dmesg off
- (DONE) yum install perl-IO-Zlib; yum install tclx-devel
- (DONE) yum remove bluez-libs
- (DONE) add WCOLL, FANOUT and PDSH_RCMD_TYPE variables to pdsh.sh, pdsh.csh
- build new pdsh which lacks the pdcp bug that version 2.24 has
- make 32 and 64 bit libs match on hn, other nodes
- get most hn-only libs onto nodes and Rocks XML
- get GE init script onto nodes and into Rocks XML
- decide: what partitions, swap for compute nodes? (login too) (2 hrs to implement)
- decide: run memtest86+ on new nodes pre-go-live? (30 mins)
- decide: "modules" package? how otherwise to handle /etc/profile.d and PATH? ( 4 hrs )
- find or write an "interkill" (6 hrs)
- document how to use /usr/local/src, and how to build and install things (1 hr )
- learn how to extend lifetime of running GE jobs (4 hrs)
- configure BMC/DRACs ( mostly done; 2 hrs? )
- need to put 3070rep somewhere ( .5 hrs )
- mpi-selector in /etc/profile.d ( .5 hrs)
- set up PEs for GE
- decide: do we want the gxdNN IPs in DNS round-robin on headnode? ( 1 hr )
- decide: how to handle e.g. push_users, passwd command on zhead vs. zcluster.rcc (1 hr )
- implement OS backups by cron (.5 hrs)
- check for string "rcluster" in /usr/local/bin/*, /usr/tools/* ( .5 hrs )
- request firewall rules for zcluster, zhead, like rcluster has ( 2 hrs )
- (POST) decide: how to handle nodes' PXE boot settings and reinstallations (see Rocks User Guide section 6).
- (POST) decide: make a separate Rocks network for IB IP addresses (rack 11)?
- (POST) compute node cron job so node can take self out of queue if problem
- (POST) configure IB on rack 11 nodes under RHEL 5, maybe new Rocks appliance type
- (POST) learn how to submit rootly jobs into GE (updates, reboots)
- (much POST) get licenses for PGI compilers and Matlab for zhead
- compute node: yum install yum-utils
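
The `plugins=1` item near the top of the list can be made repeatable rather than hand-edited. The following is a minimal sketch, not the procedure actually used on zhead: the function name `ensure_yum_plugins` is hypothetical, and it assumes the file already contains a `[main]` section (it adds nothing if that section is missing).

```shell
#!/bin/sh
# Sketch: make sure "plugins=1" is set in the [main] section of a yum-style
# INI file. ensure_yum_plugins is an illustrative name, not an existing tool.
ensure_yum_plugins() {
    conf=$1
    tmp=$(mktemp) || return 1
    awk '
        # Emit the setting once, when leaving [main] without having seen it.
        function flush() { if (in_main && !done) { print "plugins=1"; done = 1 } }
        # A new section header: close out [main] if we were in it.
        /^\[/ { flush(); in_main = ($0 ~ /^\[main\]/) }
        # An existing plugins= line inside [main] is rewritten in place.
        in_main && /^[ \t]*plugins[ \t]*=/ { print "plugins=1"; done = 1; next }
        { print }
        END { flush() }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Running `ensure_yum_plugins /etc/yum.conf` as root would apply it to the real file; trying it on a copy first is the safer rehearsal.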
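
The /usr/tools and /usr/local items in the list amount to a fixed directory layout. As a rough sketch, the same commands can be wrapped so they run under a scratch root before touching the real head node. The function `make_tree` and its root argument are assumptions for illustration; the `chown` step is attempted only when running as root and the `rccstaff` group exists.

```shell
#!/bin/sh
# Sketch: recreate the /usr/tools and /usr/local layouts from the list under
# an arbitrary root directory. make_tree is an illustrative name.
make_tree() {
    root=${1:-}
    mkdir -p "$root/usr/tools/bin" "$root/usr/tools/lib" "$root/usr/tools/sbin" || return 1
    chmod -R 0750 "$root/usr/tools" || return 1
    # chown needs root and the rccstaff group, so only attempt it when both exist.
    if [ "$(id -u)" -eq 0 ] && getent group rccstaff >/dev/null 2>&1; then
        chown -R root:rccstaff "$root/usr/tools" || return 1
    fi
    mkdir -p "$root/usr/local" || return 1
    chmod 0755 "$root/usr/local" || return 1
    # Same subdirectories as the two mkdir items in the list.
    for d in bin etc games include lib man sbin share src share/info share/doc; do
        mkdir -p "$root/usr/local/$d" || return 1
    done
}
```

Calling `make_tree "$(mktemp -d)"` rehearses the layout harmlessly; calling it with an empty argument targets the real /usr paths.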
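
For the nsswitch.conf item, a scripted edit is less error-prone than editing by hand on every node. This is a sketch only: `strip_nisplus` is a hypothetical name, and it assumes the 'alias' line in the list refers to the `aliases:` database.

```shell
#!/bin/sh
# Sketch: drop the "nisplus" source from the automount and aliases lines of an
# nsswitch.conf-style file, leaving every other line untouched.
# Assumption: the 'alias' line in the list is the "aliases:" database.
strip_nisplus() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Remove " nisplus" only on the two named lines; keep any separator that
    # follows it so neighbouring sources stay separated.
    sed -E '/^(automount|aliases):/ s/[[:space:]]+nisplus([[:space:]]|$)/\1/g' \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```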
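
The pdsh item names three environment variables but not their values. A profile.d fragment might look like the sketch below. All three values are assumptions (the host-list path in particular is hypothetical); `ssh` is chosen only because the list records the decision not to enable rsh on nodes.

```shell
# /etc/profile.d/pdsh.sh (sketch; all three values below are assumptions)
# WCOLL: file listing the default target hosts, one per line (hypothetical path).
WCOLL=${WCOLL:-/etc/pdsh/machines}
# FANOUT: how many remote commands pdsh runs concurrently.
FANOUT=${FANOUT:-32}
# PDSH_RCMD_TYPE: remote command module to use.
PDSH_RCMD_TYPE=${PDSH_RCMD_TYPE:-ssh}
export WCOLL FANOUT PDSH_RCMD_TYPE
```

Using `${VAR:-default}` lets a user override any of the three in their own dot files before this fragment is sourced. A matching pdsh.csh would need `setenv` instead.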
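
The item about checking for the string "rcluster" is a plain recursive search. A minimal sketch, with `find_rcluster_refs` as an illustrative name:

```shell
#!/bin/sh
# Sketch: list files under the given directories that still mention "rcluster".
# find_rcluster_refs is an illustrative name; it exits non-zero when nothing matches.
find_rcluster_refs() {
    grep -rl -- 'rcluster' "$@" 2>/dev/null
}
```

On the head node this would be called as `find_rcluster_refs /usr/local/bin /usr/tools`. Binary files that contain the string are listed too, which is probably what is wanted before go-live.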