Xcluster Go-Live
Revision as of 14:32, 20 May 2014
Name the cluster!!
- Win a free hot apple pie
(Scyld) Development Environment
- Paul's VM plus physical nodes
- Intel vs. AMD software builds?
Software
Intel Compiler - do we need it?
Apps to install on xcluster:
- MPI, multi-core, big-memory, serial?
- Popular apps (regardless of type; how to determine?):
- time stamps on app dirs??
- access time of executable??
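
One way to try the time-stamp idea above is to rank application directories by the most recent access time of any executable inside them. The following is a minimal sketch, not a settled method: the function name `survey_apps` and the `/usr/local/apps` root used in the example are assumptions for illustration, and access times are only meaningful if the filesystem is not mounted with `noatime` (with `relatime` they are only approximate).

```python
import os
from pathlib import Path


def survey_apps(apps_root, limit=20):
    """Return (app_name, newest_atime) pairs, most recently accessed first.

    newest_atime is the latest access time (seconds since the epoch) of any
    executable file under the application directory, or 0.0 if none is found.
    """
    results = []
    for app_dir in Path(apps_root).iterdir():
        if not app_dir.is_dir():
            continue
        newest = 0.0
        for dirpath, _dirs, files in os.walk(app_dir):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # broken symlink or permission problem; skip it
                # Count only files with at least one execute bit set.
                if st.st_mode & 0o111:
                    newest = max(newest, st.st_atime)
        results.append((app_dir.name, newest))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:limit]


if __name__ == "__main__":
    import time

    # Hypothetical root; substitute wherever the apps actually live.
    for name, atime in survey_apps("/usr/local/apps"):
        stamp = time.strftime("%Y-%m-%d", time.localtime(atime)) if atime else "never"
        print(f"{stamp}  {name}")
```

Directory modification times could be compared the same way by reading `st_mtime` instead, but those reflect installs and updates rather than use.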
 
New queueing system:
- how/what to configure (e.g. fairshare, queues, "complexes")
Use module or not?
Scyld
- Location of /usr/local/apps, libraries, and languages e.g., Perl
- A bunch more stuff needed here
Cloud Manager
Lab Group Registration:
- Lab VM login nodes
User Accounts:
- New requests from PIs
- CAS/LDAP authentication?
- affiliate accounts?
Storage
- HPC IceBreaker (3 x 48TB chains)
- one chain for /home, /db, and /usr/local?
- two chains for scratch?
 
- Archive IceBreaker (2 x 320TB chains)
- /oflow and /home, /usr/local backups
 
- Lustre ClusterStor
- scratch for zcluster via 10GbE
- scratch for xcluster via 10GbE, maybe IB later
 
PB notes
- note: don't want perlwrapper this time
- research: will the scratch IceBreakers serve storage over Ethernet as well, or only over IB?
- research: try out POD
- research: what is the difference between building on Intel vs AMD?
- decision: which compiler to use by default? Current leaning is gcc.
- research/decision: will we do "qlogin"?  an interactive queue?  what kind of enforceable resource limits?
- http://www.nics.tennessee.edu/~troy/pbstools/ for a qlogin
- dunno about resource limits yet
 
- decision: paths for /usr/local apps?
- research: can scyldev see zhead license server?
- will know after testing the now-installed PGI
 
- decision: RPMs vs not for /usr/local
- todo:  install lmod (is ready to build)
- want it to heed /opt/scyld/modulefiles as well as other locations
 
- todo: install CUDA
- research: can we put Intel compilers on scyldev?
- decision: enable rsh on compute nodes?
- research: figure out node naming scheme. See if we can get the syntax for our customary hostnames in the beowulf config file.
- decision: is 1024 a sufficient max per-user process limit?
- decision: is 1024 a sufficient max per-user open files limit?
- research: need new IP allocation scheme
- research: do any zcluster nodes have distinct kernel command line args?
- todo: if we want users to ssh to nodes for jobs, we need e.g. /etc/profile.d/ssh-key.sh
- todo: grep for "nodenumber" in the PDF files to get a list of per-node config files
- todo: build Lustre client. See Penguin's case 62419 about this (we don't have Scyld kernel source).
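
For the two per-user limit questions above (max processes and max open files), it may help to first see what a given node currently grants. The sketch below uses Python's standard `resource` module to read the soft and hard limits for the calling user and compare them with the proposed 1024; the function name `check_limits` and the report layout are assumptions for illustration, and it only reports limits. It does not answer whether 1024 is enough for real workloads.

```python
import resource

PROPOSED = 1024  # the per-user value under discussion in the notes above


def check_limits(proposed=PROPOSED):
    """Report current soft/hard limits and whether the soft limit >= proposed."""
    report = {}
    for label, which in (
        ("processes", resource.RLIMIT_NPROC),
        ("open files", resource.RLIMIT_NOFILE),
    ):
        soft, hard = resource.getrlimit(which)
        # RLIM_INFINITY means "unlimited", which trivially meets any proposal.
        unlimited = soft == resource.RLIM_INFINITY
        report[label] = {
            "soft": soft,
            "hard": hard,
            "meets_proposed": unlimited or soft >= proposed,
        }
    return report


if __name__ == "__main__":
    for label, info in check_limits().items():
        print(f"{label}: soft={info['soft']} hard={info['hard']} "
              f"meets {PROPOSED}: {info['meets_proposed']}")
```

Running this as an ordinary user on a login node and on a compute node would show whether the two environments already differ.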
Documentation
- cloud, login nodes, etc.
- queuing system