Xcluster Go-Live
Name the cluster!!
- Win a free hot-apple pie
(Scyld) Development Environment
- Paul's VM plus physical nodes
- Intel vs. AMD software builds?
Software
Intel Compiler - do we need it?
Apps to install on xcluster:
- MPI, multi-core, big-memory, serial?
- Popular apps (regardless of type; how to determine?):
  - time stamps on app dirs?
  - access time of executables? (see the sketch after this list)
New queueing system:
- how/what to configure (e.g. fairshare, queues, "complexes")
Use module or not?
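For the "how to determine popular apps" question above, a rough sketch (shell; GNU find assumed) that ranks executables by last access time and app directories by last modification time. The /usr/local/apps path is borrowed from the Scyld section below as an assumption; point it at wherever the current apps actually live, and note that atime is only meaningful if the filesystem is not mounted noatime.

    # Most recently *accessed* executables first (requires atime to be enabled):
    find /usr/local/apps -type f -perm /111 -printf '%A@ %p\n' | sort -rn | head -50

    # Most recently *modified* app directories (rough install/update times):
    find /usr/local/apps -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' | sort -rn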
Scyld
- Location of /usr/local/apps, libraries, and languages e.g., Perl
- A bunch more stuff needed here
Cloud Manager
Lab Group Registration:
- Lab VM login nodes
User Accounts:
- New requests from PIs
- CAS/LDAP authentication?
- affiliate accounts?
Storage
- HPC IceBreaker (3 x 48TB chains)
  - one chain for /home, /db, and /usr/local?
  - two chains for scratch?
- Archive IceBreaker (2 x 320TB chains)
  - /oflow and /home, /usr/local backups
- Lustre ClusterStor
  - scratch for zcluster via 10GbE
  - scratch for xcluster via 10GbE, maybe IB later
PB notes
- note: don't want perlwrapper this time
- research: will scratch icebreakers run over Ethernet also, or only IB, for storage?
- research: try out POD
- research: what is difference between building on Intel vs AMD?
- decision: which compiler to use by default? I think we're leaning toward gcc.
- research/decision: will we do "qlogin"? an interactive queue? what kind of enforceable resource limits? (see the interactive-job sketch after this list)
  - http://www.nics.tennessee.edu/~troy/pbstools/ for a qlogin
  - not sure about resource limits yet
- decision: paths for /usr/local apps?
- RESOLVED: scyldev can see zhead license server
- DONE: install PGI
- decision: RPMs vs not for /usr/local
- todo: install lmod (it is ready to build); see the MODULEPATH sketch after this list
  - want it to heed /opt/scyld/modulefiles as well as our other modulefile location
- todo: install CUDA
- research: can we put Intel compilers on scyldev?
- decision: enable rsh on compute nodes?
- research: figure out the node naming scheme. See if we can get the syntax for our customary hostnames into the beowulf config file.
- decision: is 1024 a sufficient max per-user process limit?
- decision: is 1024 a sufficient max per-user open-files limit? (see the limits sketch after this list)
- research: need new IP allocation scheme
- research: do any zcluster nodes have distinct kernel command line args?
- todo: if we want users to ssh to nodes for their jobs, we need e.g. /etc/profile.d/ssh-key.sh (sketch after this list)
- todo: grep for "nodenumber" in the PDF files to get a list of per-node config file
- todo: build the Lustre client. Penguin case 62419 is about this (we don't have the Scyld kernel source).
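On the qlogin question: if we land on a PBS-flavored scheduler such as TORQUE, an interactive session is available via qsub -I even without the pbstools qlogin wrapper. A hedged sketch; the "interactive" queue name and the resource values are placeholders, not settled choices.

    # Interactive job on a PBS/TORQUE-style scheduler; the queue name and
    # resource requests below are hypothetical placeholders.
    qsub -I -q interactive -l nodes=1:ppn=2,walltime=2:00:00,pmem=2gb

Enforceable limits would then come from the -l requests together with whatever maxima we configure on that queue.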
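For the lmod item: one way to make it heed /opt/scyld/modulefiles as well as our own tree is simply to put both on MODULEPATH at login. The file name and the /usr/local/modulefiles path below are assumptions; substitute whatever we decide on for /usr/local.

    # Hypothetical /etc/profile.d/00-modulepath.sh: search both module trees.
    export MODULEPATH=/opt/scyld/modulefiles:/usr/local/modulefiles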
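On the 1024 process and open-files questions: 1024 is a common distribution default for both, so if we decide it is not enough, a pam_limits drop-in is the simplest fix. Sketch only; the file name and the 4096/8192 values are placeholders, not recommendations.

    # Sketch: raise per-user nproc/nofile limits cluster-wide.
    cat > /etc/security/limits.d/91-hpc.conf <<'EOF'
    *    soft    nproc     4096
    *    hard    nproc     8192
    *    soft    nofile    4096
    *    hard    nofile    8192
    EOF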
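For the /etc/profile.d/ssh-key.sh idea, a minimal sketch of the usual pattern: on first login, generate a passphrase-less key and authorize it so users can ssh to compute nodes for their jobs. It assumes home directories are shared across the cluster.

    # Minimal sketch of /etc/profile.d/ssh-key.sh (assumes shared $HOME).
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
        ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
        cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
        chmod 600 "$HOME/.ssh/authorized_keys"
    fi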
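For the "grep the PDFs for nodenumber" todo, one possible one-liner, assuming pdftotext (poppler-utils) is installed and the PDFs sit in the current directory:

    # Show which PDFs mention "nodenumber" and print the matching lines.
    for f in *.pdf; do
        pdftotext "$f" - | grep -i 'nodenumber' | sed "s|^|$f: |"
    done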
Documentation
- cloud, login nodes, etc.
- queuing system