Xcluster Go-Live
Name the cluster!!
- Win a free hot-apple pie
(Scyld) Development Environment
- Paul's VM plus physical nodes
- Intel vs. AMD software builds?
Software
Intel Compiler - do we need it?
Apps to install on xcluster:
- MPI, multi-core, big-memory, serial?
- Popular apps (regardless of type; how to determine?):
- time stamps on app dirs??
- access time of executables?? (see the sketch at the end of this section)
New queueing system:
- how/what to configure (e.g. fairshare, queues, "complexes")
Use environment modules or not?
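A rough way to answer the "how to determine popular apps" question, sketched below on the assumption that apps live one directory per app under something like /usr/local/apps (the path is a guess), is to rank each app tree by the most recently accessed executable inside it. This only means anything if atime is recorded (i.e. the filesystem is not mounted noatime).

    # Rough sketch -- assumes one directory per app under /usr/local/apps
    # (path is a guess) and that the filesystem records access times.
    for app in /usr/local/apps/*/; do
        newest=$(find "$app" -type f -perm /111 -printf '%A@ %p\n' 2>/dev/null \
                 | sort -n | tail -1)
        [ -n "$newest" ] && echo "$newest"
    done | sort -rn | head -20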
Scyld
- Location of /usr/local/apps, libraries, and languages (e.g., Perl)
- A bunch more stuff needed here
Cloud Manager
Lab Group Registration:
- Lab VM login nodes
User Accounts:
- New requests from PIs
- CAS/LDAP authentication?
- affiliate accounts?
Storage
- HPC IceBreaker (3 x 48TB chains)
- one chain for /home, /db, and /usr/local?
- two chains for scratch?
- Archive IceBreaker (2 x 320TB chains)
- /oflow and /home, /usr/local backups
- Lustre ClusterStor
- scratch for zcluster via 10GbE
- scratch for xcluster via 10GbE, maybe IB later
PB notes
- note: don't want perlwrapper this time
- research: will scratch icebreakers run over Ethernet also, or only IB, for storage?
- research: try out POD
- research: what is difference between building on Intel vs AMD?
- decision: which compiler to use by default? I think we think gcc.
- research/decision: will we do "qlogin"? an interactive queue? what kind of enforceable resource limits? (see the interactive-job sketch after this list)
- http://www.nics.tennessee.edu/~troy/pbstools/ for a qlogin
- dunno about resource limits yet
- decision: paths for /usr/local apps?
- research: can scyldev see zhead license server?
- will know after testing the now-installed PGI
- decision: RPMs vs not for /usr/local
- todo: install lmod (is ready to build)
- want it to heed /opt/scyld/modulefiles as well as other locations (see the MODULEPATH sketch after this list)
- todo: install CUDA
- research: can we put Intel compilers on scyldev?
- decision: enable rsh on compute nodes?
- research: figure out node naming scheme. See if we can get the syntax for our customary hostnames into the beowulf config file.
- decision: is 1024 a sufficient max per-user process limit?
- decision: is 1024 a sufficient max per-user open files limit? (limits.conf sketch covering both limits after this list)
- research: need new IP allocation scheme
- research: do any zcluster nodes have distinct kernel command line args?
- todo: if we want users to ssh to nodes for jobs, we need e.g. /etc/profile.d/ssh-key.sh (see the sketch after this list)
- todo: grep for "nodenumber" in the PDF files to get a list of per-node config files (see the sketch after this list)
- todo: build Lustre client. Probably need Xyratex help.
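On the qlogin question above: assuming the new queueing system ends up PBS/Torque-flavored (the pbstools link points that way), an interactive queue with per-job limits could look roughly like this from the user side; the queue name and numbers are placeholders, not decisions.

    # Hypothetical interactive session request -- the "interactive" queue name
    # and all resource limits below are placeholders, not agreed-upon values.
    qsub -I -q interactive -l nodes=1:ppn=4,mem=8gb,walltime=04:00:00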
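On the lmod todo: one way to make it heed /opt/scyld/modulefiles as well as our own tree is a profile.d snippet along these lines; the lmod install prefix and the /usr/local/modulefiles location are assumptions.

    # /etc/profile.d/z01_modules.sh -- sketch only; adjust to wherever lmod is
    # actually installed and wherever our own modulefiles end up living.
    export MODULEPATH=/opt/scyld/modulefiles:/usr/local/modulefiles
    . /usr/local/lmod/lmod/init/bash    # assumed install prefix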
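On the two 1024-limit decisions: if 1024 turns out to be too low, the usual knob is /etc/security/limits.conf (or a drop-in under /etc/security/limits.d) in the compute node image; the numbers below are illustrative only.

    # /etc/security/limits.d/90-cluster.conf -- illustrative values, not decisions.
    *    soft    nproc     4096
    *    hard    nproc     8192
    *    soft    nofile    4096
    *    hard    nofile    8192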
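On the /etc/profile.d/ssh-key.sh todo: a minimal sketch of what that script might do, i.e. generate a passphrase-less key on first login and authorize it for intra-cluster ssh. It assumes /home is shared between login and compute nodes; filenames and key type are guesses.

    # /etc/profile.d/ssh-key.sh -- minimal sketch, assumes a shared /home.
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
        ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
        cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
        chmod 600 "$HOME/.ssh/authorized_keys"
    fi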
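On the "nodenumber" grep todo: assuming the PDFs in question are the vendor docs sitting together in one directory, something like this (pdftotext comes from poppler-utils) would pull the per-node config references out.

    # Assumes the vendor PDFs are in ./docs -- adjust the path as needed.
    for f in docs/*.pdf; do
        pdftotext "$f" - | grep -in nodenumber && echo "  ^-- in $f"
    done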
Documentation
- cloud, login nodes, etc.
- queuing system