Xcluster Go-Live
Name the cluster!!
- Win a free hot apple pie
(Scyld) Development Environment
- KB:Installing scyldev (OS and Scyld)
- KB:Scyldev compute nodes
- Paul's VM plus physical nodes
- Intel vs. AMD software builds?
Software
Intel Compiler: Guy is checking on getting it.
Apps to install on xcluster:
- MPI, multi-core, big-memory, serial?
- Popular apps (regardless of type; how to determine?):
  - time stamps on app dirs??
  - access time of executables?? (see the sketch at the end of this section)
New queueing system:
- how/what to configure (e.g. fairshare, queues, "complexes")
Use module or not?
Scyld
- Location of /usr/local/apps, libraries, and languages (e.g., Perl)
- A bunch more stuff needed here
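
On the "how to determine popular apps" question above, here is a minimal sketch that ranks app directories by the newest access/modify time of any executable inside them. It assumes the zcluster apps all live under one tree (APPS_ROOT below is a placeholder, not the confirmed path), and atime is only a useful signal if the filesystem is not mounted noatime/relatime; mtime mostly reflects install dates.

    #!/usr/bin/env python
    # Sketch: rank app directories by the newest access/modify time of any
    # executable inside them.  APPS_ROOT is a placeholder path.
    import os, stat, time

    APPS_ROOT = "/usr/local/apps"   # assumption: the zcluster apps tree

    def newest_exec_times(appdir):
        """Return (latest atime, latest mtime) over executables under appdir."""
        atime = mtime = 0
        for root, _dirs, files in os.walk(appdir):
            for name in files:
                try:
                    st = os.stat(os.path.join(root, name))
                except OSError:
                    continue
                if st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
                    atime = max(atime, st.st_atime)
                    mtime = max(mtime, st.st_mtime)
        return atime, mtime

    def stamp(t):
        return time.strftime("%Y-%m-%d", time.localtime(t)) if t else "never"

    rows = []
    for entry in sorted(os.listdir(APPS_ROOT)):
        path = os.path.join(APPS_ROOT, entry)
        if os.path.isdir(path):
            rows.append((newest_exec_times(path), entry))

    # Most recently accessed first -- a rough popularity ranking.
    for (atime, mtime), entry in sorted(rows, reverse=True):
        print("%-30s last run ~%s  last modified %s" % (entry, stamp(atime), stamp(mtime)))
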
Cloud Manager
Use the account creation process as a means to identify inactive users:
- Mail PIs a list of their current users (see the sketch at the end of this section)
Lab Group Registration:
- Lab VM login nodes
User Accounts:
- New requests from PIs
- CAS/LDAP authentication?
- affiliate accounts?
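
For the "mail PIs a list of current users" item, a minimal sketch of the data-gathering half: dump the members of each lab group so the PI can confirm who is still active. The "lab_" group-name prefix is a made-up convention, and the real accounts may come from LDAP rather than the local passwd/group files.

    #!/usr/bin/env python
    # Dump the members of each lab group so the PI can confirm who is still
    # active.  The "lab_" prefix is an assumption, not the real convention;
    # adjust to the actual group naming scheme (or pull the list from LDAP).
    import grp
    import pwd

    LAB_GROUP_PREFIX = "lab_"   # assumption

    for g in grp.getgrall():
        if not g.gr_name.startswith(LAB_GROUP_PREFIX):
            continue
        # Explicit members plus anyone whose primary group this is.
        members = set(g.gr_mem)
        members.update(u.pw_name for u in pwd.getpwall() if u.pw_gid == g.gr_gid)
        print("%s (%d users): %s" % (g.gr_name, len(members), ", ".join(sorted(members))))
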
Storage
- HPC IceBreaker (3 x 48TB chains)
  - one chain for /home, /db, and /usr/local?
  - two chains for scratch?
- Archive IceBreaker (2 x 320TB chains)
  - /oflow and /home, /usr/local backups
- Lustre ClusterStor
  - scratch for zcluster via 10GbE
  - scratch for xcluster via 10GbE, maybe IB later
Other
- Interactive cluster (using rack of zcluster nodes)?
- New copy nodes (e.g., use some hadoop nodes)
PB notes
- note: don't want perlwrapper this time
- research: will the scratch IceBreakers run over Ethernet as well, or only IB, for storage?
- research: try out POD
- research: what is the difference between building on Intel vs. AMD?
- decision: which compiler to use by default? We think gcc.
- research/decision: will we do "qlogin"? An interactive queue? What kind of enforceable resource limits?
  - http://www.nics.tennessee.edu/~troy/pbstools/ for a qlogin
  - resource limits are still undecided
- decision: paths for /usr/local apps?
- RESOLVED: scyldev can see zhead license server
- DONE: install PGI
- decision: RPMs vs not for /usr/local
- todo: install lmod (ready to build)
  - want it to heed /opt/scyld/modulefiles as well as the other location
- todo: install CUDA
- research: can we put Intel compilers on scyldev?
- decision: enable rsh on compute nodes?
- research: figure out the node naming scheme; see if we can get the syntax for our customary hostnames into the beowulf config file.
- decision: is 1024 a sufficient max per-user process limit? (see the limit-check sketch after this list)
- decision: is 1024 a sufficient max per-user open-files limit?
- research: need a new IP allocation scheme (see the IP sketch after this list)
- research: do any zcluster nodes have distinct kernel command-line args?
- todo: if we want users to ssh to nodes for jobs, we need e.g. /etc/profile.d/ssh-key.sh (see the ssh-key sketch after this list)
- todo: grep for "nodenumber" in the PDF files to get a list of per-node config files
- todo: build the Lustre client. Penguin case 62419 is open about this (we don't have the Scyld kernel source).
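
For the two "is 1024 enough?" decisions, a quick check of what the current per-user limits actually are on a given node (worth running on zcluster and on scyldev for comparison); Python's resource module reads the same rlimits that limits.conf sets. Keep in mind that nproc counts threads, so one heavily threaded job can eat into 1024 quickly.

    #!/usr/bin/env python
    # Print the soft/hard per-user limits behind the two "is 1024 enough?"
    # questions: max processes (nproc) and max open files (nofile).
    import resource

    def show(name, rlim):
        soft, hard = resource.getrlimit(rlim)
        fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
        print("%-8s soft=%-10s hard=%s" % (name, fmt(soft), fmt(hard)))

    show("nproc", resource.RLIMIT_NPROC)
    show("nofile", resource.RLIMIT_NOFILE)
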
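For the IP allocation scheme, a toy sketch of the simplest option: a fixed arithmetic mapping from node number to address within one private subnet. The 10.1.0.0/16 network, the offset, and the n### hostnames are placeholders, not a proposal.

    #!/usr/bin/env python
    # Toy mapping: node number -> fixed address in one private subnet.
    # Subnet, offset, and hostname pattern are placeholders.
    import ipaddress

    NODE_NET = ipaddress.ip_network("10.1.0.0/16")  # placeholder cluster network
    OFFSET = 100                                    # leave low addresses for head/service nodes

    def node_ip(n):
        return NODE_NET.network_address + OFFSET + n

    for n in range(4):
        print("n%03d  %s" % (n, node_ip(n)))
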
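For the /etc/profile.d/ssh-key.sh item, the real hook would presumably be a small shell script, but here is a sketch of what it needs to do at login so jobs can ssh between compute nodes: make sure the user has a passphrase-less key and that it is in their authorized_keys. Key type and filenames are assumptions.

    #!/usr/bin/env python
    # Sketch of the per-user setup an /etc/profile.d/ssh-key.sh hook would do:
    # create a passphrase-less key if missing and authorize it.
    import os
    import stat
    import subprocess

    def ensure_ssh_key():
        ssh_dir = os.path.expanduser("~/.ssh")
        key = os.path.join(ssh_dir, "id_rsa")
        auth = os.path.join(ssh_dir, "authorized_keys")

        if not os.path.isdir(ssh_dir):
            os.makedirs(ssh_dir)
            os.chmod(ssh_dir, stat.S_IRWXU)                  # 0700

        if not os.path.exists(key):
            # Empty passphrase so batch jobs need no interaction.
            subprocess.check_call(["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f", key])

        pubkey = open(key + ".pub").read()
        existing = open(auth).read() if os.path.exists(auth) else ""
        if pubkey.strip() not in existing:
            with open(auth, "a") as f:
                f.write(pubkey)
            os.chmod(auth, stat.S_IRUSR | stat.S_IWUSR)      # 0600

    if __name__ == "__main__":
        ensure_ssh_key()
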
Documentation
- cloud, login nodes, etc.
- queuing system
Zcluster nodes available for re-purpose
- rack 6:
  - (27) Dell 1950, 8-core, 16GB
- rack 7:
  - (16) Dell R610, 8-core, 16GB
- racks 8-11:
  - (123) Dell 1950, 8-core, 16GB
- rack 12:
  - (10) Arch, 12-core, 48GB
  - (2) Dell R610 Tesla 1070, 8-core, 48GB
  - (2) Dell R610, 8-core, 48GB (old 192GB boxes)
  - (3) Dell R610, 8-core, 192GB
  - (1) SuperMicro Tesla 2075, 4-core, 24GB (Taha?)
  - (1) Dell R900, 16-core, 128GB (what is this?)
- rack 13:
  - (27) Dell R410, 8-core, 16GB
- rack 14:
  - (3) Dell PE C6145, 32-core, 64GB
  - (1) Dell R810, 32-core, 512GB
  - (2) Dell R815 Interactive nodes, 48-core, 128GB
  - (3) SuperMicro, 12-core, 256GB
- rack 15:
  - (26) Arch, 12-core, 48GB
- rack 16:
  - (10) Arch, 12-core, 48GB
- rack 17:
  - (9) Arch, 24-core, 128GB (hadoop)
  - (3) Arch, 24-core, 128GB (multi-core)
- rack 18:
  - (5) Penguin Kepler GPU, 12-core, 96GB