Xcluster Go-Live
Name the cluster!!
- Win a free hot-apple pie
(Scyld) Development Environment
- KB:Installing scyldev (OS and Scyld)
- KB:Scyldev compute nodes
- Paul's VM plus physical nodes
- Intel vs. AMD software builds?
Software
Intel Compiler - Guy is checking on getting it.
Apps to install on xcluster:
- MPI, multi-core, big-memory, serial?
- Popular apps (regardless of type; how to determine?):
- time stamps on app dirs??
- access time of executables?? (see the access-time sketch after this list)
New queueing system:
- how/what to configure (e.g. fairshare, queues, "complexes")
Use environment modules (module/lmod) or not?
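A minimal sketch (Python, since no scripting language is settled here) for the "popular apps" question above: walk an apps tree and report which top-level app directories contain executables that have been accessed recently. /usr/local/apps, the 180-day cutoff, and the assumption that atime is meaningful (i.e., the filesystem is not mounted noatime) are guesses for illustration, not decisions.

#!/usr/bin/env python
# Sketch: find app directories whose executables have been accessed recently.
# APPS_ROOT and CUTOFF_DAYS are assumptions for illustration only.
import os
import stat
import time

APPS_ROOT = "/usr/local/apps"   # assumed zcluster-style apps tree
CUTOFF_DAYS = 180               # "popular" = an executable touched in ~6 months

def recently_used_apps(root, cutoff_days):
    cutoff = time.time() - cutoff_days * 86400
    latest = {}                                   # app dir -> newest atime seen
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue                          # broken symlink, permissions, etc.
            is_exec = stat.S_ISREG(st.st_mode) and (st.st_mode & stat.S_IXUSR)
            if is_exec and st.st_atime >= cutoff:
                app = os.path.relpath(path, root).split(os.sep)[0]
                latest[app] = max(latest.get(app, 0), st.st_atime)
    return sorted(latest.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for app, atime in recently_used_apps(APPS_ROOT, CUTOFF_DAYS):
        print("%s  %s" % (time.strftime("%Y-%m-%d", time.localtime(atime)), app))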
Scyld
- Location of /usr/local/apps, libraries, and languages (e.g., Perl)
- A bunch more stuff needed here
Cloud Manager
Use the account creation process as a means to identify inactive users (see the sketch after this list):
- Mail PIs with a list of their current users
Lab Group Registration:
- Lab VM login nodes
User Accounts:
- New requests from PIs
- CAS/LDAP authentication?
- affiliate accounts?
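A rough sketch of the inactive-user report idea above: for each lab group, list the members and how long since their home directory changed, so the summary can be mailed to the PI. The group names, the one-year threshold, and using home-directory mtime as the activity signal are all assumptions, not a settled process.

#!/usr/bin/env python
# Sketch: per-lab report of users and days since their home directory changed.
# LAB_GROUPS and STALE_DAYS are hypothetical; gr_mem only lists supplementary
# members, so users whose primary group is the lab would need a pwd scan too.
import grp
import os
import pwd
import time

LAB_GROUPS = ["lab_smith", "lab_jones"]   # hypothetical lab group names
STALE_DAYS = 365                          # assumed inactivity threshold

def report_for_group(group_name, stale_days=STALE_DAYS):
    now = time.time()
    lines = []
    for user in grp.getgrnam(group_name).gr_mem:
        try:
            home = pwd.getpwnam(user).pw_dir
            idle_days = (now - os.stat(home).st_mtime) / 86400.0
        except (KeyError, OSError):
            idle_days = float("inf")      # no account or unreadable home
        flag = "INACTIVE?" if idle_days > stale_days else ""
        lines.append("%-12s %8.0f days since home activity  %s" % (user, idle_days, flag))
    return "\n".join(lines)

if __name__ == "__main__":
    for g in LAB_GROUPS:
        print("== %s ==" % g)             # this is what would be mailed to the PI
        print(report_for_group(g))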
Storage
- HPC IceBreaker (3 x 48TB chains)
- one chain for /home, /db, and /usr/local?
- two chains for scratch?
- Archive IceBreaker (2 x 320TB chains)
- /oflow and /home, /usr/local backups
- Lustre ClusterStor
- scratch for zcluster via 10GbE
- scratch for xcluster via 10GbE, maybe IB later
Other
- Interactive cluster (using a rack of zcluster nodes)?
- New copy nodes (e.g., use some hadoop nodes)
PB notes
- note: don't want perlwrapper this time
- research: will the scratch IceBreakers serve storage over Ethernet as well, or only over IB?
- research: try out POD
- research: what is difference between building on Intel vs AMD?
- decision: which compiler to use by default? I think we think gcc.
- research/decision: will we do "qlogin"? an interactive queue? what kind of enforceable resource limits?
- http://www.nics.tennessee.edu/~troy/pbstools/ for a qlogin
- dunno about resource limits yet
- decision: paths for /usr/local apps?
- RESOLVED: scyldev can see zhead license server
- DONE: install PGI
- decision: RPMs vs not for /usr/local
- todo: install lmod (is ready to build)
- want it to heed /opt/scyld/modulefiles as well as the other module location
- todo: install CUDA
- research: can we put Intel compilers on scyldev?
- decision: enable rsh on compute nodes?
- research: figure out the node naming scheme. See if we can get the syntax for our customary hostnames into the beowulf config file.
- decision: is 1024 a sufficient max per-user process limit?
- decision: is 1024 a sufficient max per-user open-files limit? (the current defaults can be checked with the limit sketch after this list)
- research: need new IP allocation scheme
- research: do any zcluster nodes have distinct kernel command line args?
- todo: if we want users to ssh to nodes for their jobs, we need e.g. /etc/profile.d/ssh-key.sh (see the key-setup sketch after this list)
- todo: grep for "nodenumber" in the PDF files to get a list of per-node config files
- todo: build the Lustre client. Penguin case 62419 is open about this (we don't have the Scyld kernel source).
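For the two 1024-limit decisions above, a quick check of what the per-user soft/hard limits currently are on a node (only a check; the real limits would be set in the node image / limits configuration):

#!/usr/bin/env python
# Check the current per-user process and open-file limits on this node.
# This only reports the limits; it does not set them.
import resource

soft_nproc, hard_nproc = resource.getrlimit(resource.RLIMIT_NPROC)
soft_nofile, hard_nofile = resource.getrlimit(resource.RLIMIT_NOFILE)

print("max user processes: soft=%s hard=%s" % (soft_nproc, hard_nproc))
print("max open files:     soft=%s hard=%s" % (soft_nofile, hard_nofile))

for name, soft in (("processes", soft_nproc), ("open files", soft_nofile)):
    if soft != resource.RLIM_INFINITY and soft <= 1024:
        print("note: soft limit for %s is %d (at or below the proposed 1024)" % (name, soft))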
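And a sketch, written in Python purely for illustration, of the logic an /etc/profile.d/ssh-key.sh helper would implement so users can ssh to the nodes their jobs land on: generate a passphrase-less key if the user has none and authorize it. Key type, filenames, and permissions here are conventional defaults, not a settled design.

#!/usr/bin/env python
# Sketch of what an /etc/profile.d/ssh-key.sh would do, rendered in Python:
# make sure the user has a passphrase-less key and that it is authorized,
# so ssh between cluster nodes works for their jobs. Key name/type assumed.
import os
import stat
import subprocess

def ensure_cluster_ssh_key():
    home = os.path.expanduser("~")
    ssh_dir = os.path.join(home, ".ssh")
    key = os.path.join(ssh_dir, "id_rsa")        # assumed key file
    pub = key + ".pub"
    auth = os.path.join(ssh_dir, "authorized_keys")

    if not os.path.isdir(ssh_dir):
        os.makedirs(ssh_dir)
        os.chmod(ssh_dir, stat.S_IRWXU)          # 0700

    if not os.path.exists(key):
        # passphrase-less RSA key, intended for intra-cluster use only
        subprocess.check_call(["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f", key])

    pub_data = open(pub).read()
    if not os.path.exists(auth) or pub_data not in open(auth).read():
        with open(auth, "a") as f:
            f.write(pub_data)
        os.chmod(auth, stat.S_IRUSR | stat.S_IWUSR)  # 0600

if __name__ == "__main__":
    ensure_cluster_ssh_key()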
Documentation
- cloud, login nodes, etc.
- queuing system
Zcluster nodes available for repurposing
- rack 6:
- (27) Dell 1950, 8-core, 16GB
- rack 7:
- (16) Dell R610, 8-core, 16GB
- (1) Dell R810 Big Memory, 32-core, 512GB
- (2) SuperMicro, 12-core, 256GB
- rack 8-11:
- (123) Dell 1950, 8-core, 16GB
- (5) SuperMicro, 12-core, 256GB
- rack 12:
- (10) Arch, 12-core, 48GB
- (2) Dell R610 Tesla 1070, 8-core, 48GB
- (2) Dell R610, 8-core, 48GB (old 192GB boxes)
- (3) Dell R610, 8-core, 192GB
- (1) SuperMicro Tesla 2075, 4-core, 24GB (Taha?)
- (1) Dell R900, 16-core, 128GB (what is this?)
- rack 13:
- (27) Dell R410, 8-core, 16GB
- rack 14:
- (3) Dell PE C6145, 32-core, 64GB
- (1) Dell R810 Big Memory, 32-core, 512GB
- (2) Dell R815 Interactive nodes, 48-core, 128GB
- (3) SuperMicro, 12-core, 256GB
- rack 15:
- (26) Arch, 12-core, 48GB
- rack 16:
- (10) Arch, 12-core, 48GB
- rack 17:
- (9) Arch, 24-core, 128GB, (hadoop)
- (3) Arch, 24-core, 128GB, (multi-core)
- rack 18:
- (5) Penguin Kepler GPU, 12-core, 96GB