Xcluster Go-Live
Name the cluster!!
- Win a free hot apple pie
(Scyld) Development Environment
- KB:Installing scyldev (OS and Scyld)
- KB:Scyldev compute nodes
- Paul's VM plus physical nodes
- Intel vs. AMD software builds?
Software
Intel Compiler: Guy is checking on getting it.
Apps to install on xcluster:
- MPI, multi-core, big-memory, serial?
- Popular apps (regardless of type; how to determine?):
  - time stamps on app dirs??
  - access time of executables?? (see the sketch at the end of this section)
New queueing system:
- how/what to configure (e.g. fairshare, queues, "complexes")
Use module or not?
Scyld
- Location of /usr/local/apps, libraries, and languages (e.g., Perl)
- A bunch more stuff needed here
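
On the "how to determine popular apps" question above, here is a minimal sketch that ranks app directories by the newest access/modify time of any executable inside them. It assumes the zcluster apps all live under one tree (APPS_ROOT below is a placeholder, not the confirmed path), and atime is only a useful signal if the filesystem is not mounted noatime/relatime; mtime mostly reflects install dates.

    #!/usr/bin/env python
    # Sketch: rank app directories by the newest access/modify time of any
    # executable inside them.  APPS_ROOT is a placeholder path.
    import os, stat, time

    APPS_ROOT = "/usr/local/apps"   # assumption: the zcluster apps tree

    def newest_exec_times(appdir):
        """Return (latest atime, latest mtime) over executables under appdir."""
        atime = mtime = 0
        for root, _dirs, files in os.walk(appdir):
            for name in files:
                try:
                    st = os.stat(os.path.join(root, name))
                except OSError:
                    continue
                if st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
                    atime = max(atime, st.st_atime)
                    mtime = max(mtime, st.st_mtime)
        return atime, mtime

    def stamp(t):
        return time.strftime("%Y-%m-%d", time.localtime(t)) if t else "never"

    rows = []
    for entry in sorted(os.listdir(APPS_ROOT)):
        path = os.path.join(APPS_ROOT, entry)
        if os.path.isdir(path):
            rows.append((newest_exec_times(path), entry))

    # Most recently accessed first -- a rough popularity ranking.
    for (atime, mtime), entry in sorted(rows, reverse=True):
        print("%-30s last run ~%s  last modified %s" % (entry, stamp(atime), stamp(mtime)))
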
Cloud Manager
Use the account creation process as a means to identify inactive users:
- Mail PIs a list of their current users (see the sketch at the end of this section)
Lab Group Registration:
- Lab VM login nodes
User Accounts:
- New requests from PIs
- CAS/LDAP authentication?
- affiliate accounts?
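
For the "mail PIs a list of current users" item, a minimal sketch of the data-gathering half: dump the members of each lab group so the PI can confirm who is still active. The "lab_" group-name prefix is a made-up convention, and the real accounts may come from LDAP rather than the local passwd/group files.

    #!/usr/bin/env python
    # Dump the members of each lab group so the PI can confirm who is still
    # active.  The "lab_" prefix is an assumption, not the real convention;
    # adjust to the actual group naming scheme (or pull the list from LDAP).
    import grp
    import pwd

    LAB_GROUP_PREFIX = "lab_"   # assumption

    for g in grp.getgrall():
        if not g.gr_name.startswith(LAB_GROUP_PREFIX):
            continue
        # Explicit members plus anyone whose primary group this is.
        members = set(g.gr_mem)
        members.update(u.pw_name for u in pwd.getpwall() if u.pw_gid == g.gr_gid)
        print("%s (%d users): %s" % (g.gr_name, len(members), ", ".join(sorted(members))))
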
Storage
- HPC IceBreaker (3 x 48TB chains)
  - one chain for /home, /db, and /usr/local?
  - two chains for scratch?
- Archive IceBreaker (2 x 320TB chains)
  - /oflow and /home, /usr/local backups
- Lustre ClusterStor
  - scratch for zcluster via 10GbE
  - scratch for xcluster via 10GbE, maybe IB later
Other
- Interactive cluster (using rack of zcluster nodes)?
- New copy nodes (e.g., use some hadoop nodes)
PB notes
- note: don't want perlwrapper this time
- research: will the scratch IceBreakers run over Ethernet as well, or only IB, for storage?
- research: try out POD
- research: what is the difference between building on Intel vs. AMD?
- decision: which compiler to use by default? We think gcc.
- research/decision: will we do "qlogin"? An interactive queue? What kind of enforceable resource limits?
  - http://www.nics.tennessee.edu/~troy/pbstools/ for a qlogin
  - resource limits are still undecided
- decision: paths for /usr/local apps?
- RESOLVED: scyldev can see zhead license server
- DONE: install PGI
- decision: RPMs vs not for /usr/local
- todo: install lmod (ready to build)
  - want it to heed /opt/scyld/modulefiles as well as the other location
- todo: install CUDA
- research: can we put Intel compilers on scyldev?
- decision: enable rsh on compute nodes?
- research: figure out the node naming scheme; see if we can get the syntax for our customary hostnames into the beowulf config file.
- decision: is 1024 a sufficient max per-user process limit? (see the limit-check sketch after this list)
- decision: is 1024 a sufficient max per-user open-files limit?
- research: need a new IP allocation scheme (see the IP sketch after this list)
- research: do any zcluster nodes have distinct kernel command-line args?
- todo: if we want users to ssh to nodes for jobs, we need e.g. /etc/profile.d/ssh-key.sh (see the ssh-key sketch after this list)
- todo: grep for "nodenumber" in the PDF files to get a list of per-node config files
- todo: build the Lustre client. Penguin case 62419 is open about this (we don't have the Scyld kernel source).
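
For the two "is 1024 enough?" decisions, a quick check of what the current per-user limits actually are on a given node (worth running on zcluster and on scyldev for comparison); Python's resource module reads the same rlimits that limits.conf sets. Keep in mind that nproc counts threads, so one heavily threaded job can eat into 1024 quickly.

    #!/usr/bin/env python
    # Print the soft/hard per-user limits behind the two "is 1024 enough?"
    # questions: max processes (nproc) and max open files (nofile).
    import resource

    def show(name, rlim):
        soft, hard = resource.getrlimit(rlim)
        fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
        print("%-8s soft=%-10s hard=%s" % (name, fmt(soft), fmt(hard)))

    show("nproc", resource.RLIMIT_NPROC)
    show("nofile", resource.RLIMIT_NOFILE)
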
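For the IP allocation scheme, a toy sketch of the simplest option: a fixed arithmetic mapping from node number to address within one private subnet. The 10.1.0.0/16 network, the offset, and the n### hostnames are placeholders, not a proposal.

    #!/usr/bin/env python
    # Toy mapping: node number -> fixed address in one private subnet.
    # Subnet, offset, and hostname pattern are placeholders.
    import ipaddress

    NODE_NET = ipaddress.ip_network("10.1.0.0/16")  # placeholder cluster network
    OFFSET = 100                                    # leave low addresses for head/service nodes

    def node_ip(n):
        return NODE_NET.network_address + OFFSET + n

    for n in range(4):
        print("n%03d  %s" % (n, node_ip(n)))
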
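For the /etc/profile.d/ssh-key.sh item, the real hook would presumably be a small shell script, but here is a sketch of what it needs to do at login so jobs can ssh between compute nodes: make sure the user has a passphrase-less key and that it is in their authorized_keys. Key type and filenames are assumptions.

    #!/usr/bin/env python
    # Sketch of the per-user setup an /etc/profile.d/ssh-key.sh hook would do:
    # create a passphrase-less key if missing and authorize it.
    import os
    import stat
    import subprocess

    def ensure_ssh_key():
        ssh_dir = os.path.expanduser("~/.ssh")
        key = os.path.join(ssh_dir, "id_rsa")
        auth = os.path.join(ssh_dir, "authorized_keys")

        if not os.path.isdir(ssh_dir):
            os.makedirs(ssh_dir)
            os.chmod(ssh_dir, stat.S_IRWXU)                  # 0700

        if not os.path.exists(key):
            # Empty passphrase so batch jobs need no interaction.
            subprocess.check_call(["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f", key])

        pubkey = open(key + ".pub").read()
        existing = open(auth).read() if os.path.exists(auth) else ""
        if pubkey.strip() not in existing:
            with open(auth, "a") as f:
                f.write(pubkey)
            os.chmod(auth, stat.S_IRUSR | stat.S_IWUSR)      # 0600

    if __name__ == "__main__":
        ensure_ssh_key()
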
Documentation
- cloud, login nodes, etc.
- queuing system
Zcluster nodes available for re-purpose
- rack 6:
  - (27) Dell 1950, 8-core, 16GB
- rack 7:
  - (16) Dell R610, 8-core, 16GB
- racks 8-11:
  - (123) Dell 1950, 8-core, 16GB
- rack 12:
  - (10) Arch, 12-core, 48GB
  - (2) Dell R610 Tesla 1070, 8-core, 48GB
  - (2) Dell R610, 8-core, 48GB (old 192GB boxes)
  - (3) Dell R610, 8-core, 192GB
  - (1) SuperMicro Tesla 2075, 4-core, 24GB (Taha?)
  - (1) Dell R900, 16-core, 128GB (what is this?)
- rack 13:
  - (27) Dell R410, 8-core, 16GB
- rack 14:
  - (3) Dell PE C6145, 32-core, 64GB
  - (1) Dell R810, 32-core, 512GB
  - (2) Dell R815 Interactive nodes, 48-core, 128GB
  - (3) SuperMicro, 12-core, 256GB
- rack 15:
  - (26) Arch, 12-core, 48GB
- rack 16:
  - (10) Arch, 12-core, 48GB
- rack 17:
  - (9) Arch, 24-core, 128GB (hadoop)
  - (3) Arch, 24-core, 128GB (multi-core)
- rack 18:
  - (5) Penguin Kepler GPU, 12-core, 96GB