Xcluster Go-Live

From Research Computing Center Wiki

Name the cluster!!

  • Win a free hot-apple pie


(Scyld) Development Environment

Software

Intel Compiler - Guy is checking on getting it.

Apps to install on xcluster:

  • MPI, multi-core, big-memory, serial?
  • Popular apps (regardless of type; how to determine?):
    • time stamps on app dirs??
    • access time of executable??
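One way to test the access-time idea is a quick find over the app tree. A minimal sketch, assuming apps live under /usr/local/apps (path is a placeholder) and the filesystem is not mounted noatime/relatime, which would make atime unreliable:

```shell
# List executable files under an assumed app tree by last access time,
# newest first (GNU find syntax). Adjust path and depth as needed.
find /usr/local/apps -maxdepth 3 -type f -perm /111 -printf '%A@ %p\n' 2>/dev/null \
    | sort -rn | head -20
```

Check mount options first (e.g. `mount | grep atime`) before trusting the results.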

New queueing system:

  • how/what to configure (e.g. fairshare, queues, "complexes")

Use module or not?

Scyld

  • Location of /usr/local/apps, libraries, and languages e.g., Perl
  • A bunch more stuff needed here


Cloud Manager

Use account creation process as a means to identify inactive users:

  • Mail PIs a list of their current users

Lab Group Registration:

  • Lab VM login nodes

User Accounts:

  • New requests from PIs
  • CAS/LDAP authentication?
  • Affiliate accounts?
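If accounts end up backed by CAS/LDAP, a quick sanity check on a login node is whether the nameservice actually resolves them. A minimal sketch using getent (the username is a placeholder):

```shell
# Verify an account resolves through whatever nameservice is configured
# in /etc/nsswitch.conf (files, LDAP, etc.). "someuser" is a placeholder.
getent passwd someuser && echo "account resolves" || echo "no such account"
```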

Storage

  • HPC IceBreaker (3 x 48TB chains)
    • one chain for /home, /db, and /usr/local?
    • two chains for scratch?
  • Archive IceBreaker (2 x 320TB chains)
    • /oflow, plus backups of /home and /usr/local
  • Lustre ClusterStor
    • scratch for zcluster via 10GbE
    • scratch for xcluster via 10GbE, maybe IB later
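Once the Lustre client is built, the scratch mount would look roughly like the following fstab fragment. This is a sketch only: the MGS NID and fsname are made up, and it assumes the TCP (Ethernet) LNet driver rather than IB.

```shell
# Hypothetical /etc/fstab entry for the ClusterStor scratch filesystem.
# 10.1.1.10@tcp0 and "scratch" are placeholders for the real MGS NID and
# Lustre fsname; _netdev delays the mount until networking is up.
# 10.1.1.10@tcp0:/scratch  /scratch  lustre  defaults,_netdev  0 0
```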


PB notes

  • note: don't want perlwrapper this time
  • research: will the scratch IceBreakers run over Ethernet as well, or only IB, for storage?
  • research: try out POD
  • research: what is difference between building on Intel vs AMD?
  • decision: which compiler to use by default? Current thinking is gcc.
  • research/decision: will we do "qlogin"? an interactive queue? what kind of enforceable resource limits?
  • decision: paths for /usr/local apps?
  • RESOLVED: scyldev can see zhead license server
  • DONE: install PGI
  • decision: RPMs vs not for /usr/local
  • todo: install lmod (is ready to build)
    • want it to heed /opt/scyld/modulefiles as well as other location
  • todo: install CUDA
  • research: can we put Intel compilers on scyldev?
  • decision: enable rsh on compute nodes?
  • research: figure out the node naming scheme. See if we can get the syntax for our customary hostnames into the beowulf config file.
  • decision: is 1024 a sufficient max per-user process limit?
  • decision: is 1024 a sufficient max per-user open files limit?
  • research: need new IP allocation scheme
  • research: do any zcluster nodes have distinct kernel command line args?
  • todo: if we want users to ssh to nodes for jobs, need e.g. /etc/profile.d/ssh-key.sh
  • todo: grep for "nodenumber" in the PDF files to get a list of per-node config files
  • todo: build Lustre client. Penguin Computing case 62419 covers this (we don't have the Scyld kernel source).
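On the ssh-key note above: the usual pattern is a profile.d script that generates a passphrase-less key on first login and authorizes it for intra-cluster ssh. A minimal sketch (the filename matches the note; the logic is an assumption, and it presumes /home is shared between head and compute nodes):

```shell
# /etc/profile.d/ssh-key.sh (sketch)
# On first login, create a passphrase-less keypair and authorize it so
# the user can ssh to compute nodes without a password.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
    chmod 600 "$HOME/.ssh/authorized_keys"
fi
```

Because /home is shared, the appended authorized_keys entry is visible on every node.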

Documentation

  • cloud, login nodes, etc.
  • queuing system