Xcluster Go-Live

Revision as of 13:35, 20 May 2014

Name the cluster!!

  • Win a free hot-apple pie


(Scyld) Development Environment

  • Paul's VM plus physical nodes
  • Intel vs. AMD software builds?


Software

Intel Compiler - do we need it?

Apps to install on xcluster:

  • MPI, multi-core, big-memory, serial?
  • Popular apps (regardless of type; how to determine?):
    • time stamps on app dirs??
    • access time of executable??
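
One cheap way to answer the "popular apps" question is to rank what is on zcluster by access time. A minimal sketch, assuming the current apps live under /usr/local/apps (path is a guess) and that atimes are being updated at all:

    # Rank executables by last-access time, newest first (GNU find).
    # Path /usr/local/apps is a guess at where the zcluster apps live.
    # %A@ = atime as epoch seconds, %AY-%Am-%Ad = atime as a date.
    find /usr/local/apps -maxdepth 4 -type f -perm /111 \
        -printf '%A@ %AY-%Am-%Ad %p\n' 2>/dev/null | sort -rn | head -50

    # Coarser view: app directories by modification time.
    ls -lt /usr/local/apps | head -20

With relatime (the usual mount default) atimes are only refreshed occasionally, so treat this as a rough signal rather than a usage log.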

New queueing system:

  • how/what to configure (e.g. fairshare, queues, "complexes")
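
Whatever the new scheduler turns out to be, the current zcluster setup is the obvious reference point. Assuming zcluster runs Sun Grid Engine (the "complexes" wording suggests it), these read-only commands dump the pieces worth reproducing:

    # Read-only dump of the existing SGE configuration (run on the zcluster head).
    qconf -sql                                         # queue names
    for q in $(qconf -sql); do qconf -sq "$q"; done    # per-queue settings
    qconf -sc                                          # complexes / consumables
    qconf -ssconf                                      # scheduler config (fairshare weights)
    qconf -sconf                                       # global cluster config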

Use module or not?
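
For context on the module question, the user-facing payoff is switching compiler/MPI stacks without editing dotfiles. Package names and versions below are placeholders:

    # Typical user session with environment modules / Lmod:
    module avail                  # see what is installed
    module load gcc/4.7.2         # pick a compiler stack (placeholder version)
    module load openmpi/1.6       # and a matching MPI (placeholder version)
    module list                   # confirm the environment
    module purge                  # reset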

Scyld

  • Location of /usr/local/apps, libraries, and languages e.g., Perl
  • A bunch more stuff needed here


Cloud Manager

Lab Group Registration:

  • Lab VM login nodes

User Accounts:

  • New requests from PIs
  • CAS/LDAP authentication?
  • affiliate accounts?
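
If we go the CAS/LDAP route: CAS is web single sign-on and would mainly apply to web front ends (e.g. the cloud manager), while shell logins would point at the campus LDAP directory. A first-pass sketch on a login node, with the server URI and base DN as placeholders rather than the real campus values:

    # Point NSS/PAM at the campus directory (RHEL 6-era authconfig).
    # Server URI and base DN below are placeholders.
    authconfig --enableldap --enableldapauth \
        --ldapserver=ldaps://ldap.example.edu \
        --ldapbasedn='dc=example,dc=edu' \
        --update

    # Sanity check that directory accounts resolve (any known account name):
    getent passwd someuser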


Storage

  • HPC IceBreaker (3 x 48TB chains)
    • one chain for /home, /db, and /usr/local?
    • two chains for scratch?
  • Archive IceBreaker (2 x 320TB chains)
    • /oflow and /home, /usr/local backups
  • Lustre ClusterStor
    • scratch for zcluster via 10GbE
    • scratch for xcluster via 10GbE, maybe IB later
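
To make the layout concrete, the mounts from a login or compute node could look roughly like the sketch below. Every hostname and export path is made up, and the Lustre line assumes the client from the todo list gets built:

    # Sketch of /etc/fstab entries -- hostnames and export paths are placeholders.
    ib-hpc-1:/export/home       /home        nfs     defaults,hard,intr   0 0
    ib-hpc-1:/export/db         /db          nfs     defaults,hard,intr   0 0
    ib-hpc-1:/export/usrlocal   /usr/local   nfs     defaults,hard,intr   0 0
    ib-hpc-2:/export/scratch    /scratch     nfs     defaults,hard,intr   0 0
    ib-arch-1:/export/oflow     /oflow       nfs     defaults,hard,intr   0 0
    # ClusterStor scratch over 10GbE once the Lustre client is built:
    # cstor-mgs@tcp0:/lfs       /lustre1     lustre  defaults,_netdev     0 0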


PB notes

  • note: don't want perlwrapper this time
  • research: will scratch icebreakers run over Ethernet also, or only IB, for storage?
  • research: try out POD
  • research: what is difference between building on Intel vs AMD?
  • decision: which compiler to use by default? I think we think gcc.
  • research/decision: will we do "qlogin"? an interactive queue? what kind of enforceable resource limits? (interactive-job example after this list)
    • http://www.nics.tennessee.edu/~troy/pbstools/ for a qlogin
    • dunno about resource limits yet
  • decision: paths for /usr/local apps?
  • research: can scyldev see zhead license server?
    • will know after testing the now-installed PGI
  • decision: RPMs vs not for /usr/local
  • todo: install lmod (ready to build); see the MODULEPATH sketch after this list
    • want it to heed /opt/scyld/modulefiles as well as the other location
  • todo: install CUDA
  • research: can we put Intel compilers on scyldev?
  • decision: enable rsh on compute nodes?
  • research: figure out node naming scheme. See if we can get the syntax for our customary hostnames into the beowulf config file.
  • decision: is 1024 a sufficient max per-user process limit?
  • decision: is 1024 a sufficient max per-user open files limit? (the limits sketch after this list covers both)
  • research: need new IP allocation scheme
  • research: do any zcluster nodes have distinct kernel command line args? (survey one-liner after this list)
  • todo: if we want users to ssh to nodes for jobs, we need e.g. /etc/profile.d/ssh-key.sh (sketch after this list)
  • todo: grep for "nodenumber" in the PDF files to get a list of per-node config files (one-liner after this list)
  • todo: build Lustre client. Probably need Xyratex help.
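
For the qlogin item above: if the new scheduler ends up being Torque/PBS (which the pbstools link suggests is on the table), an interactive session is just an interactive batch job. Queue name and limits below are placeholders, not decisions:

    # Request an interactive shell: 1 core, 2 hours (Torque/PBS syntax).
    # Queue name "interactive" and the limits are placeholders.
    qsub -I -q interactive -l nodes=1:ppn=1,walltime=02:00:00

Enforceable limits would then come from per-queue defaults/maximums on that interactive queue, which is the part still needing research.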
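
For the lmod item above, one way to get both module trees searched is a profile.d drop-in; the filename and the location of our own tree are assumptions, and ordering relative to lmod's own init script would need checking:

    # /etc/profile.d/00-modulepath.sh -- filename and second path are placeholders.
    # Make Lmod search the Scyld tree plus our own modulefiles.
    export MODULEPATH=/opt/scyld/modulefiles:/usr/local/modulefiles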
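
For the two 1024-limit questions above, first check what users actually get now (as an ordinary user):

    ulimit -u     # max user processes
    ulimit -n     # max open files

If 1024 turns out to be too low, a drop-in such as /etc/security/limits.d/91-hpc.conf raises both; the filename and numbers here are illustrative, not decisions:

    # illustrative values only
    *  soft  nproc   4096
    *  hard  nproc   8192
    *  soft  nofile  4096
    *  hard  nofile  8192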
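
For the kernel command line question above, a quick survey shows whether any zcluster nodes differ (nodelist.txt stands in for however we normally enumerate the nodes):

    # Collect /proc/cmdline from every node and count the distinct variants.
    # nodelist.txt is a stand-in for the real node list.
    for n in $(cat nodelist.txt); do
        ssh -o BatchMode=yes "$n" cat /proc/cmdline
    done | sort | uniq -c | sort -rn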
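
For the ssh-to-nodes item above, a minimal /etc/profile.d/ssh-key.sh could generate a passphrase-less key on first login and authorize it for intra-cluster hops. This assumes /home is shared across the nodes:

    # /etc/profile.d/ssh-key.sh -- minimal sketch, assumes a shared /home.
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
        ssh-keygen -q -t rsa -N '' -f "$HOME/.ssh/id_rsa"
        cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
        chmod 600 "$HOME/.ssh/authorized_keys"
    fi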
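
For the "nodenumber" grep above, something along these lines pulls the relevant files out of the vendor PDFs (the docs directory is a placeholder; pdftotext comes from poppler-utils):

    # Print the PDFs that mention "nodenumber" (directory path is a placeholder).
    for f in /root/vendor-docs/*.pdf; do
        pdftotext "$f" - 2>/dev/null | grep -qi nodenumber && echo "$f"
    done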

Documentation

  • cloud, login nodes, etc.
  • queuing system