Sapelo2 extended maintenance - Slurm migration

From Research Computing Center Wiki
Revision as of 08:01, 28 September 2020 by Shtsai.

Summary of this message:

  • Sapelo2 will be shut down for maintenance on Oct. 24, 2020 at 6 a.m. and will remain unavailable until 5 p.m. on Oct. 28, 2020 (tentative end-of-maintenance date).
  • All jobs running on Sapelo2 at 6 a.m. on Oct. 24 will be deleted.
  • Sapelo2 will migrate to the Slurm queueing system and a new software environment, so current Torque/Moab job submission scripts will no longer work after this maintenance.
  • Slurm migration training videos and workshops are available (see below).
  • Sap2test and the xfer nodes will also be unavailable during this maintenance.
  • The teaching cluster will not be affected.


GACRC has scheduled an extended Sapelo2 maintenance starting on October 24, 2020 at 6:00 a.m. to switch its queueing system from Torque/Moab to Slurm and to update its operating system from CentOS 7.5 to CentOS 7.8. Compiler toolchains and software application modules will also be updated to the versions that are now available on Sap2test. Several other major maintenance tasks, such as updating the operating system on the Lustre file system, will also be performed. Sapelo2, Sap2test, and the xfer nodes will be unavailable for approximately 5 days, and we expect them to return to service by 5 p.m. on Oct. 28, 2020.

All Sapelo2 jobs still running when the maintenance begins at 6:00 a.m. on Oct. 24 will be terminated. Because of this, we have implemented a "reservation" on the queueing system that will only start jobs whose requested walltime would allow them to finish before 6:00 a.m. on Oct. 24.
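As a rough illustration of the arithmetic behind this reservation, the Python sketch below computes the longest walltime a job could request and still finish before the maintenance starts, and checks whether a given request fits. It is not how the Moab reservation is actually implemented. The function names are hypothetical, times are assumed to be local (naive) datetimes, `start_time` is when the scheduler would launch the job (which may be later than when you submit it), and a job ending exactly at 6:00 a.m. is assumed to fit; the scheduler's real boundary handling is not specified here.

```python
from datetime import datetime, timedelta

# Maintenance start taken from the announcement; assumed to be local time.
MAINTENANCE_START = datetime(2020, 10, 24, 6, 0)


def max_walltime(start_time, maintenance_start=MAINTENANCE_START):
    """Hypothetical helper: longest walltime a job starting at start_time
    could request and still end before the maintenance.
    Returns None if the maintenance has already begun."""
    remaining = maintenance_start - start_time
    if remaining <= timedelta(0):
        return None
    return remaining


def will_start(start_time, requested, maintenance_start=MAINTENANCE_START):
    """Hypothetical mirror of the reservation rule: start the job only if
    start_time + requested walltime does not run past the maintenance start."""
    return start_time + requested <= maintenance_start


def format_walltime(delta):
    """Format a timedelta as Torque-style HH:MM:SS (hours may exceed 24)."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    launch = datetime(2020, 10, 21, 18, 0)
    print(format_walltime(max_walltime(launch)))
    print(will_start(launch, timedelta(hours=48)))
    print(will_start(launch, timedelta(hours=72)))
```

For example, a job launched at 6:00 p.m. on Oct. 21 has 60 hours left before the maintenance, so a 48-hour request fits while a 72-hour request would not be started.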

Once this maintenance is complete, Sapelo2 will run Slurm and a new software environment. Therefore, job submission scripts and workflows based on Torque/Moab, or on the software packages currently installed on Sapelo2, will no longer work. Only the software packages currently installed on Sap2test will be available on Sapelo2 after the maintenance.

To help users familiarize themselves with Slurm and the new software environment on the test cluster, we have prepared training videos that are available on GACRC's Kaltura channel at https://kaltura.uga.edu/channel/GACRC/176125031 (login with your MyID and password is required). Additionally, if you are interested in any of our Sap2test migration training workshops, please see https://wiki.gacrc.uga.edu/wiki/Training for the scheduled events. Documentation on the Slurm test cluster (Sap2test) is available at https://wiki.gacrc.uga.edu/wiki/Systems#Slurm_Test_Cluster_.28Sap2test.29

We strongly encourage you to convert your job submission scripts to Slurm and test your workflows on Sap2test as soon as possible. Some software packages that you need might not be available on Sap2test. If so, please let us know as soon as possible so we can install them.
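As a possible starting point for converting scripts, the Python sketch below rewrites a few common Torque `#PBS` directives into `#SBATCH` equivalents and renames `PBS_O_WORKDIR`/`PBS_JOBID` to `SLURM_SUBMIT_DIR`/`SLURM_JOB_ID`. It is a hypothetical helper, not a GACRC tool, and it covers only a handful of directives; anything it does not recognize is left as a `# TODO` line for manual review. Two assumptions deserve checking: Torque queue names (`-q`) are copied into `--partition` unchanged, although partition names on Sapelo2 after the migration may differ, and `ppn` is mapped to `--ntasks-per-node`, which suits MPI jobs, whereas a multithreaded program may need `--cpus-per-task` instead. Any converted script should be tested on Sap2test.

```python
import re
import shlex

# One-to-one Torque flag -> sbatch option mapping (assumed common cases only).
SIMPLE_FLAGS = {
    "-N": "--job-name",
    "-q": "--partition",  # queue names may not match the new partition names
    "-o": "--output",
    "-e": "--error",
    "-M": "--mail-user",
}
# Torque mail events (-m) -> Slurm --mail-type values.
MAIL_EVENTS = {"a": "FAIL", "b": "BEGIN", "e": "END"}
# Environment variables with a direct Slurm counterpart.
ENV_VARS = {"PBS_O_WORKDIR": "SLURM_SUBMIT_DIR", "PBS_JOBID": "SLURM_JOB_ID"}


def convert_mem(value):
    """Convert Torque memory such as '10gb' to Slurm '10G'; None if unrecognized."""
    m = re.fullmatch(r"(\d+)([kmgt])b?", value, flags=re.IGNORECASE)
    return m.group(1) + m.group(2).upper() if m else None


def translate_resources(spec):
    """Translate a Torque '-l' list, e.g. 'nodes=1:ppn=4,walltime=2:00:00'."""
    lines = []
    for item in spec.split(","):
        key, _, value = item.partition("=")
        if key == "walltime":
            lines.append(f"#SBATCH --time={value}")
        elif key == "mem" and convert_mem(value):
            lines.append(f"#SBATCH --mem={convert_mem(value)}")
        elif key == "nodes":
            count, *extras = value.split(":")
            lines.append(f"#SBATCH --nodes={count}")
            for extra in extras:
                if extra.startswith("ppn="):
                    lines.append(f"#SBATCH --ntasks-per-node={extra[4:]}")
                else:
                    lines.append(f"# TODO: review node property {extra}")
        else:
            lines.append(f"# TODO: review PBS resource {item}")
    return lines


def translate_script(text):
    """Return a Slurm draft of a Torque job script; TODO lines need manual review."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#PBS"):
            # Ordinary script line: only rename Torque environment variables.
            for old, new in ENV_VARS.items():
                line = line.replace(old, new)
            out.append(line)
            continue
        tokens = shlex.split(stripped[4:])
        i = 0
        while i < len(tokens):
            flag = tokens[i]
            # A flag takes a value unless the next token is another flag.
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            value = tokens[i + 1] if has_value else ""
            i += 2 if has_value else 1
            if flag in SIMPLE_FLAGS and value:
                out.append(f"#SBATCH {SIMPLE_FLAGS[flag]}={value}")
            elif flag == "-l" and value:
                out.extend(translate_resources(value))
            elif flag == "-m" and value and set(value) <= set(MAIL_EVENTS):
                types = ",".join(MAIL_EVENTS[c] for c in value)
                out.append(f"#SBATCH --mail-type={types}")
            else:
                out.append(f"# TODO: review PBS directive {flag} {value}".rstrip())
    return "\n".join(out)
```

The converted script would then be submitted with `sbatch` rather than `qsub`.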

If you need any support related to the Slurm migration, please let us know via the online form at https://uga.teamdynamix.com/TDClient/2060/Portal/Requests/ServiceDet?ID=41600