Rocky 8 Transition Guide
Introduction
As part of our August 29-31, 2023 maintenance window, GACRC will be upgrading the Linux operating system on the Sapelo2 cluster from CentOS 7 to Rocky 8.
Why is a major Operating System (OS) update necessary?
- Existing OS is End of Life - There are no more full version updates being released for the existing operating system and newer versions of some software applications are not supported by the current OS version.
- Bringing New Nodes Online - Because development of the existing OS has stopped, some of the latest generation of compute node hardware cannot run it, since that hardware needs newer drivers than this OS provides. The new hardware and architectures that we will be bringing online soon require this OS update.
- Security Improvements - Major OS updates like this one are needed to keep the cluster, and its security fixes, as up to date as possible.
- Why Rocky 8? - Many HPC centers are adopting it, which means there is a good amount of community support.
What does this mean for you and your workflows?
Overview
- We are not changing anything from the data storage standpoint. All existing /home, /scratch, /work, /project spaces will retain the existing data.
- The compiler toolchains and many software packages will be updated to newer versions.
- Because this is a major OS update, we need to recompile all the applications and ensure that they work with the new OS version.
- We will have as comprehensive a software suite available on the new OS as possible, but some less widely used applications and older software versions will not be immediately available.
- Because software modules will be reinstalled and updated, all pending jobs will be canceled during the maintenance window to prevent job failures caused by module names changing after the maintenance.
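Since pending jobs will be canceled, it may help to record them beforehand so they can be resubmitted later with updated module names. The following is a minimal sketch, assuming the standard Slurm `squeue` client is available in your session; the helper name `save_pending_jobs` and the default output file name are hypothetical.

```shell
# Hypothetical helper: write the ID and name of each of your pending jobs
# to a file, so you know what to resubmit after the maintenance.
save_pending_jobs() {
    outfile="${1:-pending_jobs.txt}"
    user_name="${USER:-$(id -un)}"
    # -u: only this user's jobs; -t PENDING: only pending jobs;
    # -h: suppress the header line; -o "%i %j": print job ID and job name.
    squeue -u "$user_name" -t PENDING -h -o "%i %j" > "$outfile"
}
```

For example, `save_pending_jobs ~/pending_before_rocky8.txt` would leave one line per pending job in that file.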
Storage
There will be no changes to the storage system during this maintenance window. All existing /home, /scratch, /work, /project, /db spaces will be available after the maintenance and they will retain the existing data.
Queueing System
The Slurm queueing system will be updated from version 21.08.8 to version 23.02.2. Most compute nodes available on the CentOS 7 system will continue to be available after the transition to Rocky 8 and the Slurm partitions will remain the same.
Software
Warning
Because this is a major change in the operating system, most user software built on CentOS 7 will not work and will need to be rebuilt. Even if the programs run without being rebuilt, the change in the underlying libraries may impact code execution and results. Therefore, users should test and verify that their codes are producing the expected results on the new operating system.
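One way to carry out this check is to keep the output of a known-good CentOS 7 run and compare it with the output of the same job on Rocky 8. The following is a minimal sketch using `diff`; the function name `compare_runs` and its two directory arguments are hypothetical. It assumes a byte-for-byte comparison is meaningful, which may be too strict for codes whose floating-point output legitimately varies in the last digits.

```shell
# Hypothetical helper: compare the output directory of an old (CentOS 7) run
# with the output directory of the same job rerun on Rocky 8.
compare_runs() {
    old_dir="$1"
    new_dir="$2"
    # -r: recurse into subdirectories; -q: only report which files differ.
    if diff -rq "$old_dir" "$new_dir"; then
        echo "OK: outputs are identical"
    else
        echo "CHECK: outputs differ or files are missing; see the list above"
        return 1
    fi
}
```

A call such as `compare_runs results_centos7 results_rocky8` prints the differing files, if any, and returns a non-zero status when the two runs do not match.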
Compiler toolchains
The base compiler toolchains used to build software libraries and applications on the cluster will be updated, as newer versions are able to generate more optimized code for newer computer hardware and newer software versions.
Base compiler toolchains on CentOS 7:
- GCCcore/8.3.0, GCC/8.3.0, gompi/2019b, foss/2019b
- GCCcore/10.2.0, GCC/10.2.0, gompi/2020b, foss/2020b
- CUDA versions 10.2 and 11.1
- OpenMPI versions 3.1.4 and 4.0.5
Base compiler toolchains on Rocky 8:
- GCCcore/11.2.0, GCC/11.2.0, gompi/2021b, foss/2021b
- GCCcore/11.3.0, GCC/11.3.0, gompi/2022a, foss/2022a
- CUDA versions 11.4, 11.7, and 12.0
- OpenMPI versions 4.1.2 and 4.1.4
Centrally installed modules
Centrally installed software modules will continue to have the format Name/Version-Toolchain, but for most software packages the Version and Toolchain will be updated. Some module names have an optional Versionsuffix, which might change or be dropped on the new system. Some module names will remain the same on the Rocky 8 system. Some examples:
Software | Module name on CentOS 7 | Module name on Rocky 8 | Changes |
---|---|---|---|
ABySS | ABySS/2.3.1-foss-2019b | ABySS/2.3.5-foss-2021b | version, toolchain |
BLAST+ | BLAST+/2.12.0-gompi-2020b | BLAST+/2.13.0-gompi-2022a | version, toolchain |
BWA | BWA/0.7.17-GCC-10.3.0 | BWA/0.7.17-GCCcore-11.2.0 | toolchain |
DeepAffinity | DeepAffinity/0.1 | not available (yet) | |
SAMtools | SAMtools/1.16.1-GCC-11.3.0 | SAMtools/1.16.1-GCC-11.3.0 | no changes |
STAR | STAR/2.7.10a-GCC-8.3.0 | STAR/2.7.10b-GCC-11.3.0 | version, toolchain |
Trinity | Trinity/2.10.0-foss-2019b-Python-3.7.4 | Trinity/2.15.1-foss-2022a | version, toolchain, versionsuffix |
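Because most module names will change, any `module load` line in an existing job script will probably need to be updated. The following is a minimal sketch for finding them, assuming your scripts load software with lines of the form `module load Name/Version-Toolchain`; the function name `list_loaded_modules` is hypothetical.

```shell
# Hypothetical helper: print every module named on a "module load" line
# in the given job scripts, one per line, without duplicates.
list_loaded_modules() {
    # Keep only lines that start with "module load" (leading spaces allowed),
    # drop trailing comments and the "module load" prefix,
    # then put each module name on its own line.
    grep -hE '^[[:space:]]*module[[:space:]]+load[[:space:]]+' "$@" \
        | sed -E 's/#.*$//; s/^[[:space:]]*module[[:space:]]+load[[:space:]]+//' \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | sort -u
}
```

Running `list_loaded_modules ~/jobs/*.sh` (the path is only an example) gives a list of module names that can be checked against what is installed on Rocky 8.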
Conda environments
Some users have conda environments installed in their home directory or group shared directories. These environments should be reinstalled on the Rocky 8 system, using versions of Miniconda or Anaconda available there. Documentation on how to install conda environments on the cluster is available at
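Before an environment is rebuilt, it can help to record what it contains. The following is a minimal sketch, assuming the `conda` command is available in your session and the old environment can still be referred to by name; `save_conda_env` and the file names are hypothetical. The `--from-history` option records only the packages that were explicitly requested, which tends to carry over between operating systems better than a full list with build strings.

```shell
# Hypothetical helper: save the list of explicitly requested packages
# of a named conda environment to a YAML file.
save_conda_env() {
    env_name="$1"
    outfile="${2:-$env_name.yml}"
    # -n selects the environment by name; --from-history omits
    # OS-specific build strings and automatically pulled-in dependencies.
    conda env export -n "$env_name" --from-history > "$outfile"
}
```

On the Rocky 8 system, `conda env create -f myenv.yml` could then be used to recreate the environment from the saved file, although some package versions may need to be adjusted.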
Python packages
Python libraries and virtual environments need to be reinstalled as well, using versions of Python, Miniconda, or Anaconda available on the Rocky 8 system.
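For a pip-based virtual environment, a record of the installed packages can be made with `pip freeze`. The following is a minimal sketch, assuming it is run with the old environment activated while that environment still works; `save_pip_requirements` is a hypothetical name.

```shell
# Hypothetical helper: save the packages installed in the active Python
# environment to a requirements file.
save_pip_requirements() {
    outfile="${1:-requirements.txt}"
    # "python -m pip" uses the pip that belongs to the active interpreter.
    python -m pip freeze > "$outfile"
}
```

In a fresh virtual environment on Rocky 8, `python -m pip install -r requirements.txt` would then reinstall the listed packages, although pinned versions may need to be relaxed if they are not available for the newer Python.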
R packages
We recommend that users reinstall any R packages that they have installed in their own directories, to make sure they are compatible with the new OS version and with the versions of R available on the new system.
Singularity containers
Singularity containers that you used on CentOS 7 should continue to work on the Rocky 8 system. The containers installed centrally in /apps/singularity-images will be available after the maintenance.
Potential issues
Error connecting to Sapelo2
Because Sapelo2 was reinstalled, you might encounter a "host key" or "host id" error when you connect to Sapelo2 for the first time after the maintenance.
Connecting from macOS or Linux
Users connecting from a macOS or Linux system might see an error like this:
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    The ECDSA host key for sapelo2 has changed,
    and the key for the corresponding IP address 128.192.75.18
    is unchanged. This could either mean that
    DNS SPOOFING is happening or the IP address for the host
    and its host key have changed at the same time.
    Offending key for IP in /Users/jsmith/.ssh/known_hosts:76
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that a host key has just been changed.
    The fingerprint for the ECDSA key sent by the remote host is
    SHA256:E1ovq19vLNYNF1eFiOQ91tc1EPtbHcMhML2I45UrJrE.
    Please contact your system administrator.
    Add correct host key in /Users/jsmith/.ssh/known_hosts to get rid of this message.
    Offending ECDSA key in /Users/jsmith/.ssh/known_hosts:25
    ECDSA host key for sapelo2 has changed and you have requested strict checking.
    Host key verification failed.
To fix this problem, open the known_hosts file on your local machine (in the example above, its full path is /Users/jsmith/.ssh/known_hosts, as shown in the error message). Delete the line that contains sapelo2.gacrc.uga.edu and save the file. The number after the colon in the error message (known_hosts:25 in the example) is the line number of the offending entry.
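As an alternative to editing the file by hand, OpenSSH's `ssh-keygen -R` removes all stored keys for a host name and keeps a backup of the original file. The following is a minimal sketch; the wrapper name `forget_host_key` is hypothetical, and the default path assumes the usual `~/.ssh/known_hosts` location.

```shell
# Hypothetical helper: remove all stored host keys for one host
# from a known_hosts file.
forget_host_key() {
    host="$1"
    known_hosts="${2:-$HOME/.ssh/known_hosts}"
    # -R removes every key stored for the host; -f selects the file.
    # ssh-keygen keeps the previous contents in <file>.old.
    ssh-keygen -R "$host" -f "$known_hosts"
}
```

For example, `forget_host_key sapelo2.gacrc.uga.edu` clears the old entry. If the warning also names an offending key for the IP address, the same helper can be run with that address.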
Once you have done this, you should be able to ssh into sapelo2.gacrc.uga.edu. You might still get a message like this:
    [jsmith@laptop]$ ssh jsmith@sapelo2.gacrc.uga.edu
    The authenticity of host 'sapelo2.gacrc.uga.edu' can't be established.
    ECDSA key fingerprint is SHA256:ikdjggjeorjgnkresitnsgjsms
    ECDSA key fingerprint is MD5:be:1xxxxxxxxxxxx
    Are you sure you want to continue connecting (yes/no)?
You can type yes and your connection should work.
Connecting from Windows
When connecting from Windows for the first time after the maintenance, users might encounter an error like POTENTIAL SECURITY BREACH or HOST IDENTIFICATION HAS CHANGED. Users can click Yes to continue the connection and have a new host key saved on their local machines.