Sapelo2 changes after Oct.24 maintenance

From Research Computing Center Wiki

Latest revision as of 21:59, 29 October 2020


Below are some user-visible changes implemented on Sapelo2 during this maintenance window.

Replaced queueing system

Sapelo2 now runs the Slurm queueing system.

To help users become familiar with Slurm and the test cluster environment, we have prepared training videos, available from the GACRC's Kaltura channel at https://kaltura.uga.edu/channel/GACRC/176125031 (login with MyID and password is required). Please also refer to Running Jobs on Sapelo2 and Migrating from Torque to Slurm.

Changes in the maximum walltime limit of the partitions (queues)

The batch, highmem_p, and gpu_p partitions now have a 7-day maximum walltime limit. The new batch_30d, highmem_30d_p, and gpu_30d_p partitions allow up to 30 days of walltime, but these partitions have more limited availability. Please see Running Jobs on Sapelo2.
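The walltime limits above can be sketched as a small partition chooser. This is only an illustration: the function names `choose_partition` and `sbatch_header` are hypothetical, the pairing of each 7-day partition with its 30-day counterpart is inferred from the partition names, and the limits are the ones stated in this section. Check Running Jobs on Sapelo2 for the authoritative settings.

```python
# Hypothetical helper: maps a requested walltime onto the partitions named above.
# Assumption: each 7-day partition pairs with the similarly named 30-day one.
LONG_PARTITION = {
    "batch": "batch_30d",
    "highmem_p": "highmem_30d_p",
    "gpu_p": "gpu_30d_p",
}
SHORT_LIMIT_DAYS = 7   # limit of batch, highmem_p, gpu_p
LONG_LIMIT_DAYS = 30   # limit of the *_30d partitions


def choose_partition(base, days):
    """Return the partition whose walltime limit covers `days` whole days."""
    if base not in LONG_PARTITION:
        raise ValueError(f"unknown partition: {base}")
    if days <= 0:
        raise ValueError("walltime must be at least one day in this sketch")
    if days <= SHORT_LIMIT_DAYS:
        return base
    if days <= LONG_LIMIT_DAYS:
        return LONG_PARTITION[base]
    raise ValueError(f"{days} days exceeds the {LONG_LIMIT_DAYS}-day maximum")


def sbatch_header(base, days):
    """Build the two Slurm directives that select the partition and walltime."""
    partition = choose_partition(base, days)
    # Slurm accepts walltime in days-hours:minutes:seconds form.
    return [
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={days}-00:00:00",
    ]


if __name__ == "__main__":
    print("\n".join(sbatch_header("gpu_p", 10)))
```

Because the 30-day partitions have more limited availability, requesting the shortest walltime that fits the job keeps it on the larger 7-day partitions.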

Sap2test not available anymore

When you connect to Sapelo2 using the hostname sapelo2.gacrc.uga.edu, you will be connected to the same environment that Sap2test provided before the maintenance window. The sap2test.gacrc.uga.edu hostname is no longer active.

New Toolchains and Software Environment installed

Sapelo2's operating system was updated from CentOS 7.5 to CentOS 7.8. Compiler toolchains and the software application modules were updated as well. For more information, please see Software on Sapelo2.

Error connecting to Sapelo2

Because Sapelo2 was reinstalled, you might encounter a "host key" or "host id" error when you connect to Sapelo2 for the first time after the maintenance.


Connecting from macOS or Linux

Users connecting from a macOS or Linux system might see an error like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for sapelo2 has changed,
and the key for the corresponding IP address 128.192.75.18
is unchanged. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Offending key for IP in /Users/jsmith/.ssh/known_hosts:76
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:E1ovq19vLNYNF1eFiOQ91tc1EPtbHcMhML2I45UrJrE.
Please contact your system administrator.
Add correct host key in /Users/jsmith/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /Users/jsmith/.ssh/known_hosts:25
ECDSA host key for sapelo2 has changed and you have requested strict checking.
Host key verification failed.

To fix this problem, open the known_hosts file on your local machine (in the example above, the full path to this file is /Users/jsmith/.ssh/known_hosts, as shown in the error message). The error message also reports the line number of each offending entry after the file path (lines 25 and 76 in the example). Delete the line that contains sapelo2.gacrc.uga.edu and save the file.

Once you have done this, you should be able to ssh into sapelo2.gacrc.uga.edu. You might still get a message like this:

[jsmith@laptop]$ ssh jsmith@sapelo2.gacrc.uga.edu
The authenticity of host 'sapelo2.gacrc.uga.edu' can't be established.
ECDSA key fingerprint is SHA256:ikdjggjeorjgnkresitnsgjsms
ECDSA key fingerprint is MD5:be:1xxxxxxxxxxxx
Are you sure you want to continue connecting (yes/no)? 

You can type yes and your connection should work.
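The known_hosts cleanup described above can also be sketched in code. The function below is a hypothetical helper, not a GACRC tool: it takes the text of a known_hosts file and returns it without the entries that name the given hosts. It assumes plain (unhashed) host names; hashed entries, which begin with `|1|`, are not matched and are kept. OpenSSH's own `ssh-keygen -R sapelo2.gacrc.uga.edu` command performs the same removal and also handles hashed entries.

```python
def drop_host_entries(known_hosts_text, hosts):
    """Return known_hosts content without entries that name any of `hosts`.

    Assumption: host names are stored unhashed. Hashed entries are left as-is.
    """
    targets = set(hosts)
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        # Keep blank lines and comments untouched.
        if not fields or fields[0].startswith("#"):
            kept.append(line)
            continue
        # An optional marker such as @cert-authority precedes the host field.
        if fields[0].startswith("@") and len(fields) > 1:
            host_field = fields[1]
        else:
            host_field = fields[0]
        # The host field is a comma-separated list of names and addresses.
        if set(host_field.split(",")) & targets:
            continue  # stale entry: drop it
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Made-up sample content; the key strings are placeholders.
    sample = (
        "sapelo2.gacrc.uga.edu,128.192.75.18 ecdsa-sha2-nistp256 AAAAold\n"
        "other.example.org ssh-ed25519 AAAAkeep\n"
    )
    print(drop_host_entries(sample, ["sapelo2.gacrc.uga.edu"]), end="")
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back after saving a backup copy of the original.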


Connecting from Windows

When connecting from Windows for the first time after the maintenance, users might encounter an error like POTENTIAL SECURITY BREACH or HOST IDENTIFICATION HAS CHANGED. Users can click Yes to continue the connection and have a new host key saved on their local machines.

Compute nodes have new hostnames

As we expand the Sapelo2 cluster by adding more compute nodes, it is helpful to identify each compute node by its rack location and its slot in the rack. We therefore now incorporate this information into the hostnames of the compute nodes. The new hostnames have the format rackname-nodeslot; for example, ra6-8 refers to the node that occupies slot number 8 in rack ra6.
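As a small illustration of the rackname-nodeslot format, the sketch below splits a short compute node hostname into its rack name and slot number. The function name `parse_node_name` is hypothetical, and the code assumes that the slot is the run of digits after the last hyphen, as in the ra6-8 example.

```python
def parse_node_name(hostname):
    """Split a short node name such as 'ra6-8' into (rack name, slot number)."""
    # Assumption: the slot follows the last hyphen and consists of digits only.
    rack, sep, slot = hostname.rpartition("-")
    if not sep or not rack or not (slot.isascii() and slot.isdigit()):
        raise ValueError(f"not in rackname-nodeslot format: {hostname!r}")
    return rack, int(slot)


if __name__ == "__main__":
    rack, slot = parse_node_name("ra6-8")
    print(f"rack {rack}, slot {slot}")
```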