Nexus/CLIP: Difference between revisions

Revision as of 14:31, 9 September 2022

Overview

The CLIP lab's cluster compute nodes will be gradually folded into UMIACS' new Nexus cluster beginning on Monday, July 25th, 2022 at 9am in order to further the goal of consolidating all compute nodes in UMIACS onto one common SLURM scheduler.

The Nexus cluster already has a large pool of compute resources made possible through leftover funding for the Brendan Iribe Center. Details on common nodes already in the cluster (Tron partition) can be found here.

As part of the transition, compute nodes will be reinstalled with Red Hat Enterprise Linux 8 (RHEL8) as their operating system; they currently run Red Hat Enterprise Linux 7 (RHEL7). Their names will also change to the form clip## for consistency with Nexus' naming scheme.

Data stored on the local scratch drives of compute nodes (/scratch0, /scratch1, etc.) will not persist through the reinstalls. Please copy any data you need from these local scratch drives to a network-attached storage location before each node's move date, as listed below.
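One way to script the copy off local scratch is sketched below in Python. This is only an illustration: the function name back_up_scratch and both paths in the usage note are hypothetical, and the destination must be a network storage location you actually have write access to.

```python
import shutil
from pathlib import Path


def back_up_scratch(scratch_dir, dest_root):
    """Copy one scratch directory tree under dest_root and return the copy's path."""
    src = Path(scratch_dir)
    # Keep the final directory name so copies from different sources stay separate.
    dest = Path(dest_root) / src.name
    # dirs_exist_ok=True (Python 3.8+) lets the copy be re-run to refresh an earlier backup.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For example, back_up_scratch("/scratch0/myuser", "/path/to/network/storage") would place the copy in /path/to/network/storage/myuser. Both paths are placeholders; run this on the compute node that holds the scratch data, before that node's move date.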

You may need to re-compile or re-link your applications due to the changes to the underlying operating system libraries. We have tried to maintain a similar set of software in our GNU Modules software trees for both operating systems. However, you may need to let us know if there is something missing after the upgrades.

In addition, the general purpose nodes context00.umiacs.umd.edu and context01.umiacs.umd.edu were retired on Tuesday, September 6th, 2022 at 9am. Please use clipsub00.umiacs.umd.edu and clipsub01.umiacs.umd.edu (or the nexusclip submission nodes) for any general purpose CLIP compute needs.

Lastly, /cliphomes directories will be deprecated sometime in the coming year. The Nexus cluster uses /nfshomes directories for home directory storage space. There will be a future announcement about this deprecation that includes a concrete date after the cluster node moves are done or nearly done. /cliphomes will be made read-only once the cluster node moves are done.

Please see the Timeline section below for concrete dates in chronological order.

Please contact staff with any questions or concerns.

Usage

The Nexus cluster submission nodes allocated to CLIP are nexusclip00.umiacs.umd.edu and nexusclip01.umiacs.umd.edu. You will need to log in to one of these submission nodes to use the moved compute nodes. Submission from clipsub00.umiacs.umd.edu or clipsub01.umiacs.umd.edu will not work.

CLIP users (exclusively) can schedule non-interruptible jobs on the moved nodes by including the --partition=clip and --account=clip submission arguments.

The Quality of Service (QoS) options present on the CLIP SLURM scheduler will not be migrated into the Nexus SLURM scheduler by default. The huge-long QoS can be used to request resources beyond those available in the universal Nexus QoSes listed in the Quality of Service (QoS) section of the Nexus page. If you are interested in migrating a QoS from the CLIP scheduler to the Nexus scheduler, please contact staff through the HelpDesk and we will evaluate the request.
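The submission arguments described in this section can be assembled as in the following Python sketch. The helper name build_sbatch_command and the script name train.sh are hypothetical; the sbatch options --partition, --account, and --qos are standard SLURM options. Whether a given QoS is available to your account is not something this sketch can confirm.

```python
import shlex


def build_sbatch_command(script, qos=None):
    """Return the sbatch argument list for a job on the CLIP nodes in Nexus."""
    cmd = ["sbatch", "--partition=clip", "--account=clip"]
    if qos is not None:
        # Optional QoS, e.g. huge-long for larger resource requests.
        cmd.append(f"--qos={qos}")
    cmd.append(script)
    return cmd


# train.sh is a hypothetical batch script name.
print(shlex.join(build_sbatch_command("train.sh", qos="huge-long")))
# prints: sbatch --partition=clip --account=clip --qos=huge-long train.sh
```

The resulting command has to be run on nexusclip00 or nexusclip01, since submission from the clipsub nodes will not reach the moved compute nodes.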

Timeline

Each event may begin as early as 9am US Eastern time on the date indicated and will be completed by the end of that business week (Friday at 5pm).

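{|
! Date
! Event
|-
| July 25th 2022
| <code>clipgpu00</code> and <code>clipgpu01</code> are moved into Nexus as <code>clip00</code> and <code>clip01</code>
|-
| August 1st 2022
| <code>clipgpu02</code> and <code>clipgpu03</code> are moved into Nexus as <code>clip02</code> and <code>clip03</code>
|-
| August 8th 2022
| <code>clipgpu04</code> and <code>clipgpu05</code> are moved into Nexus as <code>clip04</code> and <code>clip05</code>
|-
| August 15th 2022
| <code>clipgpu06</code> and <code>materialgpu00</code> are moved into Nexus as <code>clip06</code> and <code>clip07</code>
|-
| August 22nd 2022
| <code>materialgpu01</code> and <code>materialgpu02</code> are moved into Nexus as <code>clip08</code> and <code>clip09</code>
|-
| September 6th 2022
| <code>context00</code> and <code>context01</code> are taken offline
|-
| September 2022
| Announcement is made about remaining (non-GPU) compute nodes moving into Nexus
|-
| Fall 2022
| Announcement is made about the deprecation of <code>/fs/cliphomes</code> directories
|}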
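
If you have scripts that refer to the old node names, the renames in the timeline can be captured in a small lookup table. The Python sketch below copies the old-to-new pairs from the timeline; the names NODE_RENAMES and nexus_name are hypothetical, and accepting a fully qualified hostname as input is a convenience assumed here.

```python
# Old CLIP node name -> new Nexus node name, copied from the timeline.
NODE_RENAMES = {
    "clipgpu00": "clip00",
    "clipgpu01": "clip01",
    "clipgpu02": "clip02",
    "clipgpu03": "clip03",
    "clipgpu04": "clip04",
    "clipgpu05": "clip05",
    "clipgpu06": "clip06",
    "materialgpu00": "clip07",
    "materialgpu01": "clip08",
    "materialgpu02": "clip09",
}


def nexus_name(old_name):
    """Return the Nexus name for a moved node, or None if it is not listed."""
    # Accept either a short name or a fully qualified hostname (assumption).
    short = old_name.split(".")[0]
    return NODE_RENAMES.get(short)
```

Nodes that are not in the timeline as moved, such as context00, return None so that callers can handle them explicitly.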