Nexus/Vulcan

From UMIACS
Revision as of 12:54, 21 July 2023 by Mbaney (talk | contribs)

The Vulcan standalone cluster's compute nodes will fold into Nexus on Thursday, August 17th, 2023 during the scheduled maintenance window for August (5-8pm).

The Nexus cluster already has a large pool of compute resources made possible through leftover funding for the Brendan Iribe Center. Details on common nodes already in the cluster (Tron partition) can be found here.

In addition, the Vulcan cluster's standalone submission nodes vulcansub00.umiacs.umd.edu and vulcansub01.umiacs.umd.edu will be retired on Thursday, September 21st, 2023 during that month's maintenance window (5-8pm). These nodes will no longer be able to submit jobs to Vulcan compute nodes after the August maintenance window. Please use nexusvulcan00.umiacs.umd.edu and nexusvulcan01.umiacs.umd.edu for any general purpose Vulcan compute needs after this time.

Please see the Timeline section below for concrete dates in chronological order.

Please contact staff with any questions or concerns.

Usage

The Nexus cluster submission nodes that are allocated to Vulcan are nexusvulcan00.umiacs.umd.edu and nexusvulcan01.umiacs.umd.edu. You must use these nodes to submit jobs to Vulcan compute nodes after the August maintenance window. Submission from vulcansub00.umiacs.umd.edu or vulcansub01.umiacs.umd.edu will no longer work.

All partitions, QoSes, and account names from the standalone Vulcan cluster are being moved over to Nexus when the compute nodes move. However, please note that vulcan- will be prepended to all of the values that were present in the standalone Vulcan cluster to distinguish them from existing values in Nexus. The lone exception is the base account currently named vulcan in the standalone cluster, which will retain the same name.
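The renaming rule above can be sketched as a small shell function. This is an illustration only, not a staff-provided tool: the function name to_nexus_name and its kind labels (partition, qos, account) are hypothetical names chosen for this sketch.

```shell
#!/usr/bin/env bash
# to_nexus_name KIND VALUE
# Hypothetical helper: maps a standalone Vulcan value to its Nexus name.
# KIND is one of: partition, qos, account (labels invented for this sketch).
to_nexus_name() {
    local kind="$1" value="$2"
    if [ "$kind" = "account" ] && [ "$value" = "vulcan" ]; then
        # The base account "vulcan" is the lone exception and keeps its name.
        printf '%s\n' "$value"
    else
        # Everything else gets the vulcan- prefix.
        printf 'vulcan-%s\n' "$value"
    fi
}

to_nexus_name partition dpart    # prints vulcan-dpart
to_nexus_name qos medium         # prints vulcan-medium
to_nexus_name account abhinav    # prints vulcan-abhinav
to_nexus_name account vulcan     # prints vulcan
```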

Here are some before/after examples of job submission with various parameters:

Standalone Vulcan cluster submission command | Nexus cluster submission command
srun --partition=dpart --qos=medium --account=abhinav --gres=gpu:rtxa4000:2 --pty bash | srun --partition=vulcan-dpart --qos=vulcan-medium --account=vulcan-abhinav --gres=gpu:rtxa4000:2 --pty bash
srun --partition=cpu --qos=cpu --pty bash | srun --partition=vulcan-cpu --qos=vulcan-cpu --pty bash
srun --partition=scavenger --qos=scavenger --account=vulcan --gres=gpu:4 --pty bash | srun --partition=vulcan-scavenger --qos=vulcan-scavenger --account=vulcan --gres=gpu:4 --pty bash
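The examples on this page are interactive srun commands. For batch work, the same Nexus parameters can be expressed as sbatch directives. The following is a minimal sketch that assumes batch submission accepts the same partition, QoS, account, and GRES values as srun; the workload lines are placeholders, not part of the migration instructions.

```shell
#!/bin/bash
#SBATCH --partition=vulcan-dpart
#SBATCH --qos=vulcan-medium
#SBATCH --account=vulcan-abhinav
#SBATCH --gres=gpu:rtxa4000:2

# Placeholder workload (hypothetical): report which node the job landed on.
node="$(uname -n)"
echo "Job running on ${node}"
```

Saved under a name of your choosing (for example job.sh, a hypothetical filename), a script like this would be submitted with sbatch job.sh from nexusvulcan00.umiacs.umd.edu or nexusvulcan01.umiacs.umd.edu.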

Vulcan users (exclusively) can schedule non-interruptible jobs on the moved nodes with these job parameters. Please note that the vulcan-dpart partition will have a GrpTRES limit of 100% of the available cores/RAM on vulcan## nodes plus 50% of the available cores/RAM on legacy## nodes, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
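As a rough illustration of how the GrpTRES limit described above is composed, the sketch below adds 100% of one resource total to 50% of another. The function name dpart_limit and the core counts passed to it are hypothetical and do not reflect the actual hardware.

```shell
#!/usr/bin/env bash
# dpart_limit VULCAN_TOTAL LEGACY_TOTAL
# Hypothetical helper: 100% of the vulcan## total plus 50% of the legacy## total.
# Uses integer arithmetic, so an odd legacy total is rounded down.
dpart_limit() {
    local vulcan_total="$1" legacy_total="$2"
    echo $(( vulcan_total + legacy_total / 2 ))
}

# Made-up numbers: 1000 cores on vulcan## nodes, 200 cores on legacy## nodes.
dpart_limit 1000 200    # prints 1100
```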

Please note that the Vulcan compute nodes will also be added to the institute-wide scavenger partition in Nexus. Vulcan users will still have scavenging priority over these nodes via the vulcan-scavenger partition: jobs in any vulcan- queue other than vulcan-scavenger can preempt both vulcan-scavenger and scavenger queue jobs, and vulcan-scavenger queue jobs can preempt scavenger queue jobs.
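The preemption ordering described above can be modeled as three tiers. The sketch below is only a way to reason about the ordering; it is not how Slurm is configured, the function names are hypothetical, and it covers only the queues named on this page. It also assumes that jobs in the same tier do not preempt each other.

```shell
#!/usr/bin/env bash
# preempt_tier QUEUE
# Hypothetical helper: a higher tier number can preempt a lower one.
preempt_tier() {
    case "$1" in
        scavenger)        echo 0 ;;   # institute-wide scavenger queue
        vulcan-scavenger) echo 1 ;;   # Vulcan scavenger queue
        vulcan-*)         echo 2 ;;   # all other vulcan- queues
        *)                echo "unknown queue: $1" >&2; return 1 ;;
    esac
}

# can_preempt A B: succeeds if a job in queue A can preempt a job in queue B.
can_preempt() {
    local a b
    a="$(preempt_tier "$1")" || return 2
    b="$(preempt_tier "$2")" || return 2
    [ "$a" -gt "$b" ]
}

can_preempt vulcan-dpart scavenger && echo "vulcan-dpart can preempt scavenger"
```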

Timeline

Each event will be completed within the timeframe specified.

Date | Event
August 17th 2023, 5-8pm | All standalone Vulcan cluster compute nodes are moved into Nexus in corresponding vulcan- named partitions
September 21st 2023, 5-8pm | vulcansub00.umiacs.umd.edu and vulcansub01.umiacs.umd.edu are taken offline

Post-Migration

The information below will become relevant AFTER 8pm on Thursday, August 17th, 2023.

Partitions

stub

Accounts

stub

QoS

stub

Data Storage

All data storage that was available on the standalone Vulcan cluster will continue to be available in Nexus.

However, please note that the Nexus cluster uses NFShomes home directories. If your UMIACS account was created on or before February 21st, 2023, you have been using /fs/cfarhomes/<username> as your home directory on the standalone Vulcan cluster. While /fs/cfarhomes is available on Nexus, shell scripts stored there will not load automatically. Please copy anything you need over to your /fs/nfshomes/<username> directory at your earliest convenience.
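One cautious way to carry shell startup files over is sketched below. The function name copy_shell_files and the list of files are assumptions; adjust the list to whatever you actually keep in your old home directory. The function never overwrites a file that already exists in the destination.

```shell
#!/usr/bin/env bash
# copy_shell_files SRC_HOME DST_HOME
# Hypothetical helper: copies common shell startup files from SRC_HOME to
# DST_HOME, skipping any file that is missing in SRC_HOME or already
# present in DST_HOME.
copy_shell_files() {
    local src="$1" dst="$2" f
    # Assumption: these are the files worth carrying over; edit as needed.
    for f in .bashrc .bash_profile .profile; do
        if [ -f "$src/$f" ] && [ ! -e "$dst/$f" ]; then
            cp -p "$src/$f" "$dst/$f"
            echo "copied $f"
        else
            echo "skipped $f"
        fi
    done
}

# Example invocation (left commented out so nothing is copied by accident;
# assumes $USER matches your UMIACS username):
# copy_shell_files "/fs/cfarhomes/$USER" "/fs/nfshomes/$USER"
```

Review any copied files afterwards, since paths or settings in them may still refer to the old home directory.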