Nexus/Vulcan: Difference between revisions
Revision as of 13:15, 21 July 2023
The Vulcan standalone cluster's compute nodes will fold into Nexus on Thursday, August 17th, 2023 during the scheduled maintenance window for August (5-8pm).
The Nexus cluster already has a large pool of compute resources made possible through leftover funding for the Brendan Iribe Center. Details on common nodes already in the cluster (Tron partition) can be found here.
In addition, the Vulcan cluster's standalone submission nodes vulcansub00.umiacs.umd.edu and vulcansub01.umiacs.umd.edu will be retired on Thursday, September 21st, 2023 during that month's maintenance window (5-8pm), as they will no longer be able to submit jobs to Vulcan compute nodes after the August maintenance window. Please use nexusvulcan00.umiacs.umd.edu and nexusvulcan01.umiacs.umd.edu for any general purpose Vulcan compute needs after this time.
Please see the Timeline section below for concrete dates in chronological order.
Please contact staff with any questions or concerns.
Usage
The Nexus cluster submission nodes that are allocated to Vulcan are nexusvulcan00.umiacs.umd.edu and nexusvulcan01.umiacs.umd.edu. You must use these nodes to submit jobs to Vulcan compute nodes after the August maintenance window. Submission from vulcansub00.umiacs.umd.edu or vulcansub01.umiacs.umd.edu will no longer work.
All partitions, QoSes, and account names from the standalone Vulcan cluster are being moved over to Nexus when the compute nodes move. However, please note that vulcan- will be prepended to all of the values that were present in the standalone Vulcan cluster to distinguish them from existing values in Nexus. The lone exception is the base account, currently named vulcan in the standalone cluster, which will retain the same name.
Here are some before/after examples of job submission with various parameters:
Standalone Vulcan cluster submission command | Nexus cluster submission command |
---|---|
srun --partition=dpart --qos=medium --account=abhinav --gres=gpu:rtxa4000:2 --pty bash | srun --partition=vulcan-dpart --qos=vulcan-medium --account=vulcan-abhinav --gres=gpu:rtxa4000:2 --pty bash |
srun --partition=cpu --qos=cpu --pty bash | srun --partition=vulcan-cpu --qos=vulcan-cpu --pty bash |
srun --partition=scavenger --qos=scavenger --account=vulcan --gres=gpu:4 --pty bash | srun --partition=vulcan-scavenger --qos=vulcan-scavenger --account=vulcan --gres=gpu:4 --pty bash |
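The renaming rule behind these examples can be sketched as a small bash function. This is only an illustration: to_nexus_args is a hypothetical helper, not part of SLURM or of any UMIACS tooling, and it assumes the flags are written in the --flag=value form used in the table (it does not handle short options or space-separated values). It prints the translated command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the --partition/--qos/--account values of a
# standalone Vulcan command line into their Nexus names by prepending "vulcan-".
to_nexus_args() {
    local arg out=()
    for arg in "$@"; do
        case "$arg" in
            --account=vulcan)
                # The base account keeps its name.
                out+=("$arg") ;;
            --partition=vulcan-*|--qos=vulcan-*|--account=vulcan-*)
                # Already uses a Nexus name; leave it untouched.
                out+=("$arg") ;;
            --partition=*|--qos=*|--account=*)
                # Keep the flag name, prefix the value with "vulcan-".
                out+=("${arg%%=*}=vulcan-${arg#*=}") ;;
            *)
                # Every other argument is passed through unchanged.
                out+=("$arg") ;;
        esac
    done
    printf '%s\n' "${out[*]}"
}

# Print (do not run) the Nexus form of the first example in the table.
to_nexus_args srun --partition=dpart --qos=medium --account=abhinav --gres=gpu:rtxa4000:2 --pty bash
```

Because the function only prints, you can review the result before pasting it into a shell on a Nexus submission node.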
Vulcan users (exclusively) can schedule non-interruptible jobs on the moved nodes with these job parameters. Please note that the vulcan-dpart partition will have a GrpTRES limit of 100% of the available cores/RAM on vulcan## nodes plus 50% of the available cores/RAM on legacy## nodes, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
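As a worked example of how that GrpTRES limit is composed, the sketch below adds 100% of one total to 50% of another. The function name grp_limit and the core counts are placeholders chosen for illustration; they are not the real totals for the vulcan## or legacy## nodes.

```shell
#!/usr/bin/env bash
# grp_limit VULCAN_TOTAL LEGACY_TOTAL
# Prints 100% of the vulcan## total plus 50% of the legacy## total.
# Uses integer arithmetic, so an odd legacy total rounds down.
grp_limit() {
    local vulcan_total=$1 legacy_total=$2
    echo $(( vulcan_total + legacy_total / 2 ))
}

# Placeholder core counts, NOT the real cluster totals.
grp_limit 1000 400   # prints 1200
```

The same arithmetic applies to RAM if you pass totals in a consistent unit.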
Please note that the Vulcan compute nodes will also be added to the institute-wide scavenger partition in Nexus. Vulcan users will still have scavenging priority over these nodes via the vulcan-scavenger partition: jobs in every vulcan- partition other than vulcan-scavenger can preempt jobs in both the vulcan-scavenger and scavenger partitions, and vulcan-scavenger jobs can preempt scavenger jobs.
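The preemption order described here can be written down as a three-level ranking. The functions below are hypothetical helpers for reasoning about that order, not a query against the scheduler; partitions that this page does not mention are deliberately reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# preempt_tier PARTITION: prints a rank; a higher rank preempts a lower one.
preempt_tier() {
    case "$1" in
        scavenger)        echo 0 ;;    # institute-wide scavenger partition
        vulcan-scavenger) echo 1 ;;
        vulcan-*)         echo 2 ;;    # vulcan-dpart, vulcan-cpu, ...
        *)                return 1 ;;  # not covered by this page
    esac
}

# can_preempt A B: succeeds if a job in partition A can preempt a job in B.
# Returns 2 if either partition is not covered by the ranking.
can_preempt() {
    local a b
    a=$(preempt_tier "$1") || return 2
    b=$(preempt_tier "$2") || return 2
    [ "$a" -gt "$b" ]
}

for pair in "vulcan-dpart vulcan-scavenger" "vulcan-scavenger scavenger" "scavenger vulcan-cpu"; do
    set -- $pair
    if can_preempt "$1" "$2"; then
        echo "$1 can preempt $2"
    else
        echo "$1 cannot preempt $2"
    fi
done
```

Two partitions in the same tier (for example vulcan-dpart and vulcan-cpu) are reported as unable to preempt each other, since the text above only describes preemption of the two scavenger partitions.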
Timeline
Each event will be completed within the timeframe specified.
Date | Event |
---|---|
August 17th 2023, 5-8pm | All standalone Vulcan cluster compute nodes are moved into Nexus in corresponding vulcan- named partitions |
September 21st 2023, 5-8pm | vulcansub00.umiacs.umd.edu and vulcansub01.umiacs.umd.edu are taken offline |
Post-Migration
The below information will become relevant AFTER 8pm on Thursday, August 17th, 2023.
Partitions
There are three partitions available to general Vulcan SLURM users. You must specify a partition when submitting your job.
- vulcan-dpart - This is the default partition. Job allocations are guaranteed.
- vulcan-scavenger - This is the alternate partition that allows jobs longer run times and more resources but is preemptable when jobs in other vulcan- partitions are ready to be scheduled.
- vulcan-cpu - This partition is for CPU focused jobs. Job allocations are guaranteed.
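For batch work, the partition can be set in the job script itself. The following is a minimal sketch of an sbatch script for the vulcan-dpart partition; the QoS and account values are ones named elsewhere on this page, while the one-hour time limit, the single generic GPU request, and the report_job function are placeholders for illustration.

```shell
#!/bin/bash
#SBATCH --partition=vulcan-dpart
#SBATCH --qos=vulcan-default
#SBATCH --account=vulcan
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# Placeholder workload: report where the job landed. SLURM sets these
# variables inside a job; outside a job the fallbacks are printed instead.
report_job() {
    echo "job ${SLURM_JOB_ID:-unknown} in partition ${SLURM_JOB_PARTITION:-unknown}"
}
report_job
```

Saved under a name of your choosing (for example job.sh), the script would be submitted with sbatch job.sh from one of the Nexus submission nodes allocated to Vulcan.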
Accounts
Vulcan has a base SLURM account vulcan which has a modest number of guaranteed billing resources available to all cluster users at any given time. Other faculty that have invested in the cluster have an additional account provided to their sponsored accounts on the cluster, which provides a number of guaranteed billing resources corresponding to the amount that they invested. If you do not specify an account when submitting your job, you will receive the vulcan account.
    $ sacctmgr show account format=Account%20,Description%30,Organization%10
                 Account                          Descr        Org
    -------------------- ------------------------------ ----------
                     ...                            ...        ...
          vulcan-abhinav   vulcan - abhinav shrivastava     vulcan
          vulcan-djacobs          vulcan - david jacobs     vulcan
            vulcan-janus                 vulcan - janus     vulcan
          vulcan-jbhuang         vulcan - jia-bin huang     vulcan
              vulcan-lsd           vulcan - larry davis     vulcan
          vulcan-metzler         vulcan - chris metzler     vulcan
             vulcan-rama        vulcan - rama chellappa     vulcan
           vulcan-ramani     vulcan - ramani duraiswami     vulcan
            vulcan-yaser          vulcan - yaser yacoob     vulcan
          vulcan-zwicker      vulcan - matthias zwicker     vulcan
                     ...                            ...        ...
You can check your account associations by running the show_assoc command. Please contact staff and include your faculty member in the conversation if you do not see the appropriate association.
    $ show_assoc
          User        Account   Def Acct   Def QOS                                      QOS
    ---------- -------------- ---------- --------- ----------------------------------------
           ...            ...        ...       ...                                      ...
       abhinav vulcan-abhinav                      vulcan-default,vulcan-high,vulcan-medium
       abhinav         vulcan                       vulcan-cpu,vulcan-default,vulcan-medium
           ...            ...        ...       ...                                      ...
QoS
You will need to decide which QOS to submit with, which will set a certain number of restrictions on your job. Either the vulcan-default, vulcan-medium, or vulcan-high QOS is required for the vulcan-dpart partition. This will be passed to all your submission commands as --qos.
The show_qos command lists the current QOS and their limits, as in the following example.
    $ show_qos
                    Name     MaxWall                        MaxTRES MaxJobsPU MaxSubmitPU                      MaxTRESPU              GrpTRES
    -------------------- ----------- ------------------------------ --------- ----------- ------------------------------ --------------------
                     ...         ...                            ...       ...         ...                            ...                  ...
           vulcan-medium  3-00:00:00       cpu=8,gres/gpu=2,mem=64G         2
             vulcan-high  1-12:00:00     cpu=16,gres/gpu=4,mem=128G         2
          vulcan-default  7-00:00:00       cpu=4,gres/gpu=1,mem=32G         2
        vulcan-scavenger  3-00:00:00     cpu=32,gres/gpu=8,mem=256G
            vulcan-janus  3-00:00:00    cpu=32,gres/gpu=10,mem=256G
           vulcan-exempt  7-00:00:00     cpu=32,gres/gpu=8,mem=256G         2
              vulcan-cpu  2-00:00:00                cpu=1024,mem=4T         4
        vulcan-exclusive 30-00:00:00
           vulcan-sailon  3-00:00:00     cpu=32,gres/gpu=8,mem=256G                                                           gres/gpu=48
                     ...         ...                            ...       ...         ...                            ...                  ...
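To make the choice among the three vulcan-dpart QoSes concrete, the sketch below encodes their per-job MaxTRES limits as listed in the show_qos output on this page (vulcan-default: 4 CPUs, 1 GPU, 32G; vulcan-medium: 8 CPUs, 2 GPUs, 64G; vulcan-high: 16 CPUs, 4 GPUs, 128G) and picks the smallest one that fits a request. The function names are hypothetical, and the hard-coded numbers are a snapshot that may drift from the live configuration, so treat show_qos as the authority.

```shell
#!/usr/bin/env bash
# qos_limits QOS: prints "cpus gpus mem_gb" for one vulcan-dpart QoS,
# copied from the MaxTRES column of the show_qos listing on this page.
qos_limits() {
    case "$1" in
        vulcan-default) echo "4 1 32" ;;
        vulcan-medium)  echo "8 2 64" ;;
        vulcan-high)    echo "16 4 128" ;;
        *)              return 1 ;;
    esac
}

# pick_qos CPUS GPUS MEM_GB: prints the smallest listed QoS whose per-job
# limits cover the request, or fails if none of the three does.
pick_qos() {
    local cpus=$1 gpus=$2 mem=$3 qos max_cpus max_gpus max_mem
    for qos in vulcan-default vulcan-medium vulcan-high; do
        read -r max_cpus max_gpus max_mem <<< "$(qos_limits "$qos")"
        if [ "$cpus" -le "$max_cpus" ] && [ "$gpus" -le "$max_gpus" ] && [ "$mem" -le "$max_mem" ]; then
            echo "$qos"
            return 0
        fi
    done
    echo "no vulcan-dpart QoS fits this request" >&2
    return 1
}

pick_qos 4 1 32     # prints vulcan-default
pick_qos 8 2 48     # prints vulcan-medium
pick_qos 16 4 128   # prints vulcan-high
```

Preferring the smallest fitting QoS also gives the longest wall time in that listing: vulcan-default allows 7 days, vulcan-medium 3 days, and vulcan-high 1 day 12 hours.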
Data Storage
All data storage that was available on the standalone Vulcan cluster will continue to be available in Nexus.
However, please note that the Nexus cluster uses NFShomes home directories. If your UMIACS account was created on or before February 21st, 2023, you have been using /fs/cfarhomes/<username> as your home directory on the standalone Vulcan cluster. While /fs/cfarhomes is available on Nexus, your shell scripts from it will not automatically load. Please copy over anything you need to your /fs/nfshomes/<username> directory at your earliest convenience.
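One cautious way to do that copy is sketched below. The function copy_startup_files is a hypothetical helper, and the list of file names in the example call is only an assumption about what a typical bash user might want to carry over; it never overwrites a file that already exists in the destination.

```shell
#!/usr/bin/env bash
# copy_startup_files SRC_DIR DST_DIR FILE...
# Copies each named file from SRC_DIR to DST_DIR, skipping files that are
# missing in SRC_DIR or already present in DST_DIR (nothing is overwritten).
copy_startup_files() {
    local src=$1 dst=$2 f
    shift 2
    for f in "$@"; do
        if [ ! -e "$src/$f" ]; then
            echo "skip $f (not in $src)"
        elif [ -e "$dst/$f" ]; then
            echo "skip $f (already in $dst)"
        else
            # -p preserves mode and timestamps of the original file.
            cp -p "$src/$f" "$dst/$f" && echo "copied $f"
        fi
    done
}

# Example call; the file list is illustrative, adjust it to what you use.
copy_startup_files "/fs/cfarhomes/$USER" "/fs/nfshomes/$USER" .bashrc .bash_profile .bash_aliases
```

Files reported as already present are left alone, so you can compare the two versions by hand and merge whatever you still need.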