Nexus/CML
Revision as of 18:28, 3 July 2023
The CML standalone cluster's compute nodes will fold into Nexus on Thursday, August 17th, 2023 during the scheduled maintenance window for August.
The Nexus cluster already has a large pool of compute resources made possible through leftover funding for the Brendan Iribe Center. Details on common nodes already in the cluster (Tron partition) can be found here.
In addition, the CML cluster's standalone submission node cmlsub00.umiacs.umd.edu will be retired on Thursday, September 21st, 2023 during that month's maintenance window, as it will no longer be able to submit jobs to CML compute nodes after the August maintenance window. Please use nexuscml00.umiacs.umd.edu and nexuscml01.umiacs.umd.edu for any general purpose CML compute needs after this time.
Please see the Timeline section below for concrete dates in chronological order.
Please contact staff with any questions or concerns.
Usage
The Nexus cluster submission nodes that are allocated to CML are nexuscml00.umiacs.umd.edu and nexuscml01.umiacs.umd.edu. You must use these nodes to submit jobs to CML compute nodes after the August maintenance window. Submission from cmlsub00.umiacs.umd.edu will no longer work.
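As a small illustration of the rule above, the following sketch checks whether the current host is one of the two Nexus submission nodes allocated to CML before you try to submit. The function name `is_cml_submission_node` is hypothetical and not part of any UMIACS tooling; it only compares against the two hostnames listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for the two Nexus submission nodes
# that this page lists as allocated to CML.
is_cml_submission_node() {
  case "$1" in
    nexuscml00.umiacs.umd.edu|nexuscml01.umiacs.umd.edu) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn if the current host (by fully qualified name) is not one of them.
if ! is_cml_submission_node "$(hostname -f 2>/dev/null)"; then
  echo "Not on a Nexus CML submission node; log in to nexuscml00 or nexuscml01 first." >&2
fi
```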
All partitions, QoSes, and account names from the standalone CML cluster are being moved over to Nexus when the compute nodes move. However, please note that cml- will be prepended to all of the values that were present in the standalone CML cluster to distinguish them from existing values in Nexus.
Here are some before/after examples of job submission with various parameters:
Standalone CML cluster submission command | Nexus cluster submission command |
---|---|
srun --partition=dpart --qos=medium --account=tomg --gres=gpu:rtxa4000:2 --pty bash | srun --partition=cml-dpart --qos=cml-medium --account=cml-tomg --gres=gpu:rtxa4000:2 --pty bash |
srun --partition=cpu --qos=cpu --pty bash | srun --partition=cml-cpu --qos=cml-cpu --pty bash |
srun --partition=scavenger --qos=scavenger --account=scavenger --gres=gpu:4 --pty bash | srun --partition=cml-scavenger --qos=cml-scavenger --account=cml-scavenger --gres=gpu:4 --pty bash |
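The renaming shown in the table is mechanical, so existing command lines or job scripts can be converted with a short filter. The sketch below is a minimal example, assuming the long `--partition=`, `--qos=`, and `--account=` option forms used in the table. The function name `to_nexus_cmd` is hypothetical, and the short option forms (`-p`, `-q`, `-A`) are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a UMIACS-provided tool): prepend "cml-" to the
# values of --partition=, --qos=, and --account= in a command line.
# An existing "cml-" prefix is consumed first so values are not prefixed twice.
to_nexus_cmd() {
  printf '%s\n' "$1" |
    sed -E 's/--(partition|qos|account)=(cml-)?([^[:space:]]+)/--\1=cml-\3/g'
}

to_nexus_cmd "srun --partition=dpart --qos=medium --account=tomg --gres=gpu:rtxa4000:2 --pty bash"
# prints: srun --partition=cml-dpart --qos=cml-medium --account=cml-tomg --gres=gpu:rtxa4000:2 --pty bash
```

Other options, such as `--gres`, are left untouched, matching the examples in the table.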
CML users (exclusively) can schedule non-interruptible jobs on the moved nodes with these job parameters. Please note that the cml-dpart partition will have a GrpTRES limit of 100% of the available cores/RAM on each set of cml## nodes plus 50% of the available cores/RAM on legacy## nodes, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
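To make the GrpTRES arithmetic concrete, here is a minimal sketch for the core count only. The core totals are hypothetical placeholders, since the actual node inventory is not listed on this page, and treating the limit as a single sum over all cml## and legacy## nodes is an assumption about how the limit is applied.

```shell
#!/usr/bin/env bash
# Hypothetical totals; substitute the real core counts for the node sets.
cml_cores=512      # total cores across cml## nodes (assumed value)
legacy_cores=256   # total cores across legacy## nodes (assumed value)

# Limit as described above: 100% of cml## cores plus 50% of legacy## cores.
# Integer division rounds down if the legacy total is odd.
grptres_cpu_limit() {
  echo $(( $1 + $2 / 2 ))
}

grptres_cpu_limit "$cml_cores" "$legacy_cores"
# prints: 640
```

The same calculation would apply to RAM under the same assumption.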
Please note that the CML compute nodes will also be added to the institute-wide scavenger partition in Nexus. CML users will still have scavenging priority over these nodes via the cml-scavenger partition. That is, all cml- queue jobs (other than cml-scavenger) can preempt both cml-scavenger and scavenger queue jobs, and cml-scavenger queue jobs can preempt scavenger queue jobs.
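The preemption order described above can be modeled as three ranks. The sketch below only encodes the relationships stated in this section and is not how Slurm itself is configured; the function names `preempt_rank` and `can_preempt` are hypothetical, and partitions this section does not mention are treated as unknown.

```shell
#!/usr/bin/env bash
# Rank the partitions named in this section by preemption order.
# Partitions not covered here get no rank (non-zero return).
preempt_rank() {
  case "$1" in
    scavenger)     echo 0 ;;   # institute-wide scavenger: lowest
    cml-scavenger) echo 1 ;;   # can preempt scavenger
    cml-*)         echo 2 ;;   # other cml- partitions: can preempt both
    *)             return 1 ;;
  esac
}

# can_preempt A B: status 0 if jobs in A can preempt jobs in B,
# 1 if they cannot, 2 if either partition is unknown to this model.
can_preempt() {
  local a b
  a=$(preempt_rank "$1") || return 2
  b=$(preempt_rank "$2") || return 2
  [ "$a" -gt "$b" ]
}

can_preempt cml-dpart scavenger && echo "cml-dpart can preempt scavenger"
```

Two partitions of the same rank, such as cml-dpart and cml-cpu, are reported as unable to preempt each other, since the text describes no such relationship.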
Timeline
Unless otherwise indicated, events may begin as early as 9am US Eastern time on the dates listed. Each event will be completed within the business week (i.e., by Friday at 5pm) or within the timeframe specified.
Date | Event |
---|---|
August 17th 2023, 5-8pm | All standalone CML cluster compute nodes are moved into Nexus in corresponding cml- named partitions |
September 21st 2023, 5-8pm | cmlsub00.umiacs.umd.edu is taken offline |