Nexus/MC2

The [[Nexus]] scheduler houses [https://cyber.umd.edu/about MC2]'s computational partition. The previous standalone cluster for MC2's compute nodes was folded into [[Nexus]] in mid 2022.
 
The Nexus cluster already has a large pool of compute resources made possible through college-level funding for UMIACS and CSD faculty. Details on common nodes already in the cluster (Tron partition) can be found [[Nexus/Tron | here]].
 
Please [[HelpDesk | contact staff]] with any questions or concerns.


= Submission Nodes =
There are two submission nodes for Nexus exclusively available to MC2 users. You can [[SSH]] to <code>nexusmc2.umiacs.umd.edu</code> to log in to a submission node.
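
For example, logging in from a terminal might look like the following (<code>username</code> is a placeholder for your own username):

<pre>
$ ssh username@nexusmc2.umiacs.umd.edu
</pre>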


If you store something in a local filesystem directory (/tmp, /scratch0) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:
* <code>nexusmc200.umiacs.umd.edu</code>
* <code>nexusmc201.umiacs.umd.edu</code>
= Resources =
The MC2 partition has nodes brought over from the previous standalone MC2 Slurm scheduler. The compute nodes are named <code>twist##</code>.
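
To see which nodes and resources are currently in the partition, you can query Slurm directly; the format string below is just one illustrative choice:

<pre>
$ sinfo --partition=mc2 --Node --format="%N %c %m %G"
</pre>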


= QoS =
MC2 users have access to all of the [[Nexus#Quality_of_Service_.28QoS.29 | standard job QoSes]] in the MC2 partition using the <code>mc2</code> account.
 
The additional job QoSes for the MC2 partition specifically are:
* <code>highmem</code>: Allows for significantly increased memory to be allocated.
* <code>huge-long</code>: Allows for longer-running jobs that use more overall resources.


Please note that the partition has a <code>GrpTRES</code> limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.

<pre>
$ show_qos
        Name    MaxWall MaxJobs                        MaxTRES                      MaxTRESPU              GrpTRES
------------ ----------- ------- ------------------------------ ------------------------------ --------------------
      normal
  scavenger  2-00:00:00            cpu=64,gres/gpu=8,mem=256G  cpu=192,gres/gpu=24,mem=768G
      medium  2-00:00:00              cpu=8,gres/gpu=2,mem=64G
        high  1-00:00:00            cpu=16,gres/gpu=4,mem=128G
    default  3-00:00:00              cpu=4,gres/gpu=1,mem=32G
        tron                                                        cpu=32,gres/gpu=4,mem=256G
  huge-long 10-00:00:00            cpu=32,gres/gpu=8,mem=256G
        clip                                                                                      cpu=339,mem=2926G
      class                                                        cpu=32,gres/gpu=4,mem=256G
      gamma                                                                                      cpu=179,mem=1511G
        mc2                                                                                      cpu=307,mem=1896G
        cbcb                                                                                    cpu=913,mem=46931G
    highmem 21-00:00:00                      cpu=32,mem=2000G
</pre>
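
As an illustration only (the resource values here are placeholders chosen to fit within the limits shown above), an interactive job under the <code>highmem</code> QoS could be requested like so:

<pre>
$ srun --pty --partition=mc2 --account=mc2 --qos=highmem --ntasks=8 --mem=512G --time=3-00:00:00 bash
</pre>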


= Jobs =
You will need to specify <code>--partition=mc2</code> and <code>--account=mc2</code> to be able to submit jobs to the MC2 partition.


<pre>
[username@nexusmc200 ~]$ srun --pty --ntasks=4 --mem=8G --qos=default --partition=mc2 --account=mc2 --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@twist00 ~]$ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=mc2 QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=mc2 AllocNode:Sid=nexusmc200:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=twist00
   BatchHost=twist00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
</pre>

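The same options apply when submitting batch jobs with <code>sbatch</code>. A minimal batch script sketch (the resource values and workload below are placeholders):

<pre>
#!/bin/bash
#SBATCH --partition=mc2
#SBATCH --account=mc2
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

# Placeholder workload; replace with your actual commands
hostname
</pre>

You would then submit it with <code>sbatch</code> from one of the submission nodes, e.g. <code>sbatch myjob.sh</code> if the script were saved as <code>myjob.sh</code>.
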
= Storage =
All data filesystems that were available in the standalone MC2 cluster are also available in [[Nexus]].

MC2 users can also request Nexus project allocations.