Nexus/MC2
The Nexus scheduler houses MC2's new computational partition. Only MC2 lab members are able to run non-interruptible jobs on these nodes.
Submission Nodes
You can SSH to nexusmc2.umiacs.umd.edu to log in to a submission host.
If you store something in a local directory (/tmp, /scratch0) on one of the two submission hosts, you will need to connect to that same submission host to access it later. The actual submission hosts are:
* nexusmc200.umiacs.umd.edu
* nexusmc201.umiacs.umd.edu
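For example, from your local machine (replace username with your own username):

ssh username@nexusmc2.umiacs.umd.edu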
Resources
The MC2 partition has nodes brought over from the previous standalone MC2 Slurm scheduler as well as a few newer purchases. The compute nodes are named twist##.
QoS
MC2 users have access to all of the standard job QoSes in the mc2 partition using the mc2 account.
The additional job QoSes for the MC2 partition specifically are:
* highmem: Allows for significantly increased memory to be allocated.
* huge-long: Allows for longer jobs using higher overall resources.
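For example, a memory-heavy interactive job might request the highmem QoS like this (a sketch; the 256G value is illustrative, not a documented limit):

[username@nexusmc200 ~]$ srun --pty --ntasks=4 --mem=256G --qos=highmem --partition=mc2 --account=mc2 --time 1-00:00:00 bash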
Please note that the partition has a GrpTRES limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
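If your job stays pending because of these aggregate limits, the standard Slurm squeue reason column shows why it has not started yet; for example:

[username@nexusmc200 ~]$ squeue -u username --format="%.10i %.10P %.8T %r"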
Jobs
You will need to specify --partition=mc2 and --account=mc2 to be able to submit jobs to the MC2 partition.
[username@nexusmc200 ~]$ srun --pty --ntasks=4 --mem=8G --qos=default --partition=mc2 --account=mc2 --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@twist00 ~]$ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=mc2 QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=mc2 AllocNode:Sid=nexusmc200:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=twist00
   BatchHost=twist00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
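Batch jobs follow the same pattern; here is a minimal sbatch script sketch (the job name, resource values, and script name are illustrative):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=mc2
#SBATCH --account=mc2
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

hostname

Submit it from a submission node with: sbatch example.sh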
Storage
All data filesystems that were available in the standalone MC2 cluster are also available in Nexus.
MC2 users can also request Nexus project allocations.