Nexus/MC2
Latest revision as of 17:44, 22 November 2024
The previous standalone cluster for [https://cyber.umd.edu/about MC2]'s compute nodes was folded into [[Nexus]] in mid-2022.
The Nexus cluster already has a large pool of compute resources made possible through college-level funding for UMIACS and CSD faculty. Details on common nodes already in the cluster (Tron partition) can be found [[Nexus/Tron | here]].
Please [[HelpDesk | contact staff]] with any questions or concerns.
= Submission Nodes =
You can [[SSH]] to <code>nexusmc2.umiacs.umd.edu</code> to log in to a submission node.
If you store something in a local filesystem directory (/tmp, /scratch0) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:
* <code>nexusmc200.umiacs.umd.edu</code>
* <code>nexusmc201.umiacs.umd.edu</code>
= Resources =
The MC2 partition has nodes brought over from the previous standalone MC2 Slurm scheduler. The compute nodes are named <code>twist##</code>.
= QoS =
MC2 users have access to all of the [[Nexus#Quality_of_Service_.28QoS.29 | standard job QoSes]] in the MC2 partition using the <code>mc2</code> account.
The additional job QoSes specific to the MC2 partition are:
* <code>highmem</code>: Allows for significantly increased memory to be allocated.
* <code>huge-long</code>: Allows for longer jobs using higher overall resources.
Please note that the partition has a <code>GrpTRES</code> limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
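As a sketch, requesting one of these partition-specific QoSes uses the standard Slurm <code>--qos</code> flag. The memory and time figures below are illustrative only; the actual per-QoS ceilings are not documented on this page.

<pre>
# Hypothetical example: an interactive job under the MC2-specific highmem QoS.
# 64G is illustrative, not the documented QoS limit.
srun --pty --partition=mc2 --account=mc2 --qos=highmem \
     --ntasks=4 --mem=64G --time=1-00:00:00 bash

# Similarly, huge-long for a longer job (the 7-day limit is illustrative):
srun --pty --partition=mc2 --account=mc2 --qos=huge-long \
     --ntasks=16 --mem=32G --time=7-00:00:00 bash
</pre>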
= Jobs =
You will need to specify <code>--partition=mc2</code> and <code>--account=mc2</code> to be able to submit jobs to the MC2 partition.
<pre>
[username@nexusmc200 ~]$ srun --pty --ntasks=4 --mem=8G --qos=default --partition=mc2 --account=mc2 --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@twist00 ~]$ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=mc2 QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=mc2 AllocNode:Sid=nexusmc200:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=twist00
   BatchHost=twist00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
</pre>
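The same resource request can also be submitted non-interactively. The following is a minimal batch-script sketch assuming standard Slurm <code>#SBATCH</code> directives; the script name, job name, and workload are illustrative, not from this page.

<pre>
#!/bin/bash
# mc2-job.sbatch -- hypothetical batch equivalent of the interactive example above.
#SBATCH --job-name=mc2-example
#SBATCH --partition=mc2
#SBATCH --account=mc2
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

# Your workload goes here; hostname just reports which twist## node ran the job.
hostname
</pre>

Submit it with <code>sbatch mc2-job.sbatch</code> from one of the submission nodes.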
= Storage =
All data filesystems that were available in the standalone MC2 cluster are also available in Nexus.
MC2 users can also request Nexus project allocations.