Nexus/MC2
The Nexus scheduler houses MC2's new computational partition. Only MC2 lab members are able to run non-interruptible jobs on these nodes.
Submission Nodes
You can SSH to nexusmc2.umiacs.umd.edu to log in to a submission host.
If you store something in a local directory (/tmp, /scratch0) on one of the two submission hosts, you will need to connect to that same submission host to access it later. The actual submission hosts are:
* nexusmc200.umiacs.umd.edu
* nexusmc201.umiacs.umd.edu
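For example, from your local machine (replace username with your own username):

ssh username@nexusmc2.umiacs.umd.edu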
Resources
The MC2 partition has nodes brought over from the previous standalone MC2 Slurm scheduler as well as a few newer purchases. The compute nodes are named twist##.
QoS
MC2 users have access to all of the standard job QoSes in the mc2 partition using the mc2 account.
The additional job QoSes for the MC2 partition specifically are:
* highmem: Allows for significantly increased memory to be allocated.
* huge-long: Allows for longer jobs using higher overall resources.
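For example, a memory-heavy interactive job might request the highmem QoS like this (a sketch; the 256G value is illustrative, not a documented limit):

[username@nexusmc200 ~]$ srun --pty --ntasks=4 --mem=256G --qos=highmem --partition=mc2 --account=mc2 --time 1-00:00:00 bash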
Please note that the partition has a GrpTRES limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
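If your job stays pending because of these aggregate limits, the standard Slurm squeue reason column shows why it has not started yet; for example:

[username@nexusmc200 ~]$ squeue -u username --format="%.10i %.10P %.8T %r"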
Jobs
You will need to specify --partition=mc2 and --account=mc2 to be able to submit jobs to the MC2 partition.
[username@nexusmc200 ~]$ srun --pty --ntasks=4 --mem=8G --qos=default --partition=mc2 --account=mc2 --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@twist00 ~]$ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=mc2 QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=mc2 AllocNode:Sid=nexusmc200:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=twist00
   BatchHost=twist00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
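Batch jobs follow the same pattern; here is a minimal sbatch script sketch (the job name, resource values, and script name are illustrative):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=mc2
#SBATCH --account=mc2
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

hostname

Submit it from a submission node with: sbatch example.sh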
Storage
All data filesystems that were available in the standalone MC2 cluster are also available in Nexus.
MC2 users can also request Nexus project allocations.