Nexus/CLIP
Latest revision as of 14:57, 24 April 2024
The Nexus scheduler houses CLIP's new computational partition. Only CLIP lab members are able to run non-interruptible jobs on these nodes.
Submission Nodes
You can SSH to nexusclip.umiacs.umd.edu to log in to a submission host.
If you store something in a local directory (/tmp, /scratch0) on one of the two submission hosts, you will need to connect to that same submission host to access it later. The actual submission hosts are:
nexusclip00.umiacs.umd.edu
nexusclip01.umiacs.umd.edu
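Because nexusclip.umiacs.umd.edu can place you on either host, it helps to note which one you are on before writing to /tmp or /scratch0. This is a minimal sketch, assuming a bash shell on the submission host; `username` in the comment is a placeholder for your own account name.

```bash
#!/bin/bash
# Print the short name of the submission host this shell is running on
# (nexusclip00 or nexusclip01 when logged in via nexusclip.umiacs.umd.edu).
current_host=$(uname -n)
current_host=${current_host%%.*}   # strip any domain suffix
echo "Logged in to ${current_host}"

# Files written to /tmp or /scratch0 now are only visible on this host.
# To reach them later, connect to that host directly instead of the alias, e.g.:
#   ssh username@${current_host}.umiacs.umd.edu
```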
Resources
The CLIP partition has nodes brought over from the previous standalone CLIP Slurm scheduler as well as some more recent purchases. The compute nodes are named clip##.
QoS
CLIP users have access to all of the standard job QoSes in the clip partition using the clip account.
The additional job QoSes for the CLIP partition specifically are:
huge-long: Allows for longer jobs using higher overall resources.
Please note that the partition has a GrpTRES limit equal to 100% of the aggregate cores/RAM on the partition-specific nodes plus 50% of the aggregate cores/RAM on the legacy## nodes, so your job may need to wait if all cores/RAM (or GPUs) available under that limit are in use.
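As a worked illustration of how the aggregate limit combines the two node groups, the sketch below uses made-up core counts (256 on the partition-specific nodes, 512 on the legacy## nodes). These are hypothetical numbers, not the partition's real inventory, and the same arithmetic would apply to RAM.

```bash
#!/bin/bash
# Hypothetical totals, for illustration only.
clip_cores=256     # summed over all partition-specific nodes
legacy_cores=512   # summed over all legacy## nodes

# GrpTRES cap = 100% of partition-specific cores + 50% of legacy## cores
cap=$(( clip_cores + legacy_cores / 2 ))
echo "Partition-wide CPU cap: ${cap} cores"   # 256 + 256 = 512
```

Under this reading, once jobs in the partition together hold 512 cores, further jobs wait even if some legacy## cores are still idle.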
Jobs
You will need to specify --partition=clip and --account=clip to be able to submit jobs to the CLIP partition.
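If you submit to this partition often, Slurm's input environment variables can supply both options instead of typing them every time. This is a sketch, assuming a bash shell: sbatch, srun and salloc each read their own variable prefix, and options given explicitly on the command line still override these values. Because exports like these apply to every submission from that shell, including jobs meant for other partitions, consider setting them only in shells where you submit CLIP jobs.

```bash
#!/bin/bash
# Default partition/account for CLIP submissions from this shell.
# Explicit --partition/--account options on the command line take precedence.
export SBATCH_PARTITION=clip SBATCH_ACCOUNT=clip   # read by sbatch
export SLURM_PARTITION=clip SLURM_ACCOUNT=clip     # read by srun
export SALLOC_PARTITION=clip SALLOC_ACCOUNT=clip   # read by salloc
```

With these set, srun --pty --ntasks=4 --mem=8G --qos=default --time=1-00:00:00 bash should request the same partition and account as the explicit command shown in the example.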
[username@nexusclip00:~ ] $ srun --pty --ntasks=4 --mem=8G --qos=default --partition=clip --account=clip --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@clip00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=clip QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=clip AllocNode:Sid=nexusclip00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=clip00
   BatchHost=clip00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
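The interactive srun session above requests 4 tasks, 8G of memory and one day on the clip partition. The same request can be written as a batch script for sbatch. This is a sketch: the job name, the output file pattern and the echo payload are placeholders chosen for illustration, not site requirements.

```bash
#!/bin/bash
#SBATCH --job-name=clip-example
#SBATCH --partition=clip
#SBATCH --account=clip
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00
#SBATCH --output=clip-example-%j.out

# Replace this with your real workload; here the job only reports
# its job ID and the compute node (e.g. clip00) it ran on.
msg="Job ${SLURM_JOB_ID:-unset} running on $(uname -n)"
echo "$msg"
```

Saved as, say, clip-example.sh (a hypothetical file name), it would be submitted with sbatch clip-example.sh, and its output would land in clip-example-<jobid>.out in the submission directory. For longer jobs, --qos=huge-long could replace --qos=default, within that QoS's limits.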
Storage
All data filesystems that were available in the standalone CLIP cluster are also available in Nexus.
CLIP users can also request Nexus project allocations.