Nexus/CLIP
The previous standalone cluster for [https://wiki.umiacs.umd.edu/clip/index.php/Main_Page CLIP]'s compute nodes was folded into [[Nexus]] in late 2022.
 
The Nexus cluster already has a large pool of compute resources made possible through college-level funding for UMIACS and CSD faculty. Details on the common nodes in the cluster (the Tron partition) can be found [[Nexus/Tron | here]].
 
Please [[HelpDesk | contact staff]] with any questions or concerns.
 
= Submission Nodes =
You can [[SSH]] to <code>nexusclip.umiacs.umd.edu</code> to log in to a submission node.
 
If you store something in a local filesystem directory (<code>/tmp</code>, <code>/scratch0</code>) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:
* <code>nexusclip00.umiacs.umd.edu</code>
* <code>nexusclip01.umiacs.umd.edu</code>
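For example, a minimal login from your own machine looks like the following (replace <code>username</code> with your UMIACS username):

<pre>
ssh username@nexusclip.umiacs.umd.edu
</pre>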
 
= Resources =
The CLIP partition has nodes brought over from the previous standalone CLIP Slurm scheduler as well as some more recent purchases. The compute nodes are named <code>clip##</code>.
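To see which <code>clip##</code> nodes are currently in the partition and their state, you can query Slurm directly; this is a generic query rather than a snapshot of the current node list:

<pre>
# List the nodes in the clip partition, one line per node
sinfo --partition=clip --Node --long
</pre>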
 
= QoS =
CLIP users have access to all of the [[Nexus#Quality_of_Service_.28QoS.29 | standard job QoSes]] in the <code>clip</code> partition using the <code>clip</code> account.
 
The additional job QoSes for the CLIP partition specifically are:
* <code>huge-long</code>: Allows for longer jobs using higher overall resources.
 
Please note that the partition has a <code>GrpTRES</code> limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on <code>legacy##</code> nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
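To inspect the limits attached to the available QoSes, or to see why a queued job is still pending (e.g., waiting on the partition's <code>GrpTRES</code> limit), the standard Slurm query commands below can be used; output is omitted here because it depends on the current cluster configuration:

<pre>
# Show the wall time and per-user resource limits for each QoS
sacctmgr show qos format=Name,MaxWall,MaxTRESPU,MaxJobsPU

# Show the state and pending reason for your queued jobs in the clip partition
squeue -u $USER -p clip -O JobID,Partition,State,Reason
</pre>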
 
= Jobs =
You will need to specify <code>--partition=clip</code> and <code>--account=clip</code> to be able to submit jobs to the CLIP partition.
 
<pre>
[username@nexusclip00:~ ] $ srun --pty --ntasks=4 --mem=8G --qos=default --partition=clip --account=clip --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@clip00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
  UserId=username(1000) GroupId=username(21000) MCS_label=N/A
  Priority=897 Nice=0 Account=clip QOS=default
  JobState=RUNNING Reason=None Dependency=(null)
  Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
  RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
  SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
  AccrueTime=2022-11-18T11:13:56
  StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
  PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
  SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
  Partition=clip AllocNode:Sid=nexusclip00:25443
  ReqNodeList=(null) ExcNodeList=(null)
  NodeList=clip00
  BatchHost=clip00
  NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
  TRES=cpu=4,mem=8G,node=1,billing=2266
  Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
  MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
  Features=(null) DelayBoot=00:00:00
  OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
  Command=bash
  WorkDir=/nfshomes/username
  Power=
</pre>
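For non-interactive work, the same <code>--partition=clip</code> and <code>--account=clip</code> flags can go into a batch script. The script below is only a minimal sketch: the job name, resource amounts, output path, and workload are placeholders rather than site requirements.

<pre>
#!/bin/bash
#SBATCH --job-name=clip-example       # placeholder job name
#SBATCH --partition=clip              # required: CLIP partition
#SBATCH --account=clip                # required: CLIP account
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00
#SBATCH --output=%x-%j.out            # log file named after job name and job ID

# Replace with your actual workload
srun hostname
</pre>

Submit the script with <code>sbatch</code> from one of the submission nodes; <code>squeue -u $USER</code> will show its state while it is queued or running.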
 
= Storage =
All data filesystems that were available in the standalone CLIP cluster are also available in Nexus.
 
CLIP users can also request [[Nexus#Project_Allocations | Nexus project allocations]].
