Nexus/CLIP
The previous standalone cluster for CLIP's compute nodes was folded into Nexus in late 2022.
The Nexus cluster already has a large pool of compute resources made possible through college-level funding for UMIACS and CSD faculty. Details on common nodes already in the cluster (Tron partition) can be found here.
Please contact staff with any questions or concerns.
Submission Nodes
You can SSH to nexusclip.umiacs.umd.edu to log in to a submission node.
If you store something in a local filesystem directory (/tmp, /scratch0) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:
nexusclip00.umiacs.umd.edu
nexusclip01.umiacs.umd.edu
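For example, assuming your UMIACS username is username (a placeholder), you could connect through the round-robin alias or directly to a specific submission node (useful if you need to get back to files you left in /tmp or /scratch0 there):

# round-robin alias; lands on one of the two submission nodes
ssh username@nexusclip.umiacs.umd.edu
# connect to a specific submission node
ssh username@nexusclip00.umiacs.umd.edu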
Compute Nodes
The CLIP partition has nodes brought over from the previous standalone CLIP Slurm scheduler as well as some more recent purchases. The compute nodes are named clip##.
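Once logged in, you can list the nodes currently in the partition and their state with Slurm's sinfo; a minimal sketch (the output columns are Slurm's defaults):

# show each node in the clip partition on its own line
sinfo --partition=clip --Node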
Network
The network infrastructure supporting the CLIP partition consists of:
- One pair of network switches connected to each other via dual 25GbE links for redundancy, serving the following compute nodes:
  - clip[00-03,05,07-08,10]: Two 10GbE links per node, one to each switch in the pair (redundancy).
  - clip04: Two 40GbE links per node, one to each switch in the pair (redundancy).
  - clip06: Two 25GbE links per node, one to each switch in the pair (redundancy).
  - clip09: Two 1GbE links per node, one to each switch in the pair (redundancy).
  - clip[11-13]: Two 100GbE links per node, one to each switch in the pair (redundancy).
For a broader overview of the network infrastructure supporting the Nexus cluster, please see Nexus/Network.
QoS
CLIP users have access to all of the standard job QoSes in the clip partition using the clip account.
The additional job QoSes for the CLIP partition specifically are:
huge-long: Allows for longer jobs using higher overall resources.
Please note that the partition has a GrpTRES limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
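If you want to inspect these limits yourself, Slurm's accounting and control tools can display them; a minimal sketch (the format fields shown are illustrative, and the partition-level limits may be attached to a partition QoS):

# list wall time and aggregate resource limits for the defined QoSes
sacctmgr show qos format=Name,MaxWall,GrpTRES
# show the clip partition definition, including any partition QoS it uses
scontrol show partition clip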
Jobs
You will need to specify --partition=clip and --account=clip to be able to submit jobs to the CLIP partition.
[username@nexusclip00:~ ] $ srun --pty --ntasks=4 --mem=8G --qos=default --partition=clip --account=clip --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@clip00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=clip QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56 AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=clip AllocNode:Sid=nexusclip00:25443
   ReqNodeList=(null) ExcNodeList=(null) NodeList=clip00
   BatchHost=clip00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
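The same flags apply to batch jobs. Below is a minimal sketch of an sbatch script; the job name, resource requests, and time limit are placeholders, and the final command is just an example payload:

#!/bin/bash
#SBATCH --job-name=clip-example
#SBATCH --partition=clip
#SBATCH --account=clip
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

# report which compute node the job landed on
hostname

Submit it from one of the submission nodes with sbatch followed by the script name.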
Storage
All data filesystems that were available in the standalone CLIP cluster are also available in Nexus.
CLIP users can also request Nexus project allocations.