Nexus/QuICS

From UMIACS

The Nexus cluster already has a large pool of compute resources, made possible through college-level funding for UMIACS and CSD faculty. Details on the common nodes already in the cluster (the Tron partition) are available in the Tron partition documentation.

Please contact staff with any questions or concerns.

Submission Nodes

You can SSH to nexusquics.umiacs.umd.edu to log in to a submission node.

If you store something in a local filesystem directory (/tmp, /scratch0) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:

  • nexusquics00.umiacs.umd.edu
  • nexusquics01.umiacs.umd.edu
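Because /tmp and /scratch0 are local to each submission node, it can help to note which node you are on before leaving files there. The following is a minimal sketch that uses only standard tools (uname, id, find). It is not a UMIACS-provided utility, and it assumes the node's hostname resolves under umiacs.umd.edu as listed above.

```shell
#!/bin/bash
# Report which machine you are on and which files you own in its
# node-local directories. /tmp and /scratch0 are not shared between
# nexusquics00 and nexusquics01, so this tells you where to reconnect.
node_full=$(uname -n)          # e.g. nexusquics00.umiacs.umd.edu
node_short=${node_full%%.*}    # strip any domain part: nexusquics00

echo "Current node: ${node_short}"
for dir in /tmp /scratch0; do
  # Skip directories that do not exist on this machine.
  [ -d "$dir" ] || continue
  echo "Files you own directly under ${dir}:"
  # -user accepts a numeric UID; errors from unreadable entries are hidden.
  find "$dir" -mindepth 1 -maxdepth 1 -user "$(id -u)" 2>/dev/null
done

# Only meaningful when run on one of the submission nodes listed above.
echo "To return later: ssh ${node_short}.umiacs.umd.edu"
```

Connecting through nexusquics.umiacs.umd.edu may land you on either node, so the last line prints the node-specific name to use instead.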

Compute Nodes

The compute nodes are named legacy## and quics00.

quics00's specifications:

  • 128 CPU cores (dual AMD EPYC 9534)
  • ~1.5TB RAM (DDR5 4800MHz)
  • ~350GB SATA SSD storage located at /scratch0
  • ~7TB NVMe SSD storage located at /scratch1
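As a rough guide when choosing --mem for jobs on quics00, the figures above work out to about 12 GB of RAM per core. The sketch below assumes "~1.5TB" means 1536 GB. The memory Slurm can actually schedule is somewhat lower, so treat the result as an upper bound. On a submission node, scontrol show node reports the configured values directly.

```shell
#!/bin/bash
# Back-of-the-envelope memory per core on quics00, from the figures above.
# Assumption: "~1.5TB" is taken as 1536 GB; schedulable memory is lower.
total_cores=128
total_mem_gb=1536
per_core_gb=$(( total_mem_gb / total_cores ))
echo "quics00: about ${per_core_gb} GB of RAM per core"

# Where Slurm is available, print the node's configured values
# (look for CPUTot and RealMemory; RealMemory is in MB).
if command -v scontrol >/dev/null 2>&1; then
  scontrol show node quics00
fi
```

Jobs that need noticeably more memory per core than this are the kind the highmem QoS is intended for.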

QoS

QuICS users have access to all of the standard job QoSes in the QuICS partition using the quics account.

The additional job QoSes specific to the QuICS partition are:

  • highmem: Allows for significantly increased memory to be allocated.
  • huge-long: Allows for longer jobs using higher overall resources.
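To see the actual limits attached to these QoSes before requesting one, you can query Slurm's accounting database with sacctmgr from a submission node. The sketch below assumes the QoS names are exactly highmem and huge-long. The helper filter_quics_qos is hypothetical (it is not part of Slurm); it simply keeps those two rows. To use a QoS, pass it as --qos=highmem (or --qos=huge-long) together with --partition=quics and --account=quics.

```shell
#!/bin/bash
# Show the limits attached to the QuICS-specific QoSes.
# filter_quics_qos (hypothetical helper) keeps only the highmem and
# huge-long rows of `sacctmgr -nP` output, which is "|"-separated.
filter_quics_qos() {
  grep -E '^(highmem|huge-long)\|'
}

if command -v sacctmgr >/dev/null 2>&1; then
  # -n drops the header line; -P prints "|"-delimited fields without padding.
  sacctmgr -nP show qos format=Name,MaxWall,MaxTRES,MaxTRESPU | filter_quics_qos
fi
```

Empty fields in the output mean no limit of that kind is set on the QoS itself; partition-level limits such as the GrpTRES limit still apply.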

Please note that the partition has a GrpTRES limit equal to 100% of the cores/RAM on the partition-specific nodes in aggregate plus 50% of the cores/RAM on the legacy## nodes in aggregate. Your job may therefore need to wait if all available cores/RAM (or GPUs) are in use.
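The GrpTRES rule is simple arithmetic: the full capacity of the partition-specific nodes plus half the capacity of the legacy## nodes. The sketch below assumes quics00 is the only partition-specific node and treats ~1.5TB as 1536 GB. LEGACY_CORES and LEGACY_MEM_GB are hypothetical placeholder totals, not the real legacy## figures; the optional sinfo call lists the actual per-node CPUs and memory (in MB) so you can substitute real values.

```shell
#!/bin/bash
# Aggregate cap implied by the GrpTRES rule:
#   cap = 100% of partition-specific nodes + 50% of legacy## nodes.
# Assumptions: quics00 is the only partition-specific node; the LEGACY_*
# defaults below are placeholders, not real cluster figures.
QUICS_CORES=128
QUICS_MEM_GB=1536
LEGACY_CORES=${LEGACY_CORES:-64}
LEGACY_MEM_GB=${LEGACY_MEM_GB:-512}

cap_cores=$(( QUICS_CORES + LEGACY_CORES / 2 ))
cap_mem_gb=$(( QUICS_MEM_GB + LEGACY_MEM_GB / 2 ))
echo "Partition-wide cap: ${cap_cores} cores, ${cap_mem_gb} GB RAM"

# Where Slurm is available, list each node in the partition with its
# CPU count (%c) and memory in MB (%m) to replace the placeholders.
if command -v sinfo >/dev/null 2>&1; then
  sinfo -p quics -N -h -o "%N %c %m"
fi
```

Once the jobs running in the partition together reach this cap, further jobs stay pending even if individual nodes still have idle cores.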

Jobs

You will need to specify --partition=quics and --account=quics to be able to submit jobs to the QuICS partition.

[username@nexusquics00 ~]$ srun --pty --ntasks=4 --mem=8G --qos=default --partition=quics --account=quics --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@legacy00 ~]$ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=quics QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=quics AllocNode:Sid=nexusquics00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=legacy00
   BatchHost=legacy00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
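The same partition and account options apply to batch jobs. The following is a minimal batch-script sketch that requests the same resources as the interactive example (4 tasks, 8G of memory, one day, default QoS). The job name and the filename quics_job.sh are only examples. Submit it from a submission node with sbatch quics_job.sh.

```shell
#!/bin/bash
# Batch equivalent of the interactive srun request above.
# sbatch reads #SBATCH lines only before the first executable command.
#SBATCH --job-name=quics-example
#SBATCH --partition=quics
#SBATCH --account=quics
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

# Slurm sets these variables inside a job; the fallbacks apply only when
# the script is run directly with bash (e.g. to check it for typos).
job_id="${SLURM_JOB_ID:-none}"
node_list="${SLURM_JOB_NODELIST:-$(uname -n)}"
echo "Job ${job_id} on ${node_list} with ${SLURM_NTASKS:-1} task(s)"
```

sbatch prints the new job ID, which you can pass to scontrol show job as in the transcript above; squeue -u "$USER" lists all of your pending and running jobs.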

Storage

QuICS users can request Nexus project allocations.