Nexus/QuICS

The Nexus cluster has a large pool of compute resources made possible through college-level funding for UMIACS and CSD faculty. Details on the common nodes in the cluster (the Tron partition) can be found here.

Please contact staff with any questions or concerns.

Submission Nodes

You can SSH to nexusquics.umiacs.umd.edu to log in to a submission node.

If you store something in a local filesystem directory (/tmp, /scratch0) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:

  • nexusquics00.umiacs.umd.edu
  • nexusquics01.umiacs.umd.edu
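
For example, to connect to one of them directly (username is a placeholder for your UMIACS username):

$ ssh username@nexusquics00.umiacs.umd.edu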

Compute Nodes

The compute nodes are named legacy## and quics00.
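
You can check the current state of the partition's nodes with Slurm's sinfo command:

[username@nexusquics00 ~]$ sinfo --partition=quics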

quics00's specifications:

  • 128 CPU cores (dual AMD EPYC 9534: https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9534.html)
  • ~1.5TB RAM (DDR5 4800MHz)
  • ~350GB SATA SSD storage located at /scratch0
  • ~7TB NVMe SSD storage located at /scratch1
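
If a job should land on quics00 specifically (for example, to use the NVMe space at /scratch1), Slurm's --nodelist option pins it to that node. A sketch, mirroring the srun example in the Jobs section below; note that /scratch0 and /scratch1 are node-local, so files written there are only visible from quics00:

[username@nexusquics00 ~]$ srun --pty --partition=quics --account=quics --qos=default --nodelist=quics00 --ntasks=4 --mem=8G --time=1-00:00:00 bash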

QoS

QuICS users have access to all of the standard job QoSes in the QuICS partition using the quics account.

The additional job QoSes specific to the QuICS partition are:

  • highmem: Allows for significantly increased memory to be allocated.
  • huge-long: Allows for longer jobs using higher overall resources.

Please note that the partition has a GrpTRES limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
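
To confirm which accounts and QoSes your user can submit with, and to see the limits attached to each QoS, you can query Slurm's accounting database. A sketch using standard sacctmgr commands (the format fields shown are illustrative):

[username@nexusquics00 ~]$ sacctmgr show associations user=$USER format=Account,QOS
[username@nexusquics00 ~]$ sacctmgr show qos format=Name,MaxWall,MaxTRES,GrpTRES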

Jobs

You will need to specify --partition=quics and --account=quics to be able to submit jobs to the QuICS partition.

[username@nexusquics00 ~]$ srun --pty --ntasks=4 --mem=8G --qos=default --partition=quics --account=quics --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@legacy00 ~]$ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=quics QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=quics AllocNode:Sid=nexusquics00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=legacy00
   BatchHost=legacy00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
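
For batch rather than interactive work, the same --partition and --account flags (plus an optional --qos) go into an sbatch script. A minimal sketch, where the resource values and the hostname command are placeholders for your actual requirements and workload:

#!/bin/bash
#SBATCH --partition=quics
#SBATCH --account=quics
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

hostname

Submit the script with sbatch and monitor it with squeue -u username.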

Storage

QuICS users can request Nexus project allocations (see Nexus#Project_Allocations).