Nexus/MBRC
The compute nodes from MBRC's previous standalone cluster were folded into Nexus in mid-2023.
The Nexus cluster already has a large pool of compute resources made possible through college-level funding for UMIACS and CSD faculty. Details on the common nodes already in the cluster (the Tron partition) can be found on the Nexus/Tron page.
Please contact staff with any questions or concerns.
Submission Nodes
You can SSH to nexusmbrc.umiacs.umd.edu to log in to a submission node.
If you store something in a local filesystem directory (/tmp, /scratch0) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:
- nexusmbrc00.umiacs.umd.edu
- nexusmbrc01.umiacs.umd.edu
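For example, to log in to a specific submission node (replace username with your own UMIACS username):

<pre>
# Log in to a particular submission node so that files left in /tmp or
# /scratch0 there can be reached again on a later login
ssh username@nexusmbrc00.umiacs.umd.edu
</pre>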
Compute Nodes
The MBRC partition has nodes brought over from the previous standalone MBRC Slurm scheduler. The compute nodes are named mbrc##.
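If you want to check which MBRC nodes are currently available, standard Slurm query commands can be run from a submission node; a brief sketch:

<pre>
# List the nodes in the mbrc partition along with their current state
sinfo --partition=mbrc

# Show detailed information (CPUs, memory, features) for a single node, e.g. mbrc00
scontrol show node mbrc00
</pre>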
Network
The network infrastructure supporting the MBRC partition consists of:
- One pair of network switches connected to each other via dual 25GbE links for redundancy, serving the following compute nodes:
  - mbrc[00-01]: Two 25GbE links per node, one to each switch in the pair (redundancy).
For a broader overview of the network infrastructure supporting the Nexus cluster, please see Nexus/Network.
QoS
MBRC users have access to all of the standard job QoSes in the mbrc partition using the mbrc account.
The additional job QoSes for the MBRC partition specifically are:
- huge-long: Allows for longer jobs using higher overall resources.
Please note that the partition has a GrpTRES limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
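To inspect these limits yourself, the standard Slurm accounting and control commands can be used; the sketch below assumes it is run from a submission node, and the exact fields shown will depend on the site configuration:

<pre>
# Show the limits attached to the huge-long QoS
sacctmgr show qos huge-long format=Name,MaxWall,MaxTRESPerUser,GrpTRES

# Show the mbrc partition definition, including allowed QoSes and node list
scontrol show partition mbrc
</pre>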
Jobs
You will need to specify --partition=mbrc and --account=mbrc to be able to submit jobs to the MBRC partition.
<pre>
[username@nexusmbrc00:~ ] $ srun --pty --ntasks=4 --mem=8G --qos=default --partition=mbrc --account=mbrc --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@mbrc00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=mbrc QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=mbrc AllocNode:Sid=nexusmbrc00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=mbrc00
   BatchHost=mbrc00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
</pre>
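The same two options apply to batch jobs. Below is a minimal batch script sketch; the job name, QoS, and resource values are illustrative only and should be adjusted to your workload:

<pre>
#!/bin/bash
#SBATCH --job-name=mbrc-example   # illustrative name
#SBATCH --partition=mbrc          # required for the MBRC partition
#SBATCH --account=mbrc            # required MBRC account
#SBATCH --qos=default             # or e.g. huge-long for larger/longer jobs
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

# Replace with your actual workload
srun hostname
</pre>

Submit the script with sbatch and monitor it with squeue -u $USER.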
Storage
In addition to the storage types available to all Nexus users, MBRC users can also request MBRC project directories.
Project Directories
For this cluster, we allocate network storage on a project-by-project basis. Jonathan Heagerty is the point of contact for allocating the requested storage for each project. The MBRC cluster has limited network storage overall, so there are limits on how much storage can be allocated and for how long.
If the requested storage size is large relative to the total allotted amount, Jonathan Heagerty will relay the request to the MBRC cluster faculty for approval. Two other situations also require faculty approval: requesting an increase to a project's current storage allotment, and requesting a time extension for a project's storage.
Please provide the following information when contacting staff to request storage:
- Name of user requesting storage (example: jheager2)
- Name of project (example: Foveated Rendering)
- Collaborators working on the project (example: Sida Li)
- Storage size (example: 1TB)
- Length of time for storage (example: 6-8 months)