Nexus/MBRC

The [[Nexus]] scheduler houses [https://mbrc.umd.edu/ MBRC]'s new computational partition. Only MBRC lab members are able to run non-interruptible jobs on these nodes.


= Submission Nodes =
You can [[SSH]] to <code>nexusmbrc.umiacs.umd.edu</code> to log in to a submission host.


If you store something in a local directory (/tmp, /scratch0) on one of the two submission hosts, you will need to connect to that same submission host to access it later. The actual submission hosts are:
* <code>nexusmbrc00.umiacs.umd.edu</code>
* <code>nexusmbrc01.umiacs.umd.edu</code>
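
For example, a login from your own machine might look like this (<code>username</code> is a placeholder for your UMIACS username):

<pre>
ssh username@nexusmbrc.umiacs.umd.edu
</pre>

If you left files in <code>/tmp</code> or <code>/scratch0</code> on a particular host, SSH to that host by name (e.g. <code>nexusmbrc00.umiacs.umd.edu</code>) to get back to them.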

= Resources =
The MBRC partition has nodes brought over from the previous standalone MBRC Slurm scheduler. The compute nodes are named <code>mbrc##</code>.
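
To see which <code>mbrc##</code> nodes are currently in the partition and their state, a standard Slurm query like the following can be run from a submission host (output will reflect the live configuration):

<pre>
sinfo --partition=mbrc
</pre>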


= QoS =  
MBRC users have access to all of the [[Nexus#Quality_of_Service_.28QoS.29 | standard job QoSes]] in the <code>mbrc</code> partition using the <code>mbrc</code> account.


The additional job QoSes for the MBRC partition specifically are:
* <code>huge-long</code>: Allows for longer jobs using higher overall resources.
 
Please note that the partition has a <code>GrpTRES</code> limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
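
The exact limits attached to these QoSes are not listed here; if you need them, you can query the scheduler directly with standard Slurm tools, for example:

<pre>
# List the limits configured on each QoS (values come from the live Slurm configuration)
sacctmgr show qos format=Name,MaxWall,MaxTRESPU,GrpTRES

# Show the mbrc partition itself, including its node list and attached QoS
scontrol show partition mbrc
</pre>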


= Jobs =
You will need to specify <code>--partition=mbrc</code> and <code>--account=mbrc</code> to be able to submit jobs to the MBRC partition.  


<pre>
[username@nexusmbrc00:~ ] $ srun --pty --ntasks=4 --mem=8G --qos=default --partition=mbrc --account=mbrc --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@mbrc00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=mbrc QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=mbrc AllocNode:Sid=nexusmbrc00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=mbrc00
   BatchHost=mbrc00
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=8G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
</pre>
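
For batch (non-interactive) jobs, the same <code>--partition=mbrc</code> and <code>--account=mbrc</code> requirements apply. The following is a minimal sketch of a batch script; the QoS, task count, memory, and time values are illustrative (borrowed from the interactive example above) and should be adjusted for your job:

<pre>
#!/bin/bash
#SBATCH --job-name=mbrc-example
#SBATCH --partition=mbrc
#SBATCH --account=mbrc
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=1-00:00:00

# Replace with your actual workload
hostname
</pre>

Submit the script from a submission host with <code>sbatch</code>, e.g. <code>sbatch myjob.sh</code> (the filename is arbitrary).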


= Storage =
In addition to [[Nexus#Storage | storage types available to all Nexus users]], MBRC users can also request MBRC project directories.
 
== Project Directories ==
Network storage for this cluster is allocated on a project-by-project basis. Jonathan Heagerty is the point of contact for allocating the requested storage for each project. The MBRC cluster as a whole has limited network storage, so there are limits on both how much storage can be allocated and for how long.
 
If the requested storage size is large relative to the total available amount, Jonathan Heagerty will relay the request to the MBRC cluster faculty for approval. Faculty approval is also required to increase a project's current storage allotment or to extend the length of time a project's storage is retained.


When making a request for storage, please provide the following information when [[HelpDesk | contacting staff]]:
        - Name of user requesting storage:
                Example: jheager2
        - Name of project:
                Example: Foveated Rendering
        - Collaborators working on the project:
                Example: Sida Li
        - Storage size:
                Example: 1TB
        - Length of time for storage:
                Example: 6-8 months
