The [[Nexus]] scheduler houses [[CBCB]]'s new computational partition. Only CBCB lab members are able to run non-interruptible jobs on these nodes.


= Submission Nodes =
There are two submission nodes for Nexus exclusively available to CBCB users. You can [[SSH]] to <code>nexuscbcb.umiacs.umd.edu</code> to log in to a submission host.


If you store something in a local directory (<code>/tmp</code>, <code>/scratch0</code>) on one of the two submission hosts, you will need to connect to that same submission host to access it later. The actual submission hosts are:
* <code>nexuscbcb00.umiacs.umd.edu</code>
* <code>nexuscbcb01.umiacs.umd.edu</code>
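
For example, to connect from your own machine (replace <code>username</code> with your UMIACS username):
<pre>
ssh username@nexuscbcb.umiacs.umd.edu
</pre>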
= Resources =
The new CBCB partition has 22 new nodes with 32 [https://www.amd.com/en/products/cpu/amd-epyc-7313 AMD EPYC-7313] cores and 2000GB of memory each.  CBCB users can also submit jobs and access resources such as GPUs in other partitions in [[Nexus]].
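
Once logged in to a submission node, you can check the current state of the partition's nodes with standard [[SLURM]] commands (output columns will vary with your Slurm version), for example:
<pre>
sinfo --partition=cbcb
</pre>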


Some of the newer nodes from the [https://wiki.umiacs.umd.edu/cbcb-private/index.php/Slurm standalone CBCB cluster] (albeit still several years old) have also moved into this partition as of Summer 2023, with a few additional faculty investments since then.
 
= Partitions =
There is only one partition available to general CBCB [[SLURM]] users. You must specify this partition when submitting your job.
 
* '''cbcb''' - This is the default partition. Job allocations are guaranteed on all nodes except those that are also in the <code>cbcb-heng</code> partition mentioned below.
 
There is one additional partition available solely to Dr. Heng Huang's sponsored accounts.
 
* '''cbcb-heng''' - This partition is for exclusive priority access to Dr. Huang's purchased GPU nodes. Job allocations are guaranteed.
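
You can inspect each partition's configuration (node list, limits, etc.) with <code>scontrol</code>, for example:
<pre>
scontrol show partition cbcb
</pre>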


= QoS =
CBCB users have access to all of the [[Nexus#Quality_of_Service_.28QoS.29 | standard job QoSes]] in the <code>cbcb</code> partition using the <code>cbcb</code> account.
 
The additional job QoSes for the CBCB partition specifically are:
* <code>highmem</code>: Allows for significantly increased memory to be allocated.
* <code>huge-long</code>: Allows for longer jobs using higher overall resources.


Please note that the partition has a <code>GrpTRES</code> limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
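
To see the exact limits attached to these QoSes, you can query the Slurm accounting database; the format fields below are just one possible selection:
<pre>
sacctmgr show qos highmem,huge-long format=Name,MaxWall,MaxTRES,GrpTRES
</pre>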


= Jobs =
You will need to specify <code>--partition=cbcb</code> and <code>--account=cbcb</code> to be able to submit jobs to the CBCB partition.


<pre>
[username@nexuscbcb00:~ ] $ srun --pty --ntasks=16 --mem=2000G --qos=highmem --partition=cbcb --account=cbcb --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@cbcb00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=cbcb QOS=highmem
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=cbcb AllocNode:Sid=nexuscbcb00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=cbcb00
   BatchHost=cbcb00
   NumNodes=1 NumCPUs=16 NumTasks=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=2000G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=2000G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
</pre>
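
For batch (non-interactive) jobs, the same options can be given as <code>#SBATCH</code> directives in a submission script. A minimal sketch, with placeholder resource amounts and program name:
<pre>
#!/bin/bash
#SBATCH --partition=cbcb
#SBATCH --account=cbcb
#SBATCH --ntasks=16
#SBATCH --mem=64G
#SBATCH --time=1-00:00:00

# your commands go here, e.g.:
srun ./my_program
</pre>
Submit the script with <code>sbatch</code>, e.g. <code>sbatch myjob.sh</code>.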


= Storage =
CBCB still has its current storage allocation in place.  All data filesystems that are available in the standalone CBCB cluster are also available in Nexus.  Please note the change in your home directory described in the migration section below.

CBCB users can also request [[Nexus]] project allocations.

= Migration =
== Home Directories ==
[[Nexus]] uses [[NFShomes]] home directories and not <code>/cbcbhomes/$USERNAME</code>.  As part of the process of migrating into Nexus, you may want (or need) to copy any shell customization from your existing <code>/cbcbhomes</code> to your new <code>/nfshomes</code> home directory.  To make this transition easier, <code>/cbcbhomes</code> is available on the Nexus CBCB submission nodes.
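
For example, run from one of the submission nodes (where both filesystems are mounted), you could compare and then carry over a shell startup file; which files you actually need depends on your own customizations, and <code>.bashrc</code> is just one common case:
<pre>
diff /cbcbhomes/$USER/.bashrc ~/.bashrc
cp -i /cbcbhomes/$USER/.bashrc ~/
</pre>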


== Operating System / Software ==
CBCB's standalone cluster submission and compute nodes are running RHEL7.  [[Nexus]] is exclusively running RHEL8, so any software you may have compiled may need to be re-compiled to work correctly in this new environment.  The [https://wiki.umiacs.umd.edu/cbcb/index.php/CBCB_Software_Modules CBCB module tree] for RHEL8 may not yet be fully populated with RHEL8 software.  If you do not see the modules you need, please reach out to the [https://wiki.umiacs.umd.edu/cbcb/index.php/CBCB_Software_Modules#Contact CBCB software maintainers].
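
Once on a RHEL8 node, you can check what is already present in the module tree before assuming something needs to be rebuilt; the module name below is only illustrative:
<pre>
module avail
module load samtools
</pre>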
