Nexus/CBCB
The compute nodes from [[CBCB]]'s previous standalone cluster have folded into [[Nexus]] as of mid 2023.
 
The Nexus cluster already has a large pool of compute resources made possible through college-level funding for UMIACS and CSD faculty. Details on common nodes already in the cluster (Tron partition) can be found [[Nexus/Tron | here]].
 
Please [[HelpDesk | contact staff]] with any questions or concerns.


= Submission Nodes =
You can [[SSH]] to <code>nexuscbcb.umiacs.umd.edu</code> to log in to a submission node.


If you store something in a local filesystem directory (/tmp, /scratch0) on one of the two submission nodes, you will need to connect to that same submission node to access it later. The actual submission nodes are:
* <code>nexuscbcb00.umiacs.umd.edu</code>
* <code>nexuscbcb01.umiacs.umd.edu</code>
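For example (a minimal sketch; <code>username</code> is a placeholder for your UMIACS username):
<pre>
# Log in via the general alias (you may land on either submission node)
ssh username@nexuscbcb.umiacs.umd.edu

# Log in to a specific submission node, e.g. to retrieve files you left in its /scratch0
ssh username@nexuscbcb00.umiacs.umd.edu
</pre>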


= Nodes =
All nodes in the partitions owned by CBCB and CBCB faculty (see the Partitions section below) are named in the format <code>cbcb##</code>. The sets of nodes are:
* 22 nodes that were purchased in October 2022 with center-wide funding.  They are cbcb[00-21].
* 4 nodes from the previous standalone CBCB cluster that moved in as of Summer 2023.  They are cbcb[22-25].
* A few additional nodes purchased by Dr. Heng Huang since then.  They are all remaining 'cbcb' named nodes.
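If a job needs a particular set of hardware, you can target nodes by name or by the Slurm features shown in the listing further below. A hedged sketch using standard Slurm options (<code>my_job.sh</code> is a placeholder batch script):
<pre>
# Pin a job to a specific node by name
sbatch --partition=cbcb --account=cbcb --nodelist=cbcb00 my_job.sh

# Request any node that advertises a given feature, e.g. the EPYC-7313 nodes
sbatch --partition=cbcb --account=cbcb --constraint=EPYC-7313 my_job.sh
</pre>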


{| class="wikitable sortable"
! Nodenames
! Quantity
! CPU cores per node (CPUs)
! Memory per node (type)
! Filesystem storage per node (type/location)
! GPUs per node (type)
|-
|cbcb[00-21]
|22
|32 (Dual [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-7313.html AMD EPYC 7313])
|~2TB (DDR4 3200MHz)
|~350GB (SATA SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch0]]), ~2TB (NVMe SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch1]])
|0
|-
|cbcb22
|1
|28 (Dual [https://ark.intel.com/content/www/us/en/ark/products/91754/intel-xeon-processor-e5-2680-v4-35m-cache-2-40-ghz.html Intel Xeon E5-2680 v4])
|~768GB (DDR4 2400MHz)
|~650GB (SATA SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch0]])
|0
|-
|cbcb[23-24]
|2
|24 (Dual [https://www.intel.com/content/www/us/en/products/sku/91767/intel-xeon-processor-e52650-v4-30m-cache-2-20-ghz/specifications.html Intel Xeon E5-2650 v4])
|~256GB (DDR4 2400MHz)
|~800GB (SATA SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch0]])
|0
|-
|cbcb25
|1
|24 (Dual [https://www.intel.com/content/www/us/en/products/sku/91767/intel-xeon-processor-e52650-v4-30m-cache-2-20-ghz/specifications.html Intel Xeon E5-2650 v4])
|~256GB (DDR4 2400MHz)
|~1.4TB (SATA SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch0]])
|2 (1x [https://www.nvidia.com/en-gb/geforce/graphics-cards/geforce-gtx-1080-ti/specifications/ NVIDIA GeForce GTX 1080 Ti], 1x [https://www.nvidia.com/en-us/geforce/graphics-cards/compare/?section=compare-20 NVIDIA GeForce RTX 2080 Ti])
|-
|cbcb26
|1
|128 (Dual [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-7763.html AMD EPYC 7763])
|~512GB (DDR4 3200MHz)
|~3.4TB (NVMe SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch0]]), ~14TB (NVMe SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch1]])
|7 ([https://www.nvidia.com/en-us/design-visualization/rtx-a5000 NVIDIA RTX A5000])
|-
|cbcb27
|1
|64 (Dual [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-7513.html AMD EPYC 7513])
|~256GB (DDR4 3200MHz)
|~3.4TB (SATA SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch0]]), ~3.5TB (NVMe SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch1]])
|8 ([https://www.nvidia.com/en-us/design-visualization/rtx-a6000 NVIDIA RTX A6000])
|-
|cbcb[28-29]
|2
|32 (Dual [https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html AMD EPYC 9124])
|~768GB (DDR5 4800MHz)
|~350GB (SATA SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch0]]), ~7TB (NVMe SSD [[FilesystemDataStorage#UNIX_Filesystem_Storage | /scratch1]])
|8 ([https://www.nvidia.com/en-us/design-visualization/rtx-6000 NVIDIA RTX 6000 Ada Generation])
|- class="sortbottom"
!Total
|30
|1060 (various)
|~49TB (various)
|~94TB (various)
|33 (various)
|}


Here is the listing of nodes as shown by the Slurm alias <code>show_nodes</code> (again, all nodes are named in the format <code>cbcb##</code>):
<pre>
[root@nexusctl00 ~]# show_nodes | grep cbcb
NODELIST             CPUS       MEMORY     AVAIL_FEATURES                 GRES                             STATE
cbcb00               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb01               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb02               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb03               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb04               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb05               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb06               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb07               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb08               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb09               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb10               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb11               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb12               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb13               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb14               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb15               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb16               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb17               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb18               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb19               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb20               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb21               32         2061175    rhel8,Zen,EPYC-7313            (null)                           idle
cbcb22               28         771245     rhel8,Xeon,E5-2680             (null)                           idle
cbcb23               24         255150     rhel8,Xeon,E5-2650             (null)                           idle
cbcb24               24         255150     rhel8,Xeon,E5-2650             (null)                           idle
cbcb25               24         255278     rhel8,Xeon,E5-2650,Pascal,Turi gpu:rtx2080ti:1,gpu:gtx1080ti:1  idle
cbcb26               128        513243     rhel8,Zen,EPYC-7763,Ampere     gpu:rtxa5000:7                   idle
cbcb27               64         255167     rhel8,Zen,EPYC-7513,Ampere     gpu:rtxa6000:8                   idle
cbcb28               32         771166     rhel8,Zen,EPYC-9124,Ada        gpu:rtx6000ada:8                 idle
cbcb29               32         771166     rhel8,Zen,EPYC-9124,Ada        gpu:rtx6000ada:8                 idle
</pre>
= Partitions =
There are two partitions available to general CBCB [[SLURM]] users. You must specify one of these two partitions when submitting your job.
* '''cbcb''' - This is the default partition. Job allocations on all nodes except those also in the '''cbcb-heng''' partition are guaranteed.
* '''cbcb-interactive''' - This is a partition that only allows interactive jobs; you cannot submit jobs via <code>sbatch</code> to this partition. Job allocations are guaranteed.
There is one additional partition available solely to Dr. Heng Huang's sponsored accounts.
* '''cbcb-heng''' - This partition is for exclusive priority access to Dr. Huang's purchased GPU nodes. Job allocations are guaranteed.
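To check which nodes back each of these partitions and their current state, you can query Slurm directly (a minimal sketch using standard Slurm commands; output omitted):
<pre>
# Summarize node availability per CBCB partition
sinfo --partition=cbcb,cbcb-interactive,cbcb-heng

# Show the full configuration of the default CBCB partition
scontrol show partition cbcb
</pre>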
= QoS =
CBCB users have access to all of the [[Nexus#Quality_of_Service_.28QoS.29 | standard job QoSes]] in the '''cbcb''' and '''cbcb-heng''' partitions using the <code>cbcb</code> account.
The additional job QoSes for the '''cbcb''' and '''cbcb-heng''' partitions specifically are:
* <code>highmem</code>: Allows for significantly increased memory to be allocated.
* <code>huge-long</code>: Allows for longer jobs using higher overall resources.
Please note that the partition has a <code>GrpTRES</code> limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
The ''only'' allowed job QoS for the '''cbcb-interactive''' partition is:
* <code>interactive</code>: Allows for 4 CPU / 128G mem jobs up to 12 hours in length - can only be used via <code>srun</code> or <code>salloc</code>.
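For example, an interactive session within those limits might be requested like this (a hedged sketch; choose resource values at or below the limits above):
<pre>
srun --pty --partition=cbcb-interactive --account=cbcb --qos=interactive --cpus-per-task=4 --mem=128G --time=12:00:00 bash
</pre>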


= Jobs =
You will need to specify <code>--partition=cbcb</code> and <code>--account=cbcb</code> to be able to submit jobs to the CBCB partition.


<pre>
[username@nexuscbcb00:~ ] $ srun --pty --ntasks=16 --mem=2000G --qos=highmem --partition=cbcb --account=cbcb --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@cbcb00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=cbcb QOS=highmem
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=cbcb AllocNode:Sid=nexuscbcb00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=cbcb00
   BatchHost=cbcb00
   NumNodes=1 NumCPUs=16 NumTasks=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=2000G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=2000G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
</pre>
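Batch jobs work the same way; the required flags can instead be given as <code>#SBATCH</code> directives. A minimal sketch of a batch script (the job name, resource values, and <code>my_program</code> are placeholders; include <code>--qos</code> only if you need a non-default QoS such as <code>highmem</code>):
<pre>
#!/bin/bash
#SBATCH --job-name=cbcb-example    # placeholder job name
#SBATCH --partition=cbcb           # required for the CBCB partition
#SBATCH --account=cbcb             # required CBCB account
#SBATCH --qos=highmem              # optional: only for jobs needing the increased memory limit
#SBATCH --ntasks=16
#SBATCH --mem=2000G
#SBATCH --time=1-00:00:00

srun my_program                    # placeholder for your actual command
</pre>
Submit the script from a submission node with <code>sbatch</code>, e.g. <code>sbatch my_script.sh</code>.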
= Storage =
CBCB still has its current [https://wiki.umiacs.umd.edu/cbcb-private/index.php/Storage storage] allocation in place.  All data filesystems that were available in the standalone CBCB cluster are also available in Nexus.  Please note the change to your home directory described in the migration section below.
CBCB users can also request [[Nexus#Project_Allocations | Nexus project allocations]].


= Migration =
== Home Directories ==
[[Nexus]] uses our [[NFShomes]] home directories rather than <code>/cbcbhomes/$USERNAME</code>.  As part of migrating into Nexus, you may need or want to copy any shell customization from your existing <code>/cbcbhomes</code> directory to your new home directory.  To make this transition easier, <code>/cbcbhomes</code> is available on the CBCB submission nodes.
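For example, to copy a shell customization file over from a CBCB submission node (a hedged sketch; only copy the files you actually want, and check for existing files in your new home directory first):
<pre>
# /cbcbhomes is mounted on the CBCB submission nodes
cp -i /cbcbhomes/$USER/.bashrc ~/
</pre>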


== Operating System / Software ==
CBCB's standalone cluster submission and compute nodes were running RHEL7.  [[Nexus]] is exclusively running RHEL8, so any software you may have compiled may need to be re-compiled to work correctly in this new environment.  The [https://wiki.umiacs.umd.edu/cbcb/index.php/CBCB_Software_Modules CBCB module tree] for RHEL8 may not yet be fully populated with RHEL8 software.  If you do not see the modules you need, please reach out to the [https://wiki.umiacs.umd.edu/cbcb/index.php/CBCB_Software_Modules#Contact CBCB software maintainers].
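For example, you can confirm the operating system release and see which modules are currently published (a minimal sketch using standard commands; available module names will vary):
<pre>
# Confirm you are on a RHEL8 host
cat /etc/redhat-release

# List the modules currently available in this environment
module avail
</pre>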
