Nexus/CBCB

The [[Nexus]] scheduler houses CBCB's new computational partition.


= Submission Nodes =
There are two submission nodes for Nexus exclusively available for CBCB users:

* nexuscbcb00.umiacs.umd.edu
* nexuscbcb01.umiacs.umd.edu
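You can log into either node over SSH, e.g. (replace <code>username</code> with your UMIACS username):

<pre>
$ ssh username@nexuscbcb00.umiacs.umd.edu
</pre>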


= Resources =
The new CBCB partition has 22 new nodes, each with 32 AMD EPYC 7313 cores and 2000GB of memory.  CBCB users can also submit jobs to other partitions in [[Nexus]] to access resources such as GPUs.
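To check the current state of the partition's nodes yourself, you can query Slurm directly from a submission node; these are standard Slurm commands, not CBCB-specific tooling:

<pre>
$ sinfo --partition=cbcb
$ scontrol show partition cbcb
</pre>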


= QoS =
CBCB users have access to all of the [[Nexus#Quality_of_Service_.28QoS.29 | standard QoS']] in the <code>cbcb</code> partition using the <code>cbcb</code> account.  There is one additional QoS, <code>highmem</code>, that allows significantly more memory to be allocated.


<pre>
$ show_qos
        Name     MaxWall MaxJobs                        MaxTRES                      MaxTRESPU              GrpTRES
------------ ----------- ------- ------------------------------ ------------------------------ --------------------
      normal
   scavenger  2-00:00:00             cpu=64,gres/gpu=8,mem=256G   cpu=192,gres/gpu=24,mem=768G
      medium  2-00:00:00               cpu=8,gres/gpu=2,mem=64G
        high  1-00:00:00             cpu=16,gres/gpu=4,mem=128G
     default  3-00:00:00               cpu=4,gres/gpu=1,mem=32G
        tron                                                        cpu=32,gres/gpu=4,mem=256G
   huge-long 10-00:00:00             cpu=32,gres/gpu=8,mem=256G
        clip                                                                                      cpu=339,mem=2926G
       class                                                        cpu=32,gres/gpu=4,mem=256G
       gamma                                                                                      cpu=179,mem=1511G
         mc2                                                                                      cpu=307,mem=1896G
        cbcb                                                                                     cpu=913,mem=46931G
     highmem 21-00:00:00                       cpu=32,mem=2000G
</pre>


= Jobs =
You will need to specify <code>--partition=cbcb</code>, <code>--account=cbcb</code>, and a specific <code>--qos</code> to submit jobs to the CBCB partition.


<pre>
[username@nexuscbcb00:~ ] $ srun --pty --ntasks=16 --mem=2000G --qos=highmem --partition=cbcb --account=cbcb --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[username@cbcb00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=username(1000) GroupId=username(21000) MCS_label=N/A
   Priority=897 Nice=0 Account=cbcb QOS=highmem
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=cbcb AllocNode:Sid=nexuscbcb00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=cbcb00
   BatchHost=cbcb00
   NumNodes=1 NumCPUs=16 NumTasks=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=2000G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=2000G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/username
   Power=
</pre>
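Batch jobs take the same flags as directives in the job script.  Below is a minimal sketch of an <code>sbatch</code> script; the job name, resource requests, and commands are illustrative only:

<pre>
#!/bin/bash
#SBATCH --job-name=example        # illustrative name
#SBATCH --partition=cbcb
#SBATCH --account=cbcb
#SBATCH --qos=highmem
#SBATCH --ntasks=16
#SBATCH --mem=2000G
#SBATCH --time=1-00:00:00

# Replace with your actual workload
hostname
</pre>

Submit it with <code>sbatch yourscript.sh</code> from one of the submission nodes.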


= Storage =
CBCB still has its current [https://wiki.umiacs.umd.edu/cbcb-private/index.php/Storage storage] allocation in place.  All data filesystems that are available in the standalone CBCB cluster are also available in Nexus.  Please note the change to your home directory described in the migration section below.


CBCB users can also request [[Nexus#Project_Allocations | Nexus project allocations]].


= Migration =


== Home Directories ==
The [[Nexus]] runs on our [[NFShomes]] home directories and not <code>/cbcbhomes/$USERNAME</code>.  As part of the process of migrating into Nexus, you may want (or need) to copy any shell customization from your existing <code>/cbcbhomes</code> to your new home directory.  To make this transition easier, <code>/cbcbhomes</code> is available on the Nexus CBCB submission nodes.
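For example, a minimal sketch for copying common shell startup files from a submission node (adjust the file list to match whatever customization you actually have):

<pre>
$ rsync -av /cbcbhomes/$USER/.bashrc /cbcbhomes/$USER/.bash_profile ~/
</pre>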


== Operating System / Software ==
CBCB's standalone cluster submission and compute nodes run RHEL7.  [[Nexus]] runs exclusively RHEL8, so any software you have compiled yourself may need to be re-compiled to work correctly in the new environment.  The [https://wiki.umiacs.umd.edu/cbcb/index.php/CBCB_Software_Modules CBCB module tree] for RHEL8 may not yet be fully populated.  If you do not see the modules you need, please reach out to the [https://wiki.umiacs.umd.edu/cbcb/index.php/CBCB_Software_Modules#Contact CBCB software maintainers].
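You can list the modules currently built for RHEL8 and load one from a Nexus node; <code>&lt;modulename&gt;</code> below is a placeholder:

<pre>
$ module avail
$ module load <modulename>
</pre>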
