Nexus/CBCB
Revision as of 16:14, 18 November 2022
The Nexus computational resources and scheduler house the CBCB's new computational partition.
Submission Nodes
There are two submission nodes for Nexus exclusively available for CBCB users.
nexuscbcb00.umiacs.umd.edu
nexuscbcb01.umiacs.umd.edu
Resources
The new CBCB partition has 22 new nodes, each with 32 AMD EPYC-7313 cores and 2000GB of memory. CBCB users can also submit jobs to other partitions in Nexus and use resources there, such as GPUs.
QoS
Currently, CBCB users have access to all the default QoS in the cbcb partition using the cbcb account. There is also one additional QoS, highmem, that allows significantly more memory to be allocated.
$ show_qos
Name MaxWall MaxJobs MaxTRES MaxTRESPU GrpTRES
------------ ----------- ------- ------------------------------ ------------------------------ --------------------
normal
scavenger 2-00:00:00 cpu=64,gres/gpu=8,mem=256G cpu=192,gres/gpu=24,mem=768G
medium 2-00:00:00 cpu=8,gres/gpu=2,mem=64G
high 1-00:00:00 cpu=16,gres/gpu=4,mem=128G
default 3-00:00:00 cpu=4,gres/gpu=1,mem=32G
tron cpu=32,gres/gpu=4,mem=256G
huge-long 10-00:00:00 cpu=32,gres/gpu=8,mem=256G
clip cpu=339,mem=2926G
class cpu=32,gres/gpu=4,mem=256G
gamma cpu=179,mem=1511G
mc2 cpu=307,mem=1896G
cbcb cpu=913,mem=46931G
highmem 21-00:00:00 cpu=32,mem=2000G
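As a rough illustration of how to read the highmem row, the sketch below compares a planned request against the cpu=32 and mem=2000G values listed for that QoS. The function name fits_highmem is hypothetical and is not part of Slurm; the scheduler remains the authority on what is actually accepted.

```shell
# Hypothetical helper, not a Slurm command: compares a planned request
# against the cpu=32,mem=2000G values listed for the highmem QoS above.
fits_highmem() {
    cpus=$1      # number of CPUs you plan to request
    mem_gb=$2    # memory you plan to request, in GB
    if [ "$cpus" -le 32 ] && [ "$mem_gb" -le 2000 ]; then
        echo "within highmem limits"
    else
        echo "exceeds highmem limits"
        return 1
    fi
}

fits_highmem 16 2000
```

A request for 16 CPUs and 2000GB, as used in the Jobs example, falls within those values.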
Jobs
You will need to specify --partition=cbcb, --account=cbcb, and a specific --qos when you submit jobs into the CBCB partition.
[derek@nexuscbcb00:~ ] $ srun --pty --ntasks=16 --mem=2000G --qos=highmem --partition=cbcb --account=cbcb --time 1-00:00:00 bash
srun: job 218874 queued and waiting for resources
srun: job 218874 has been allocated resources
[derek@cbcb00:~ ] $ scontrol show job 218874
JobId=218874 JobName=bash
   UserId=derek(2174) GroupId=derek(22174) MCS_label=N/A
   Priority=897 Nice=0 Account=cbcb QOS=highmem
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-11-18T11:13:56 EligibleTime=2022-11-18T11:13:56
   AccrueTime=2022-11-18T11:13:56
   StartTime=2022-11-18T11:13:56 EndTime=2022-11-19T11:13:56 Deadline=N/A
   PreemptEligibleTime=2022-11-18T11:13:56 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-11-18T11:13:56 Scheduler=Main
   Partition=cbcb AllocNode:Sid=nexuscbcb00:25443
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=cbcb00
   BatchHost=cbcb00
   NumNodes=1 NumCPUs=16 NumTasks=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=2000G,node=1,billing=2266
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=2000G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/nfshomes/derek
   Power=
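For non-interactive work, the same options can be placed in a batch script. The following is a minimal sketch, assuming the flags from the srun example carry over unchanged to sbatch; the job name, the job_summary function, and the workload are placeholders rather than anything prescribed by this page.

```shell
#!/bin/bash
#SBATCH --job-name=cbcb-example   # placeholder name
#SBATCH --partition=cbcb
#SBATCH --account=cbcb
#SBATCH --qos=highmem
#SBATCH --ntasks=16
#SBATCH --mem=2000G
#SBATCH --time=1-00:00:00

# Slurm sets SLURM_JOB_ID and SLURM_NTASKS inside a job; the fallbacks
# only keep this line from failing if the script is run outside Slurm.
job_summary() {
    echo "job ${SLURM_JOB_ID:-none} on $(uname -n) with ${SLURM_NTASKS:-0} tasks"
}

job_summary
# Replace the call above with your actual workload.
```

Saved under a name of your choosing, for example job.sh, the script would be submitted with sbatch job.sh.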
Migration
Home Directories
Nexus uses our NFShomes home directories, not /cbcbhomes/$USERNAME. As part of migrating into Nexus, you may need or want to copy any shell customization from your existing /cbcbhomes directory to your new home directory. To make this transition easier, /cbcbhomes is available on the CBCB submission nodes.
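One cautious way to carry shell customization over is to copy only the files you name and to leave anything that already exists in the new home directory untouched. The sketch below assumes that approach; copy_dotfiles is a hypothetical helper, and which dotfiles matter depends on your own setup.

```shell
# Hypothetical helper: copy selected dotfiles from an old home directory
# into a new one without overwriting files that already exist there.
copy_dotfiles() {
    src=$1     # old home directory
    dest=$2    # new home directory
    shift 2    # remaining arguments are the file names to copy
    for name in "$@"; do
        if [ -f "$src/$name" ] && [ ! -e "$dest/$name" ]; then
            cp -p "$src/$name" "$dest/$name"
            echo "copied $name"
        else
            echo "skipped $name"
        fi
    done
}
```

On a submission node, a call might look like copy_dotfiles "/cbcbhomes/$USER" "$HOME" .bashrc .bash_profile, where the two file names are only examples.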
Operating System / Software
Previously, CBCB's cluster ran RHEL7. Nexus runs RHEL8 exclusively, so any software you compiled yourself may need to be recompiled to work correctly in the new environment. The CBCB module tree for RHEL8 has only just been started and may not be populated yet; if you do not see the modules you need, reach out to the maintainers.
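When deciding whether a build needs to be redone, it can help to confirm which OS release a node is running. The sketch below reads the VERSION_ID field from an os-release style file; os_major_version is a hypothetical helper, and the default path /etc/os-release is an assumption about the node's layout.

```shell
# Hypothetical helper: print the major OS version recorded in an
# os-release style file (defaults to /etc/os-release).
os_major_version() {
    file=${1:-/etc/os-release}
    # A line such as VERSION_ID="8.6" yields 8: the optional quote is
    # skipped and only the leading digits are kept.
    sed -n 's/^VERSION_ID="\{0,1\}\([0-9]*\).*/\1/p' "$file"
}
```

Calling os_major_version with no arguments on a Nexus node should print 8 if the node is on RHEL8; software built where it printed 7 is a candidate for recompiling.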