Revision as of 14:04, 9 August 2022

The Nexus is the combined scheduler of resources in UMIACS. Many of our existing computational clusters that have discrete schedulers will be folding into this scheduler in the future (see below). The resource manager for Nexus (as with our other existing computational clusters) is SLURM. Resources are arranged into partitions where users are able to schedule computational jobs. Users are arranged into a number of SLURM accounts based on faculty, lab, or center investments.

Getting Started

All accounts in UMIACS are sponsored. If you don't already have a UMIACS account, please see Nexus/Accounts for information on getting one.

Access

The submission nodes for the Nexus computational resources are determined by department, center, or lab affiliation. You can log into the UMIACS Directory CR application and select the Computational Resource (CR) in the list that has the prefix nexus. The Hosts section lists your available login nodes.

Note - UMIACS requires multi-factor authentication through our Duo instance. This is completely separate from both UMD's and CSD's Duo instances. You will need to enroll one or more devices to access resources in UMIACS, and you will be prompted to enroll the first time you log into the Directory application.

Once you have identified your submission nodes, you can SSH directly into them. From there, you can submit jobs to the cluster via our SLURM workload manager. Make sure that your submitted jobs specify the correct account, partition, and QoS.

Jobs

SLURM jobs are submitted with either srun or sbatch, depending on whether you are running an interactive job or a batch job, respectively. You need to provide the where/how/who of the job and specify the resources it needs.

For the where/how/who, you may be required to specify --partition, --qos, and/or --account (respectively) to submit jobs to the Nexus successfully.

For resources, you may need to specify --time for wall time, --ntasks for CPUs, --mem for RAM, and --gres=gpu for GPUs in your submission arguments to meet your requirements. There are defaults for all four, so if you don't specify something, you may be scheduled with a very minimal set of time and resources (e.g., by default, NO GPUs are included if you do not specify --gres=gpu). For more information about submission flags for GPU resources, see SLURM/JobSubmission#Requesting_GPUs. You can also run man srun on your submission node for a complete list of available submission arguments.

Interactive

Once logged into a submission node, you can run simple interactive jobs. If your connection to the submission node is interrupted, the job will be killed, so we encourage you to use a terminal multiplexer such as Tmux.

$ srun --pty --ntasks 4 --mem=2gb --gres=gpu:1 nvidia-smi -L
GPU 0: NVIDIA RTX A4000 (UUID: GPU-ae5dc1f5-c266-5b9f-58d5-7976e62b3ca1)

Batch

Batch jobs are submitted as a script file. You can optionally embed job scheduling parameters in the script as #SBATCH lines at the top of the file. You can find some examples in our SLURM/JobSubmission documentation.
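A minimal sketch of such a script is shown here, assuming the tron partition and default QoS described later on this page. The resource values, the output file name, and YOUR_ACCOUNT are hypothetical placeholders; use show_assoc to find the account and partitions you can actually submit to.

```shell
#!/bin/bash
# Where/how/who: partition, QoS, and account.
#SBATCH --partition=tron
#SBATCH --qos=default
# Disabled directive (extra '#'): replace YOUR_ACCOUNT, a hypothetical
# placeholder, with an account listed by show_assoc, then remove one '#'.
##SBATCH --account=YOUR_ACCOUNT
# Resources: wall time, tasks, memory, and GPUs (within the default QoS limits).
#SBATCH --time=04:00:00
#SBATCH --ntasks=4
#SBATCH --mem=16gb
#SBATCH --gres=gpu:1
# %j expands to the job ID.
#SBATCH --output=example-%j.out

# Everything below runs on the compute node once the job starts.
msg="Job ${SLURM_JOB_ID:-<not running under SLURM>} with ${SLURM_NTASKS:-?} task(s) on ${HOSTNAME:-unknown}"
echo "$msg"

# List the GPU(s) allocated to this job, if the NVIDIA tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -L
fi
```

You would submit it with sbatch followed by the script's file name. Options given on the sbatch command line take precedence over the matching #SBATCH lines in the script.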

Partitions

The SLURM resource manager uses partitions as job queues, which can impose size, time, and user limits. The Nexus (when fully operational) will have a number of different partitions of resources. Different centers, labs, and faculty will be able to invest in computational resources that will be restricted to approved users through these partitions.

Nexus/Tron - This is the pool of resources available to all UMIACS and CSD faculty and graduate students. It also provides access to undergraduate and graduate teaching resources.
Nexus/CLIP - CLIP lab pool available for CLIP lab members.
Nexus/Gamma - GAMMA lab pool available for GAMMA lab members.
Scavenger - This is a preemption partition that includes nodes from multiple other partitions. More resources are available to schedule simultaneously than in other partitions; however, jobs are subject to preemption rules. You are responsible for ensuring your jobs handle this preemption correctly: the SLURM scheduler will simply restart a preempted job with the same submission arguments when it is able to run again.
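Because a preempted scavenger job is restarted from the beginning with the same arguments, one common approach is to record progress in a checkpoint and skip completed work on restart. The script below is a minimal sketch of that idea. The run_steps function, the step count, and the checkpoint path are hypothetical; the checkpoint is kept on network scratch on the assumption that a restarted job may land on a different node than the one it was preempted on.

```shell
#!/bin/bash
#SBATCH --partition=scavenger
#SBATCH --account=scavenger
#SBATCH --qos=scavenger
#SBATCH --time=1-00:00:00
#SBATCH --ntasks=4
#SBATCH --mem=16gb
#SBATCH --gres=gpu:1

# run_steps CHECKPOINT_FILE TOTAL_STEPS (hypothetical helper)
# Runs steps from the last recorded one up to TOTAL_STEPS, recording progress
# after each step so a restarted job skips work that already finished.
run_steps() {
  local ckpt="$1" total="$2" start=0 i
  # Resume from the checkpoint if a previous run left a non-empty one behind.
  if [ -s "$ckpt" ]; then
    start=$(cat "$ckpt")
  fi
  i=$start
  while [ "$i" -lt "$total" ]; do
    # ... the real work for step $i goes here ...
    i=$((i + 1))
    # Write to a temporary file and rename it, so a preemption during the
    # write does not leave a half-written checkpoint.
    echo "$i" > "$ckpt.tmp" && mv "$ckpt.tmp" "$ckpt"
  done
  echo "completed $total steps (resumed at $start)"
}

# Only start the real workload when running inside a SLURM job
# ($USER is your username, as in /fs/nexus-scratch/$username).
if [ -n "${SLURM_JOB_ID:-}" ]; then
  run_steps "/fs/nexus-scratch/$USER/example.ckpt" 100
fi
```

Outside of a SLURM job the script only defines run_steps, so you can try the resume logic against a temporary file before submitting it.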

Quality of Service (QoS)

SLURM uses QoSes to limit the size of jobs that users can submit. You should still request only the minimum resources your jobs need, as the resources each of your jobs schedules count against your future FairShare priority.

default - Default QoS. Limited to 4 cores, 32GB RAM, and 1 GPU per job. The maximum wall time per job is 3 days. 4 jobs are permitted simultaneously.
medium - Limited to 8 cores, 64GB RAM, and 2 GPUs per job. The maximum wall time per job is 2 days. 2 jobs are permitted simultaneously.
high - Limited to 16 cores, 128GB RAM, and 4 GPUs per job. The maximum wall time per job is 1 day. Only 1 job is permitted simultaneously.
scavenger - Limited to 64 cores, 256GB RAM, and 8 GPUs per job. The maximum wall time per job is 2 days. Only 16 GPUs are permitted simultaneously across all of your jobs. This QoS is available only in the scavenger partition, and it is the only QoS available in that partition. To use this QoS, include --partition=scavenger and --account=scavenger in your submission arguments. Do not include any QoS argument other than --qos=scavenger (optional), or the submission will fail.
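To pick the smallest QoS that fits a job, you can compare the request against the per-job limits listed above. The pick_qos function below is a hypothetical helper, not an installed command. It uses the default, medium, and high limits from this list, takes memory in GB and wall time in whole hours, leaves out scavenger (which also requires a different partition and account), and does not check how many jobs you already have running.

```shell
#!/bin/bash
# pick_qos CORES MEM_GB GPUS HOURS (hypothetical helper)
# Prints the smallest general-purpose QoS whose per-job limits fit the request,
# or "none" if no single QoS fits.
pick_qos() {
  local cores="$1" mem="$2" gpus="$3" hours="$4" row
  # Each row: name max_cores max_mem_gb max_gpus max_wall_hours (smallest first).
  for row in "default 4 32 1 72" "medium 8 64 2 48" "high 16 128 4 24"; do
    set -- $row
    if [ "$cores" -le "$2" ] && [ "$mem" -le "$3" ] &&
       [ "$gpus" -le "$4" ] && [ "$hours" -le "$5" ]; then
      echo "$1"
      return 0
    fi
  done
  echo "none"
  return 1
}

pick_qos 8 48 2 24    # prints "medium"
```

Note that the maximum wall time shrinks as the resource limits grow, so a request such as pick_qos 8 64 2 60 prints "none": no single QoS allows 2 GPUs for 60 hours.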

You can display these QoSes from the command line using the show_qos command. Other lab- or group-specific QoSes or reserved QoSes may also appear in the listing. The four QoSes above are the ones that everyone can submit to.

# show_qos
            Name     MaxWall MaxJobs                        MaxTRES     MaxTRESPU   Priority
---------------- ----------- ------- ------------------------------ ------------- ----------
          normal                                                                           0
       scavenger  2-00:00:00             cpu=64,gres/gpu=8,mem=256G   gres/gpu=16          0
          medium  2-00:00:00       2       cpu=8,gres/gpu=2,mem=64G                        0
            high  1-00:00:00       1     cpu=16,gres/gpu=4,mem=128G                        0
         default  3-00:00:00       4       cpu=4,gres/gpu=1,mem=32G                        0
            tron                                                       gres/gpu=4          0
       huge-long 10-00:00:00             cpu=32,gres/gpu=8,mem=256G                        0

Please note that in the default non-preemption partition (tron), you will be restricted to 4 total GPUs at once across all jobs you have running in the QoSes allowed by that partition. This is codified by the reserved QoS also named tron in the output above.

To find out what accounts and partitions you have access to, use the show_assoc command.

Storage

All storage available in Nexus is currently NFS based. We will be introducing some changes for Phase 2 to support high performance GPUDirect Storage (GDS). A joint UMIACS and CSD faculty committee will revise and approve these storage allocation procedures by the launch of Phase 2.

Home Directories

Home directories in the Nexus computational infrastructure are available from the Institute's NFShomes as /nfshomes/USERNAME, where USERNAME is your username. These home directories have very limited storage (20GB, which cannot be increased) and are intended for your personal files, configuration, and source code. Your home directory is not intended for data sets or other large-scale data holdings. You are encouraged to use our GitLab infrastructure to host your code repositories.

NOTE: To check your quota on this directory, use the quota -s command.

Your home directory data is fully protected: it has snapshots and is backed up nightly.

In Phase 2, other standalone compute clusters will begin to fold into partitions in Nexus. Lab home directories will be gradually phased out in favor of the /nfshomes home directories.

Scratch Directories

Scratch data has no data protection: there are no snapshots, and the data is not backed up. There are two types of scratch directories in the Nexus compute infrastructure:

Network scratch directory
Local scratch directories

Network Scratch Directory

You are allocated 200GB of scratch space via NFS at /fs/nexus-scratch/$username. It is not backed up or protected in any way. This directory is automounted, so you will need to cd into it or specify its fully qualified path to access it.

You may request a permanent increase of up to 400GB total space without any faculty approval by contacting staff. If you need space beyond 400GB, you will need faculty approval and/or a project directory.

This file system is available on all submission, data management, and computational nodes within the cluster.

Local Scratch Directories

Each computational node that a user can schedule compute jobs on also has one or more local scratch directories. These are always named /scratch0, /scratch1, etc. They are almost always more performant than any other storage available to the job. However, users must stage their data in within their job and stage it out before the job ends.

These local scratch directories have a tmpwatch job that deletes data that has not been accessed in 90 days. It runs as a maintenance job once a month at 1am, and different nodes run it on different days of the month so that the cluster stays highly available at all times. Please make sure you secure any data you write to these directories at the end of your job.
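The staging pattern described above can be sketched as a shell function for a batch script. This is a minimal example, assuming your inputs live in a directory on network scratch; the stage_and_run name, the data and results paths, and the placeholder workload (which only lists the input files) are hypothetical.

```shell
#!/bin/bash
# stage_and_run SRC_DIR LOCAL_ROOT DEST_DIR (hypothetical helper)
# Copies SRC_DIR to a private directory under LOCAL_ROOT, runs the workload
# there, copies the results to DEST_DIR, and removes the local copy.
stage_and_run() {
  local src="$1" local_root="$2" dest="$3" work rc
  # Private working directory on the node-local disk.
  work=$(mktemp -d "$local_root/job.XXXXXX") || return 1
  # Stage in: copy the inputs from network storage to local scratch.
  cp -r "$src" "$work/input" || { rm -rf "$work"; return 1; }
  mkdir -p "$work/output"
  # Placeholder workload: replace with your real program.
  (cd "$work" && ls input > output/manifest.txt)
  rc=$?
  # Stage out the results before the job ends; report a failed copy.
  mkdir -p "$dest" && cp -r "$work/output/." "$dest/" || rc=1
  # Clean up the local disk instead of waiting for tmpwatch.
  rm -rf "$work"
  return "$rc"
}

# Only touch the real paths when running inside a SLURM job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  stage_and_run "/fs/nexus-scratch/$USER/data" /scratch0 "/fs/nexus-scratch/$USER/results"
fi
```

Working in a per-job directory created with mktemp keeps concurrent jobs on the same node from colliding, and removing it afterwards avoids relying on the 90-day tmpwatch cleanup.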

Faculty Allocations

Each faculty member can be allocated 1TB of lab space upon request. We can also support grouping these individual allocations together into larger center, lab, or research group allocations if desired by the faculty. Please contact staff to inquire.

This lab space is backed up, but it does not have snapshots by default (snapshots are available on request).

Project Allocations

Project allocations are available per user for 270 TB-days: you can have a 1TB allocation for up to 270 days, a 3TB allocation for up to 90 days, etc. A single faculty member cannot have more than 20TB of sponsored account project allocations active at any point.

The minimum storage space you can request is 500GB, which allows the maximum length of 540 days; the minimum allocation length you can request is 30 days, which allows the maximum storage of 9TB.
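The 270 TB-day budget works out to a simple division of the budget by the requested size. The max_days function below is a hypothetical helper, not an official tool; it only sketches that arithmetic and takes 1TB as 1000GB, which is what the 500GB / 540-day figure implies.

```shell
#!/bin/bash
# max_days SIZE_GB (hypothetical helper)
# Prints the longest project allocation length, in whole days (rounded down),
# that fits the 270 TB-day budget for a given size (1TB taken as 1000GB).
max_days() {
  local size_gb="$1"
  # Requests must fall between the 500GB minimum and the 9TB maximum.
  if [ "$size_gb" -lt 500 ] || [ "$size_gb" -gt 9000 ]; then
    echo "size must be between 500 and 9000 GB" >&2
    return 1
  fi
  echo $(( 270 * 1000 / size_gb ))
}

max_days 500     # prints 540
max_days 3000    # prints 90
max_days 9000    # prints 30
```

Because integer division rounds down, a 700GB request, for example, would allow at most 385 days.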

To request an allocation, please contact staff with your account sponsor involved in the conversation. Please include the following details:

Project Name (short)
Description
Size (1TB, 2TB, etc.)
Length in days (270 days, 135 days, etc.)

These allocations will be available via /fs/nexus-projects/$project_name. Near the end of the allocation period, staff will contact you and ask whether you want to renew for the same duration; renewals cannot exceed the original 270 TB-day limit.

Datasets

We have read-only dataset storage available at /fs/nexus-datasets. If there are datasets that you would like to see curated and available, please see this page.

We will have a more formal process to approve datasets by Phase 2 of Nexus.

Migrations

If you are a user of an existing cluster that is in the process of being folded into Nexus now or in the near future, your cluster-specific migration information will be listed here.

CLIP

