Nexus

The Nexus is the combined scheduler of resources in UMIACS.  Many of our existing computational clusters that have discrete schedulers will fold into this scheduler in the future.  The resource manager for Nexus (as with our other existing computational clusters) is [[SLURM]].  Resources are arranged into partitions where users are able to schedule computational jobs.  Users are arranged into a number of Slurm accounts based on faculty, lab, or center investments.


= Getting Started =
All accounts in UMIACS are sponsored.  If you don't already have a UMIACS account, please see [[Nexus/Accounts]] for information on getting one.


== Access ==
The submission nodes for the Nexus computational resources are determined by department, center, or lab affiliation.  You can log into the UMIACS Directory application and select [https://intranet.umiacs.umd.edu/directory/cr/ Computational Resources] (CR).  Find the CR that has the prefix <code>nexus</code> and select it; the Host section will list the available login nodes.


'''Note''' - UMIACS requires multi-factor authentication through our [[Duo]] instance.  This is completely separate from both UMD's and CSD's Duo instances.  You will need to enroll one or more devices to access resources in UMIACS, and will be prompted to enroll when you log into the Directory application for the first time.


Once you have identified your submission nodes, you can [[SSH]] directly into them.  From there, you can submit to the cluster via our [[SLURM]] workload manager.  Make sure that your submitted jobs specify the correct account, partition, and QoS.
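
For example, you might connect like this (the hostname below is a placeholder; substitute one of the login nodes listed in the Host section of your <code>nexus</code> CR):

<pre>
$ ssh username@nexus-login-node.umiacs.umd.edu   # placeholder hostname
</pre>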


== Jobs ==
[[SLURM]] jobs are submitted with either <code>srun</code> or <code>sbatch</code>, depending on whether you are running an interactive or batch job, respectively.  You need to specify both where/who the job will run as (account, partition, QoS) and the resources it needs.  There are defaults for both, so if you don't specify something you may be scheduled with a very minimal set of time and resources (including NO GPUs unless specifically requested).


For the where/who, you may be required to specify <code>--account</code>, <code>--qos</code>, and/or <code>--partition</code> in order to submit jobs to the Nexus.


For resources, you may need to specify CPUs (<code>--ntasks</code>), memory (<code>--mem</code>), and GPUs (<code>--gres=gpu</code>) in your submission arguments to meet your requirements.  For more information about submitting for GPU resources, see [[SLURM/JobSubmission#Requesting_GPUs]].  You can also run <code>man srun</code> on your submission node for a complete list of available submission arguments.
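
Putting the where/who and the resource arguments together, a fully specified submission might look like the sketch below.  The account name is a placeholder (use <code>show_assoc</code>, described later on this page, to see your actual accounts); the partition and QoS names are the ones documented below.

<pre>
$ srun --account=myaccount --partition=tron --qos=default \
       --ntasks=4 --mem=8gb --gres=gpu:1 --time=01:00:00 \
       hostname
</pre>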


=== Interactive ===
Once logged into a submission node, you can run simple interactive jobs.  If your session is interrupted from the submission node, the job will be killed.  As such, we encourage the use of a terminal multiplexer such as [[Tmux]].


<pre>
$ srun --pty --ntasks 4 --mem=2gb --gres=gpu:1 nvidia-smi -L
GPU 0: NVIDIA RTX A4000 (UUID: GPU-ae5dc1f5-c266-5b9f-58d5-7976e62b3ca1)
</pre>


=== Batch ===
Batch jobs are scheduled with a script file, which can optionally embed job scheduling parameters via <code>#SBATCH</code> lines at the top of the file.  You can find some examples in our [[SLURM/JobSubmission]] documentation.
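
As a minimal sketch (the job name and workload are hypothetical, and the account value is a placeholder for one of your own), a batch script might look like:

<pre>
#!/bin/bash
#SBATCH --job-name=example        # hypothetical job name
#SBATCH --account=myaccount       # placeholder; see show_assoc for your accounts
#SBATCH --partition=tron
#SBATCH --qos=default
#SBATCH --ntasks=4
#SBATCH --mem=8gb
#SBATCH --gres=gpu:1
#SBATCH --time=1-00:00:00         # 1 day of wall time

srun nvidia-smi -L                # replace with your actual workload
</pre>

Submit the script with <code>sbatch</code> and monitor it with <code>squeue</code>.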


= Partitions =  
The SLURM resource manager uses partitions to act as job queues, which can enforce size, time, and user limits.  The Nexus (when fully operational) will have a number of different partitions of resources.  Different Centers, Labs, and Faculty will be able to invest in computational resources that will be restricted to approved users through these partitions.
* [[Nexus/Tron]] - This is the pool of resources available to all UMIACS and CSD faculty and graduate students.  It provides access for undergraduate and graduate teaching resources.
* Scavenger - This is a [https://slurm.schedmd.com/preempt.html preemption] partition that supports nodes from multiple other partitions.  Jobs are subject to preemption rules; however, more resources are available to schedule simultaneously than in other partitions.  You are responsible for ensuring your jobs handle this preemption correctly, as the SLURM scheduler will simply restart each job with the same submission arguments when preempted jobs are available to run again (see the example submission after this list).
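
As a sketch of a preemptible submission (assuming the scavenger account, partition, and QoS are all literally named <code>scavenger</code>, which you should confirm with <code>show_assoc</code>), a batch script for this partition might start like:

<pre>
#!/bin/bash
#SBATCH --account=scavenger      # assumed account name; confirm with show_assoc
#SBATCH --partition=scavenger
#SBATCH --qos=scavenger
#SBATCH --requeue                # let SLURM requeue the job after preemption
#SBATCH --ntasks=8
#SBATCH --gres=gpu:2

# The workload should checkpoint periodically so a restart can resume
# instead of beginning from scratch.
srun ./train.sh                  # hypothetical workload script
</pre>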


= Quality of Service (QoS) =
SLURM uses a QoS to provide limits on job sizes to users.  Note that you should still try to allocate only the minimum resources for your jobs, as the resources that each of your jobs schedules are counted against your FairShare priority in the future.
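
If you want to see your current FairShare standing, SLURM's standard <code>sshare</code> utility can report it (a sketch; exact columns depend on the site configuration):

<pre>
$ sshare -u $USER
</pre>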


* default - Default QoS.  Limited to 4 cores, 32GB RAM, and 1 GPU.  The maximum wall time per job is 3 days and 4 jobs are permitted simultaneously.
* medium - Limited to 8 cores, 64GB RAM, and 2 GPUs.  The maximum wall time per job is 2 days and 2 jobs are permitted simultaneously.
* high - Limited to 16 cores, 128GB RAM, and 4 GPUs.  The maximum wall time per job is 1 day and only 1 job is permitted simultaneously.
* scavenger - Limited to 64 cores, 256GB RAM, and 8 GPUs.  The maximum wall time per job is 2 days and 2 jobs are permitted simultaneously.  This QoS is only available in the scavenger partition.


You can display these QoSes from the command line using the <code>show_qos</code> command.
<pre>
# show_qos
      Name     MaxWall MaxJobs                        MaxTRES     MaxTRESPU   Priority
---------- ----------- ------- ------------------------------ ------------- ----------
 scavenger  2-00:00:00             cpu=64,gres/gpu=8,mem=256G   gres/gpu=16          0
    medium  2-00:00:00       2       cpu=8,gres/gpu=2,mem=64G                        0
      high  1-00:00:00       1     cpu=16,gres/gpu=4,mem=128G                        0
   default  3-00:00:00       4       cpu=4,gres/gpu=1,mem=32G                        0
      tron                                                       gres/gpu=4          0
</pre>


Currently, in our non-preemption partition (tron), you will be restricted to 4 GPUs at once.


To find out what accounts and partitions you have access to, use the <code>show_assoc</code> command.
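
Standard SLURM tooling can show similar association information if you prefer it (a sketch; output formatting differs from <code>show_assoc</code>):

<pre>
$ sacctmgr show associations user=$USER format=Account,Partition,QOS
</pre>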


= Storage =
All storage available in Nexus is currently NFS based.  We will be introducing some changes in Phase 2 to support high performance GPUDirect Storage (GDS).  These storage allocation procedures will be revised and approved by a CSD and UMIACS faculty committee by the launch of Phase 2.


== Home Directories ==
Each user account in UMIACS is allocated 20GB of storage in their home directory (<code>/nfshomes/$username</code>).  This file system has snapshots and backups available.  The quota is fixed, however, and cannot be increased.


In Phase 2, other standalone compute clusters will fold into partitions in Nexus and you will start to have the same home directory across all systems.


== Scratch Directories ==
Each user will be allocated a 200GB scratch directory under <code>/fs/nexus-scratch/$username</code>.  If your directory is completely filled, you may request a permanent increase of up to 400GB total.  This space does not have snapshots and is not backed up.  Please ensure that any data you have under your scratch directory is reproducible.
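
To check how much of your scratch space you are using, standard tools are enough (a sketch; <code>$USER</code> is your UMIACS username, and the exact quota reporting mechanism may differ):

<pre>
$ df -h /fs/nexus-scratch/$USER    # space reported for the allocation
$ du -sh /fs/nexus-scratch/$USER   # total size of what you have stored
</pre>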


== Faculty Allocations ==
Each faculty member is allocated 1TB of lab space when their account is installed.  We can also support grouping these individual allocations together into larger center, lab, or research group allocations if desired by the faculty.  Please contact staff@umiacs.umd.edu to inquire.


By default, this lab space does not have snapshots (though they are available if requested), and it is backed up.


== Project Allocations ==
Project allocations are available per user for 270 TB days; for example, you can have a 1TB allocation for up to 270 days, or a 3TB allocation for 90 days.  A single faculty member cannot have more than 20 TB of sponsored account project allocations active at any point.


To request an allocation, please send mail to staff@umiacs.umd.edu with your account sponsor CC'd.  Please include the following details:


* Project Name (short)
* Description
* Size (1TB, 2TB, etc.)
* Length in days (270 days, 135 days, etc.)


These allocations will be available via <code>/fs/nexus-projects/$project_name</code>.


== Datasets ==
Datasets are hosted in <code>/fs/nexus-datasets</code>.  If you want to request a dataset for consideration, please email staff@umiacs.umd.edu.  We will have a more formal process to approve datasets by Phase 2 of Nexus.  Please note that datasets that require accepting a license will need to be reviewed by [https://ora.umd.edu/ UMD's Office of Research Administration (ORA)], which may require some time to process.
