Nexus

Nexus is the combined scheduler of resources in UMIACS. Many of our existing discrete computational clusters will be folded into this scheduler. The resource manager is SLURM, and resources will be arranged into partitions in which users can schedule computational jobs. Users will be grouped into a number of Slurm accounts based on faculty, lab, or center investments.

Getting Started

All accounts in UMIACS are sponsored. If you don't already have a UMIACS account please see Nexus/Accounts for information on getting one.

To use Nexus you will need a UMIACS account, which you can get by following the directions in the Accounts section. If you already have a UMIACS account, you can skip to the Access section.


Access

The submission nodes for the Nexus computational resources are determined by department, center, or lab affiliation. To find theirs, users can log into the UMIACS Directory application and select Computational Resources (CR). Selecting the CR that has the prefix nexus lists the login nodes available to them.

Note - UMIACS requires multi-factor authentication through our Duo instance. This is completely separate from the UMD and CSD Duo instances, and users will need to enroll one or more devices to access resources in UMIACS. Users will be prompted to enroll when they log into the Directory application for the first time.

Once users have identified their submission nodes, they will be able to SSH directly into them. From there, users will be able to submit jobs to the cluster via our SLURM workload manager. Users will need to make sure that they submit jobs with the correct account, partition, and QoS.
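As a minimal sketch of what submitting with an explicit account, partition, and QoS looks like, the Python snippet below assembles an sbatch command line. The --account, --partition, and --qos options are standard sbatch options. The script name, account name, and partition name are placeholders invented for illustration, not values defined on this page; substitute the ones listed for your own account.

```python
import shlex


def build_sbatch_command(script, account, partition, qos):
    """Return an sbatch argument list that names the account, partition and QoS explicitly."""
    return [
        "sbatch",
        f"--account={account}",
        f"--partition={partition}",
        f"--qos={qos}",
        script,
    ]


# Placeholder values (assumptions): replace them with an account and
# partition that you actually have access to.
cmd = build_sbatch_command(
    "job.sh",
    account="example-account",
    partition="example-partition",
    qos="normal",
)

# Print the command in a form that can be pasted into a shell on a submission node.
print(shlex.join(cmd))
```

On a submission node the resulting list could be passed to subprocess.run, or the printed line pasted into the shell.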

Partitions

When fully operational, Nexus will have a number of different partitions of resources. Different centers, labs, and faculty will be able to invest in computational resources that will be restricted to approved users through these partitions.

  • Nexus/Tron - This is currently the pool of resources available to all UMIACS and CSD faculty and graduate students. It will provide access for undergraduate and graduate teaching resources.
  • Scavenger - This partition is available as a

Quality of Service (QoS)

SLURM uses a QoS to limit the size of the jobs users can run. You should still allocate only the minimum resources your jobs need, because the resources your jobs are allocated count against your FairShare priority in the future.

  • normal - Default QoS which will limit users to 4 cores, 32GB RAM and 1 GPU. The maximum wall time will be 3 days and users will be able to run up to 4 jobs.
  • medium - Limited to 8 cores, 64GB RAM, 2 GPUs. The maximum wall time will be 2 days and users will be able to run 2 jobs.
  • high - Limited to 16 cores, 128GB RAM, 4 GPUs. The maximum wall time will be 1 day and users will only be able to run 1 job.
  • scavenger - Limited to 64 cores, 256GB RAM and 8 GPUs. The maximum wall time will be 2 days and users may only run 2 jobs. This QoS is only available in the scavenger partition.
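One way to choose among these is to pick the least permissive QoS that still covers a request. The sketch below encodes the normal, medium, and high limits from the list above and returns the first one that fits. It assumes the limits apply to a single job's request, which this page does not state explicitly, and it leaves out scavenger because that QoS is tied to the scavenger partition. The names QosLimit and smallest_qos are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosLimit:
    cores: int
    mem_gb: int
    gpus: int
    max_days: int
    max_jobs: int


# Limits transcribed from the QoS list on this page, ordered from the
# smallest to the largest per-job size.
QOS_LIMITS = {
    "normal": QosLimit(cores=4, mem_gb=32, gpus=1, max_days=3, max_jobs=4),
    "medium": QosLimit(cores=8, mem_gb=64, gpus=2, max_days=2, max_jobs=2),
    "high": QosLimit(cores=16, mem_gb=128, gpus=4, max_days=1, max_jobs=1),
}


def smallest_qos(cores, mem_gb, gpus, days):
    """Return the first QoS whose limits cover the request, or None if none does."""
    for name, limit in QOS_LIMITS.items():
        if (
            cores <= limit.cores
            and mem_gb <= limit.mem_gb
            and gpus <= limit.gpus
            and days <= limit.max_days
        ):
            return name
    return None


print(smallest_qos(cores=4, mem_gb=16, gpus=1, days=3))  # normal
print(smallest_qos(cores=8, mem_gb=64, gpus=2, days=2))  # medium
print(smallest_qos(cores=8, mem_gb=64, gpus=2, days=3))  # None
```

The last call returns None because larger QoS levels come with shorter wall times: a job needing 8 cores for 3 days fits none of the three.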

To find out what accounts and partitions you have access to, use the show_assoc command.

Storage

All storage available in Nexus is NFS-based. These storage allocation procedures will be revised and approved by a CSD and UMIACS faculty committee by the launch of Phase 2.

Home Directories

Each user account in UMIACS is allocated 20GB of storage in its home directory (/nfshomes/$username). This file system has snapshots and backups available. The quota is fixed, however, and cannot be increased.

In Phase 2, as other standalone compute clusters fold into partitions in Nexus, you will start to have the same home directory across all systems.

Scratch Directories

Each user will be allocated 200GB of scratch space under /fs/nexus-scratch/$username. Once it is filled, users may request an increase of up to 400GB. This space does not have snapshots and is not backed up. Please ensure that any data you keep in scratch is reproducible.

Faculty Allocations

Each faculty member will have 1TB of lab space allocated to them when their account is installed. We can also support grouping these individual allocations together into larger center, lab, or research group allocations if the faculty desire. Please contact staff@umiacs.umd.edu to inquire.

By default this lab space does not have snapshots (they are available on request), but it is backed up.

Project Allocations

Project allocations are available per user for 270 TB days, which means that you can have a 1TB allocation for up to 270 days or a 3TB allocation for up to 90 days. A single faculty member cannot have more than 20 TB of sponsored account project allocations active at any point.
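The TB-days arithmetic can be checked with a short sketch like the one below. It assumes the budget is simply size multiplied by duration, which matches the two examples given here. The function names are invented for this illustration.

```python
PROJECT_BUDGET_TB_DAYS = 270  # per-user budget stated on this page


def max_days_for(size_tb):
    """Longest duration, in whole days, for a project allocation of size_tb terabytes."""
    if size_tb <= 0:
        raise ValueError("size_tb must be positive")
    return int(PROJECT_BUDGET_TB_DAYS // size_tb)


def fits_budget(size_tb, days):
    """True if a request of size_tb terabytes for the given days stays within the budget."""
    return size_tb * days <= PROJECT_BUDGET_TB_DAYS


print(max_days_for(1))      # 270
print(max_days_for(3))      # 90
print(fits_budget(2, 180))  # False, since 2 * 180 = 360 TB days
```

Under this reading, a 2TB request could run for at most 135 days.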

When requesting an allocation please CC your account sponsor when you send email to staff@umiacs.umd.edu. Please include the following details:

  • Project Name (short)
  • Description
  • Size (1TB, 2TB, etc.)
  • Length in days (e.g., 180 days)

These allocations will be available via /fs/nexus-projects/$project_name.

Data Sets

Data sets will be hosted in /fs/nexus-datasets. If you want to request a data set for consideration, please email staff@umiacs.umd.edu. We will have a more formal process to approve data sets by Phase 2 of Nexus. Please note that data sets that require accepting a license will need to be reviewed by ORA, which may take some time to process.