Vulcan

There are two submission nodes available for data management and for submitting jobs to the cluster. If you are not connected to the UMIACS network via a wired connection, you will first need to either SSH to our OpenLAB or connect to our VPN.

  • vulcansub00.umiacs.umd.edu
  • vulcansub01.umiacs.umd.edu
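
For example, you can connect to one of the submission nodes over SSH; replace username with your UMIACS username:

$ ssh username@vulcansub00.umiacs.umd.edu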

Please do not run CPU intensive jobs on the submission nodes. This will slow down cluster job submission for other users. UMIACS Staff will terminate any CPU intensive job running on a submission node.

There are currently 25 GPU nodes with 196 total GPUs (a mixture of NVIDIA RTX A6000, NVIDIA Quadro P6000, NVIDIA GeForce GTX 1080 Ti, NVIDIA GeForce RTX 2080 Ti, and NVIDIA Tesla P100 cards) in the cluster, along with 14 CPU-only nodes. All nodes are scheduled with the SLURM resource manager.

There are two partitions to submit jobs into:

  • dpart: a priority queue where users can run jobs with guaranteed execution and access to a set number of GPUs over a given time period
  • scavenger: a lower-priority queue where users can run jobs with access to a larger number of GPUs, but jobs will be preempted if resources are needed for jobs in the dpart queue.

You will need to specify a QOS (quality of service) when you submit your job(s) to either partition. In dpart, the QOS determines the maximum number of GPUs and the maximum wall time available to your jobs.
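
For example, a minimal interactive job in dpart might look like the following; the default QOS, a single GPU, and the nvidia-smi command are used here purely as an illustration:

$ srun --pty --partition=dpart --qos=default --gres=gpu:1 nvidia-smi -L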

Updates

Notice: Changes as of Tuesday, November 30th, 2021

There are new scheduler changes that prioritize faculty contributions to the Vulcan cluster.

Please note that we have changed the default account from cluster_vulcan to vulcan. If you have the old account name hardcoded in your submission scripts, you will need to update them.
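
For instance, a batch script directive that previously referenced the old account name would now read as follows (only this line is shown; the rest of your script is unchanged):

#SBATCH --account=vulcan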

Faculty Accounts

Users with a faculty sponsor who has contributed to the cluster should now use the --account flag with their faculty member's UMIACS username in order to receive the full benefits of the cluster. The show_assoc command will display all of your available associations. For example, for the abhinav group:

$ show_assoc
      User          Account       GrpTRES                                  QOS
---------- ---------------- ------------- ------------------------------------
  username           vulcan                       cpu,default,medium,scavenger
  username          abhinav                  cpu,default,high,medium,scavenger

$ srun --pty --cpus-per-task=4 --gres=gpu:2 --account=abhinav --qos=high nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-b62cb0bd-6ed2-8cfd-4943-49992751b693)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-cbbc95e4-d7cb-f13a-c4f3-a1087680527e)
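
The same resources can be requested in a batch script. The following is a minimal sketch; the job name is a placeholder, and the account and QOS are taken from the show_assoc output above:

#!/bin/bash
#SBATCH --job-name=example        # placeholder job name
#SBATCH --account=abhinav         # faculty account from show_assoc
#SBATCH --qos=high                # QOS available under that account
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:2

nvidia-smi -L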

The current faculty accounts are:

  • abhinav
  • djacobs
  • lsd
  • metzler
  • rama
  • ramani
  • yaser
  • zwicker

Faculty can manage this list of users via our Directory application in the Security Groups section. The security group that controls access has the prefix vulcan_ and then the faculty username. It will also list slurm://vulcanslurmctl.umiacs.umd.edu as the associated URI.

Non-Contributing Accounts

For users who do not belong to one of these faculty groups, there are still changes that will impact you.

  • You no longer need to specify --account=scavenger when submitting to the scavenger partition; this is now the default. You can change the account used to --account=vulcan if you would like. You do, however, now need to specify --qos=scavenger when submitting to the scavenger partition (see the example after this list).
  • Users no longer have access to the high QoS in dpart.
  • There is now a concurrent limit of 48 total GPUs for all users not in a contributing faculty group.
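
As an illustration, a scavenger job under these defaults might be submitted like this; the single GPU and the nvidia-smi command are placeholders:

$ srun --pty --partition=scavenger --qos=scavenger --gres=gpu:1 nvidia-smi -L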

Getting Started

Software