Nexus/GAMMA



Revision as of 20:27, 5 June 2023

The GAMMA lab has a partition of GPU nodes available in the Nexus. Only GAMMA lab members are able to run non-interruptible jobs on these nodes.

Access

You can always find out which hosts you have access to submit from via the Nexus#Access page. The GAMMA lab in particular has a dedicated submission host with additional local storage available.

  • nexusgamma00.umiacs.umd.edu

Quality of Service

GAMMA users have access to all of the standard QoS in the gamma partition using the gamma account. There is one additional QoS, huge-long, that allows for longer jobs with higher overall resource limits. The available QoS are:

  • default
  • medium
  • high
  • huge-long

Please note that the partition has a GrpTRES limit of 100% of the available cores/RAM on the partition-specific nodes plus 50% of the available cores/RAM on legacy## nodes, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
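The exact limits attached to each QoS can be inspected from a submission host with standard Slurm accounting tools; the following is a sketch (the QoS and partition names are from this page, and sacctmgr/scontrol are standard Slurm commands, but the fields worth looking at may vary with your site's configuration):

```shell
# List wall-time and per-user resource limits for each QoS in the
# gamma partition (QoS names taken from this page).
sacctmgr show qos default,medium,high,huge-long \
    format=Name,Priority,MaxWall,MaxTRESPerUser

# Show the partition itself, including its group-level limits.
scontrol show partition gamma
```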

Hardware

Nodenames        Type            Quantity  CPUs  Memory  GPUs
gammagpu[00-04]  A5000 GPU Node  5         32    256GB   8
gammagpu05       A4000 GPU Node  1         32    256GB   8
Total                            6         192   1536GB  48

Storage

Other than the default Nexus storage allocation(s), GAMMA has invested in a 20TB NVMe scratch file system on nexusgamma00.umiacs.umd.edu that is available as /scratch1. To utilize this space, you will need to copy data to and from it over SSH from a compute node. To make this easier, you may want to set up SSH keys that will allow you to copy data without being prompted for passwords.

This file system is not available over NFS and there are no backups or snapshots available for this file system. Please refer to our UNIX Local Storage page for more information.
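The key setup and copy steps above might look like the following sketch (the key filename and the /scratch1/$USER destination directory are illustrative choices, not mandated by this page; the hostname and /scratch1 path are from this page):

```shell
# Create a dedicated passwordless SSH key for cluster data transfers.
KEY="$HOME/.ssh/id_ed25519_scratch"
mkdir -p "$HOME/.ssh"
[ -f "$KEY" ] || ssh-keygen -t ed25519 -N "" -f "$KEY"

# Install the public key on the submission host so copies don't prompt:
#   ssh-copy-id -i "$KEY.pub" nexusgamma00.umiacs.umd.edu
# Then, from a compute node, stage results into the NVMe scratch space:
#   rsync -e "ssh -i $KEY" -av results/ nexusgamma00.umiacs.umd.edu:/scratch1/$USER/
```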

Example

From nexusgamma00.umiacs.umd.edu you can run the following example to submit an interactive job. Please note that you need to specify the --account, --partition, and --qos flags. Please refer to our SLURM documentation for how to further customize your submissions, including making a batch submission.

$ srun --pty --gres=gpu:8 --account=gamma --partition=gamma --qos=huge-long bash
$ hostname
gammagpu01.umiacs.umd.edu
$ nvidia-smi -L
GPU 0: NVIDIA RTX A5000 (UUID: GPU-cdfb2e0c-d69f-354b-02f4-15161dc7fa66)
GPU 1: NVIDIA RTX A5000 (UUID: GPU-be53e7a1-b8fd-7089-3cac-7a2fbf4ec7dd)
GPU 2: NVIDIA RTX A5000 (UUID: GPU-774efbb1-d7ec-a0bb-e992-da9d1fa6b193)
GPU 3: NVIDIA RTX A5000 (UUID: GPU-d1692181-c7de-e273-5f95-53ad381614c3)
GPU 4: NVIDIA RTX A5000 (UUID: GPU-ba51fd6c-37bf-1b95-5f68-987c18a6292a)
GPU 5: NVIDIA RTX A5000 (UUID: GPU-c1224a2a-4a3b-ff16-0308-4f36205b9859)
GPU 6: NVIDIA RTX A5000 (UUID: GPU-8d20d6cd-abf5-2630-ab88-6bba438c55fe)
GPU 7: NVIDIA RTX A5000 (UUID: GPU-93170910-5d94-6da5-8a24-f561d7da1e2d)
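A batch submission equivalent of the interactive example above might look like the following sketch (the job name, output filename, time limit, and job body are illustrative; check the actual huge-long limits before relying on them):

```shell
#!/bin/bash
#SBATCH --job-name=gamma-example    # illustrative name
#SBATCH --output=gamma-example.log  # illustrative output file
#SBATCH --account=gamma
#SBATCH --partition=gamma
#SBATCH --qos=huge-long
#SBATCH --gres=gpu:8
#SBATCH --time=1-00:00:00           # adjust to the actual QoS limit

# The job body runs on the allocated gammagpu node.
hostname
nvidia-smi -L
```

Save this as a script and submit it from nexusgamma00.umiacs.umd.edu with sbatch, e.g. `sbatch gamma-example.sh`.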