Nexus/GAMMA
Revision as of 18:09, 19 September 2023
The GAMMA lab has a partition of GPU nodes available in the Nexus. Only GAMMA lab members are able to run non-interruptible jobs on these nodes.
Access
You can always find out which hosts you can submit jobs from on the Nexus#Access page. The GAMMA lab in particular has a special submission host with additional local storage available:
nexusgamma00.umiacs.umd.edu
Please do not run anything on the login node. Always allocate yourself machines on the compute nodes (see instructions below) to run any job.
Quality of Service
GAMMA users have access to all of the standard job QoSes in the gamma partition using the gamma account.
The additional job QoSes for the GAMMA partition specifically are:
huge-long: Allows for longer jobs using higher overall resources.
Please note that the partition has a GrpTRES limit of 100% of the available cores/RAM on the partition-specific nodes in aggregate plus 50% of the available cores/RAM on legacy## nodes in aggregate, so your job may need to wait if all available cores/RAM (or GPUs) are in use.
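To see which account/partition/QoS combinations you can use and which limits might be holding a job back, you can query the SLURM client tools on a submission host such as nexusgamma00.umiacs.umd.edu. The sketch below wraps a few standard commands in a hypothetical show_gamma_limits function; the field selection and column widths are our own choices. Which QoS actually carries the partition's GrpTRES limit is an assumption to confirm from the QoS= field in the scontrol output.

```shell
#!/usr/bin/env bash
# Sketch: inspect what you can submit with and why a gamma job might wait.
# show_gamma_limits is a hypothetical helper name; it needs the SLURM client
# tools, so call it on a Nexus host such as nexusgamma00.umiacs.umd.edu.
show_gamma_limits() {
  # Account/partition/QoS combinations associated with your user.
  sacctmgr show assoc where user="$USER" format=Account,Partition,QOS%40

  # Wall-time and group TRES limits on every QoS (including huge-long).
  sacctmgr show qos format=Name,MaxWall,GrpTRES%60

  # The gamma partition's settings; a partition QoS, if set, shows as QoS=...
  scontrol show partition gamma

  # Running and pending jobs in the partition; for pending jobs the
  # NODELIST(REASON) column explains why they are waiting.
  squeue --partition=gamma
}
```

After sourcing the snippet, run show_gamma_limits and match the QoS name reported by scontrol against the sacctmgr listing to read its GrpTRES value.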
Hardware
Nodenames | Type | Quantity | CPUs | Memory | GPUs |
---|---|---|---|---|---|
gammagpu[00-04,06-09] | A5000 GPU Node | 9 | 32 | 256GB | 8 |
gammagpu05 | A4000 GPU Node | 1 | 32 | 256GB | 8 |
Total | | 10 | 320 | 2560GB | 80 |
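The CPUs, Memory and GPUs columns are per-node figures, while the Total row is the sum across all nodes. A small bash sketch reproducing the Total row from the per-node values in the table (the node_types array is just our own encoding of the table rows):

```shell
#!/usr/bin/env bash
# Per-node specs from the table, encoded as "count cpus mem_gb gpus".
# First entry: gammagpu[00-04,06-09] (A5000); second: gammagpu05 (A4000).
node_types=("9 32 256 8" "1 32 256 8")

nodes=0 cpus=0 mem=0 gpus=0
for spec in "${node_types[@]}"; do
  read -r n c m g <<< "$spec"
  nodes=$(( nodes + n ))       # number of nodes of this type
  cpus=$(( cpus + n * c ))     # per-node CPUs times node count
  mem=$(( mem + n * m ))       # per-node memory (GB) times node count
  gpus=$(( gpus + n * g ))     # per-node GPUs times node count
done

echo "nodes=$nodes cpus=$cpus mem=${mem}GB gpus=$gpus"
# prints: nodes=10 cpus=320 mem=2560GB gpus=80
```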
Storage
In addition to the default Nexus storage allocation(s), GAMMA has invested in a 20TB NVMe scratch file system on nexusgamma00.umiacs.umd.edu that is available as /scratch1. To use this space, you will need to copy data to and from it over SSH from a compute node. To make this easier, you may want to set up SSH keys so that you can copy data without being prompted for a password.
This file system is not available over NFS and there are no backups or snapshots available for this file system. Please refer to our UNIX Local Storage page for more information.
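A minimal sketch of that workflow, assuming rsync over SSH and a per-user directory /scratch1/$USER on nexusgamma00 (the directory layout and the helper names pull_from_scratch and push_to_scratch are hypothetical; use whatever directory you actually have on /scratch1):

```shell
#!/usr/bin/env bash
# Sketch: move data between a compute node and the GAMMA scratch space over SSH.
# Assumption: your files live under /scratch1/$USER (hypothetical layout).
SCRATCH_HOST="nexusgamma00.umiacs.umd.edu"
SCRATCH_DIR="/scratch1/${USER}"

# One-time key setup, run interactively rather than inside a job:
#   ssh-keygen -t ed25519          # create a key pair (default path is fine)
#   ssh-copy-id "$SCRATCH_HOST"    # install the public key on the scratch host

# Copy the contents of <remote-subdir> on scratch into <local-dir> on this node.
# usage: pull_from_scratch <remote-subdir> <local-dir>
pull_from_scratch() {
  rsync -av "${SCRATCH_HOST}:${SCRATCH_DIR}/$1/" "$2/"
}

# Copy the contents of <local-dir> on this node into <remote-subdir> on scratch.
# usage: push_to_scratch <local-dir> <remote-subdir>
push_to_scratch() {
  rsync -av "$1/" "${SCRATCH_HOST}:${SCRATCH_DIR}/$2/"
}
```

Inside a job you might call pull_from_scratch at the start and push_to_scratch at the end. If your key has a passphrase, copies in a batch job will only run unattended when an ssh-agent holds the key.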
Example
From nexusgamma00.umiacs.umd.edu you can run the following example to submit an interactive job. Please note that you need to specify the --account, --partition and --qos options. Please refer to our SLURM documentation about how to further customize your submissions, including making a batch submission. The following command will allocate 8 GPUs for 2 days in an interactive session. Change the parameters according to your needs. We discourage the use of srun and encourage the use of sbatch to promote fair use of GPUs.
$ srun --pty --gres=gpu:8 --account=gamma --partition=gamma --qos=huge-long bash
$ hostname
gammagpu01.umiacs.umd.edu
$ nvidia-smi -L
GPU 0: NVIDIA RTX A5000 (UUID: GPU-cdfb2e0c-d69f-354b-02f4-15161dc7fa66)
GPU 1: NVIDIA RTX A5000 (UUID: GPU-be53e7a1-b8fd-7089-3cac-7a2fbf4ec7dd)
GPU 2: NVIDIA RTX A5000 (UUID: GPU-774efbb1-d7ec-a0bb-e992-da9d1fa6b193)
GPU 3: NVIDIA RTX A5000 (UUID: GPU-d1692181-c7de-e273-5f95-53ad381614c3)
GPU 4: NVIDIA RTX A5000 (UUID: GPU-ba51fd6c-37bf-1b95-5f68-987c18a6292a)
GPU 5: NVIDIA RTX A5000 (UUID: GPU-c1224a2a-4a3b-ff16-0308-4f36205b9859)
GPU 6: NVIDIA RTX A5000 (UUID: GPU-8d20d6cd-abf5-2630-ab88-6bba438c55fe)
GPU 7: NVIDIA RTX A5000 (UUID: GPU-93170910-5d94-6da5-8a24-f561d7da1e2d)
You can also use sbatch to submit your job. Here are two examples of how to do that:
$ sbatch --gres=gpu:8 --account=gamma --partition=gamma --qos=huge-long --time=1-23:00:00 script.sh
OR
$ sbatch script.sh

where script.sh contains:

#!/bin/bash
#SBATCH --gres=gpu:8
#SBATCH --account=gamma
#SBATCH --partition=gamma
#SBATCH --qos=huge-long
#SBATCH --time=1-23:00:00

python your_file.py
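The batch examples set --time=1-23:00:00, which is SLURM's days-hours:minutes:seconds notation. The small bash helper below (slurm_time_to_seconds is a hypothetical function, not part of SLURM) converts such a value to seconds, showing that it amounts to 47 hours, just under two days. It handles only the D-HH:MM:SS and HH:MM:SS forms; SLURM accepts other forms that this sketch does not parse.

```shell
#!/usr/bin/env bash
# Convert a SLURM time limit written as D-HH:MM:SS or HH:MM:SS into seconds.
slurm_time_to_seconds() {
  local t="$1" days=0 h m s
  if [[ "$t" == *-* ]]; then
    days="${t%%-*}"   # text before the dash: number of days
    t="${t#*-}"       # text after the dash: HH:MM:SS
  fi
  IFS=: read -r h m s <<< "$t"
  # 10# forces base 10 so fields such as "08" are not parsed as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

limit=$(slurm_time_to_seconds "1-23:00:00")
echo "1-23:00:00 = $(( limit / 3600 )) hours"
# prints: 1-23:00:00 = 47 hours
```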