SLURM/Priority

Revision as of 15:56, 30 January 2023

SLURM at UMIACS is configured to prioritize jobs based on a number of factors, termed multifactor priority in SLURM.

These factors include:

  • Age of job, i.e., time spent waiting to run in the queue
  • Partition job was submitted to
  • Fair-share of resources
  • "Nice" value that job was submitted with

Age

The longer a job remains eligible to run but cannot start because all available resources are taken up, the higher its scheduling priority becomes over time. The priority modifier for this factor reaches its limit after 7 days.

Partition

The partition named scavenger on each of our clusters always has a lower priority factor for its jobs than all other partitions on that cluster. As mentioned in other UMIACS cluster-specific documentation, jobs submitted to this partition are also preemptable. These two design choices give the partition its name: jobs submitted to the scavenger partition "scavenge" for available resources on the cluster rather than consuming a dedicated chunk of resources, and they can be interrupted by jobs seeking to consume dedicated chunks.
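
For example, an interactive job can be submitted to the scavenger partition like so (a sketch; this assumes the cluster also has a QoS named scavenger, which may differ per cluster):

srun --pty --partition=scavenger --qos=scavenger --mem 1gb --time=01:00:00 bash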

All other partitions on our clusters have the same priority factor.

Fair-share

The more resources your jobs have already consumed within an account, the lower the priority factor your future jobs will have compared to jobs from other users in the same account who have used fewer resources (so as to "fair-share" with other users). Additionally, if there are multiple accounts that can submit to a partition, and the sum of resources consumed by all users' jobs within account A is greater than the sum consumed by all users' jobs within account B, then all future jobs from users in account A will have a lower priority factor than all future jobs from users in account B.
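
You can view your accumulated usage and resulting fair-share factor with SLURM's sshare utility. For example,

sshare

displays these values for your user and account (the RawUsage and FairShare columns).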

The actual resource weightings for the three main resources (memory per GB, CPU cores, and GPUs if applicable) are per-partition and can be viewed in the TRESBillingWeights line in the output of scontrol show partition. There are two main algorithms we use for weightings, per cluster:
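
For example,

scontrol show partition | grep TRESBillingWeights

prints the billing weights for every partition on the cluster. A line such as TRESBillingWeights=CPU=1.0,Mem=0.125G,GRES/gpu=4.0 corresponds to the legacy GPU-capable weightings described below.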

Modern

This weighting algorithm is soon to be in use on the following clusters:

  • CML (after 2/23/2023)
  • Nexus (after 2/23/2023)

Resources have algorithmically computed floating point billing values, per partition.

GPU-capable partitions

Each resource (memory/CPU/GPU) is given a weighting such that the resources' relative billings to each other are equal (33.33% each), rounded to whole numbers. In practice, this always results in memory having a weighting value of 1.0 and CPU/GPU being adjusted accordingly. Different GPU types may also be weighted differently within the GPU relative billing. A baseline GPU is first chosen for each cluster, and all GPUs of that type and of other types with lower FP32 performance are given a weighting factor of 1.0. GPUs with higher FP32 performance than the baseline GPU...
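
As a worked example, for a hypothetical partition whose nodes each have 256 GB of memory, 32 CPU cores, and 8 GPUs: memory would be weighted at 1.0 per GB (256 total), CPU at 256 / 32 = 8.0 per core, and GPU at 256 / 8 = 32.0 per card, so each resource accounts for a third of the maximum possible billing.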

The current baseline GPUs per cluster are:

  • CML (after 2/23/2023): NVIDIA RTX A4000
  • Nexus (after 2/23/2023): NVIDIA RTX A4000

CPU-only partitions

Each resource (memory/CPU) is first given a weighting such that the relative billings to each other are equal (50% each). The final CPU weight value is then divided by 10, which translates to roughly 90.9% of the billing weight being for memory and 9.1% for CPU. This is done so as to not affect accounts' fair-share priority factors as much when running CPU-only jobs, given the popularity of GPU computing.
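
To see where these percentages come from: if both resources start at weight W, the final weights are W for memory and W/10 for CPU, so memory's share of the billing is W / (W + W/10) = 10/11 ≈ 90.9% and CPU's is (W/10) / (W + W/10) = 1/11 ≈ 9.1%.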

More details forthcoming.

Legacy

This weighting algorithm is currently in use on all clusters not mentioned in the previous section. These clusters will eventually either fold into Nexus or have the modern algorithm introduced in the future.

Resources have fixed floating point billing values, per partition.

GPU-capable partitions

Memory is billed at 0.125 per GB, CPU is billed at 1.0 per core, and GPU is billed at 4.0 per card.
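
For example, a job requesting 16 GB of memory, 4 cores, and 1 GPU in such a partition bills 16 × 0.125 + 4 × 1.0 + 1 × 4.0 = 10.0.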

CPU-only partitions

Memory is billed at 0.125 per GB and CPU is billed at 0.1 per core. The lower CPU weighting is done so as to not affect accounts' fair-share priority factors as much when running CPU-only jobs, given the popularity of GPU computing.
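
For example, the same 16 GB / 4 core request in a CPU-only partition bills 16 × 0.125 + 4 × 0.1 = 2.4.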

Nice value

This is a submission argument that you as the user can include when submitting your jobs to deprioritize them. Larger values deprioritize jobs more, e.g.,

srun --pty --qos=default --mem 1gb --time=01:00:00 --nice=2 bash

will have lower priority than

srun --pty --qos=default --mem 1gb --time=01:00:00 --nice=1 bash

which will have lower priority than

srun --pty --qos=default --mem 1gb --time=01:00:00 bash

assuming all three jobs were submitted at the same time. You cannot use negative values for this argument.
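
The same argument also works for batch submission, e.g., sbatch --nice=2 yourscript.sh (where yourscript.sh stands in for your batch script).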