SLURM/JobStatus: Difference between revisions

From UMIACS
No edit summary
 
(26 intermediate revisions by 3 users not shown)
Line 4: Line 4:
==squeue==
==squeue==
The squeue command shows job status in the queue. Helpful flags:
The squeue command shows job status in the queue. Helpful flags:
* <code>-u username</code> to show only your jobs (replace username with your UMIACS username)
* <code>-u username</code> to show only your jobs (replace <tt>username</tt> with your UMIACS username)
* <code>--start</code> to estimate start time for a job that has not yet started and the reason why it is waiting
* <code>--start</code> to estimate start time for a job that has not yet started and the reason why it is waiting
* <code>-s</code> to show the status of individual job steps for a job (e.g. batch jobs)
* <code>-s</code> to show the status of individual job steps for a job (e.g. batch jobs)
Line 10: Line 10:
Examples:
Examples:
<pre>
<pre>
username@opensub00:squeue -u username
[username@nexusclip00 ~]$ squeue -u username
             JOBID PARTITION    NAME    USER ST      TIME  NODES NODELIST(REASON)
             JOBID PARTITION    NAME    USER ST      TIME  NODES NODELIST(REASON)
               162     test2 helloWor username  R      0:03      2 openlab[00-01]
               162     tron helloWor username  R      0:03      2 tron[00-01]
</pre>
</pre>


<pre>
<pre>
username@opensub00:squeue --start -u username
[username@nexusclip00 ~]$ squeue --start -u username
             JOBID PARTITION    NAME    USER ST          START_TIME  NODES SCHEDNODES          NODELIST(REASON)
             JOBID PARTITION    NAME    USER ST          START_TIME  NODES SCHEDNODES          NODELIST(REASON)
               163     test2 helloWo2 username PD 2020-05-11T18:36:49      1 openlab02            (Priority)
               163     tron helloWo2 username PD 2020-05-11T18:36:49      1 tron02              (Priority)
</pre>
</pre>


<pre>
<pre>
username@opensub00:squeue -s -u username
[username@nexusclip00 ~]$ squeue -s -u username
         STEPID    NAME PARTITION    USER      TIME NODELIST
         STEPID    NAME PARTITION    USER      TIME NODELIST
           162.0    sleep     test2 username      0:05 openlab00
           162.0    sleep     tron username      0:05 tron00
           162.1    sleep     test2 username      0:05 openlab01
           162.1    sleep     tron username      0:05 tron01
</pre>
</pre>


Line 34: Line 34:
</pre>
</pre>
<pre>
<pre>
username@opensub00: sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 171
[username@nexusclip00 ~]$ sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 171
       JobID  NTasks            Nodelist    MaxRSS  MaxVMSize    AveRSS  AveVMSize  
       JobID  NTasks            Nodelist    MaxRSS  MaxVMSize    AveRSS  AveVMSize  
------------ -------- -------------------- ---------- ---------- ---------- ----------  
------------ -------- -------------------- ---------- ---------- ---------- ----------  
171.0              1           openlab00         0    186060K          0    107900K  
171.0              1               tron00         0    186060K          0    107900K  
username@opensub00: sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 171.1
[username@nexusclip00 ~]$ sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 171.1
       JobID  NTasks            Nodelist    MaxRSS  MaxVMSize    AveRSS  AveVMSize  
       JobID  NTasks            Nodelist    MaxRSS  MaxVMSize    AveRSS  AveVMSize  
------------ -------- -------------------- ---------- ---------- ---------- ----------  
------------ -------- -------------------- ---------- ---------- ---------- ----------  
171.1              1           openlab01         0    186060K          0    107900K  
171.1              1               tron01         0    186060K          0    107900K  
</pre>
</pre>
Note that if you do not have any jobsteps, sstat will return an error.
Note that if you do not have any jobsteps, sstat will return an error.
<pre>
<pre>
username@opensub00: sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 172
[username@nexusclip00 ~]$ sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 172
       JobID  NTasks            Nodelist    MaxRSS  MaxVMSize    AveRSS  AveVMSize  
       JobID  NTasks            Nodelist    MaxRSS  MaxVMSize    AveRSS  AveVMSize  
------------ -------- -------------------- ---------- ---------- ---------- ----------
------------ -------- -------------------- ---------- ---------- ---------- ----------
Line 66: Line 66:
The sacct command shows metrics from past jobs.
The sacct command shows metrics from past jobs.
<pre>
<pre>
username@opensub00:sacct
[username@nexusclip00 ~]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode  
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode  
------------ ---------- ---------- ---------- ---------- ---------- --------  
------------ ---------- ---------- ---------- ---------- ---------- --------  
162          helloWorld      test2      staff         2  COMPLETED      0:0  
162          helloWorld       tron     nexus         2  COMPLETED      0:0  
162.batch        batch                staff         1  COMPLETED      0:0  
162.batch        batch                nexus         1  COMPLETED      0:0  
162.0            sleep                staff         1  COMPLETED      0:0  
162.0            sleep                nexus         1  COMPLETED      0:0  
162.1            sleep                staff         1  COMPLETED      0:0  
162.1            sleep                nexus         1  COMPLETED      0:0  
163          helloWorld      test2      staff         2  COMPLETED      0:0  
163          helloWorld       tron     nexus         2  COMPLETED      0:0  
163.batch        batch                staff         1  COMPLETED      0:0  
163.batch        batch                nexus         1  COMPLETED      0:0  
163.0            sleep                staff         1  COMPLETED      0:0  
163.0            sleep                nexus         1  COMPLETED      0:0  
</pre>
</pre>
To check one specific job, you can run something like the following (if you omit .<$JOBSTEP>, all jobsteps will be shown):
To check one specific job, you can run something like the following (if you omit .<$JOBSTEP>, all jobsteps will be shown):
<pre>sacct  --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize,Elapsed -j <$JOBID>.<$JOBSTEP></pre>
<pre>sacct  --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize,Elapsed -j <$JOBID>.<$JOBSTEP></pre>
<pre>
<pre>
username@opensub00:sacct --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize,Elapsed -j 171
[username@nexusclip00 ~]$ sacct --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize,Elapsed -j 171
       JobID    JobName  NTasks        NodeList    MaxRSS  MaxVMSize    AveRSS  AveVMSize    Elapsed  
       JobID    JobName  NTasks        NodeList    MaxRSS  MaxVMSize    AveRSS  AveVMSize    Elapsed  
------------ ---------- -------- --------------- ---------- ---------- ---------- ---------- ----------  
------------ ---------- -------- --------------- ---------- ---------- ---------- ---------- ----------  
171          helloWorld           openlab[00-01]                                              00:00:30  
171          helloWorld             tron[00-01]                                              00:00:30  
171.batch        batch        1       openlab00         0    119784K          0    113120K  00:00:30  
171.batch        batch        1         tron00         0    119784K          0    113120K  00:00:30  
171.0            sleep        1       openlab00         0    186060K          0    107900K  00:00:30  
171.0            sleep        1         tron00         0    186060K          0    107900K  00:00:30  
171.1            sleep        1       openlab01         0    186060K          0    107900K  00:00:30  
171.1            sleep        1         tron01         0    186060K          0    107900K  00:00:30  
</pre>
</pre>


=Job Codes=
=Job Codes=
When you list the current running jobs and your job is in <code>PD</code> (Pending), SLURM will provide you some information on what the reason for this in the NODELIST parameter.  You can use <code>scontrol show job <jobid></code> to get all the parameters for your job which may be required to identify why your job is not running.
If you list the current running jobs and your job is in <code>PD</code> (Pending), SLURM will provide you some information on what the reason for this in the NODELIST parameter.  You can use <code>scontrol show job <jobid></code> to get all the parameters for your job to help identify why your job is not running.


<pre>
<pre>
# squeue -u testuser
[username@nexusclip00 ~]$ squeue -u username
JOBID PARTITION    NAME    USER    ST      TIME  NODES NODELIST(REASON)
JOBID PARTITION    NAME    USER    ST      TIME  NODES NODELIST(REASON)
581530     dpart     bash    testuser PD      0:00      1 (AssocGrpGRES)
1          tron     bash     username PD      0:00      1 (AssocGrpGRES)
581533     dpart     bash    testuser PD      0:00      1 (Resources)
2          tron     bash     username PD      0:00      1 (Resources)
581534     dpart     bash   testuser PD      0:00      1 (QOSMaxGRESPerUser)
3          tron     bash    username PD      0:00      1 (Priority)
581535 scavenger     bash   testuser PD      0:00      1 (ReqNodeNotAvail, Reserved for maintenance)
4          tron     bash     username PD      0:00      1 (QOSMaxGRESPerUser)
5          tron     bash     username PD      0:00      1 (ReqNodeNotAvail, Reserved for maintenance)
</pre>
</pre>


Some common ones are as follows:
Some common ones are as follows:
* <code>Resources</code> - The cluster does not currently have the resources to fit your job.
* <code>Resources</code> - The cluster does not currently have the resources to fit your job in your selected partition.
* <code>QOSMaxGRESPerUser</code> - The quality of service (QoS) your job is running in has a limit of resources per user.  Use <code>show_qos</code> to identify the limits and then use <code>scontrol show job <jobid></code> for each of your jobs running in that QoS.
* <code>Priority</code> - The cluster has reserved resources for higher [[SLURM/Priority | priority]] jobs in your selected partition.
* <code>AssocGrpGRES</code> - The SLURM account you are using has a limit on the resources available in total for the account.  Use <code>sacctmgr show assoc account=<account_name></code> to identify the GrpTRES limit.  You can see all jobs running under the account by running <code>squeue -A account_name</code> and then find out more information on each job by <code>scontrol show job <jobid></code>.
* <code>QOSMax*PerUser</code> or <code>QOSMax*PerUserLimit</code> - The quality of service (QoS) your job is requesting to use has some limit per user (CPU, mem, GRES, etc.).  Use <code>show_qos</code> and <code>show_partition_qos</code> to identify the limit(s) and then use <code>scontrol show job <jobid></code> for each of your jobs running in that QoS to see the resources they are currently consuming.
* <code>ReqNodeNotAvail</code> - If you have requested a specific node and it is currently scheduled you can get this job codeYou can also get this job code when it provides <code>Reserved for maintenance</code> that there is a reservation in place (often for a [[MonthlyMaintenanceWindow | maintenance window]]).  You can see the current reservations by running <code>scontrol show reservation</code>.  Often the culprit is that you have requested a TimeLimit that will conflict with the reservation.  You can either lower your TimeLimit so that the job will complete before the reservation begins, or leave your job to wait until the reservation completes.
* <code>AssocGrpBilling</code> - The SLURM account you are using has a limit on the overall billing amount available in total for the account.  Use <code>sacctmgr show assoc account=<accountname> where user=</code> to identify the limit, replacing <tt><accountname></tt> with the account you are submitting your job with.  You can see all jobs running under the account and their billing values by running <code>squeue -A <accountname> -O "JobId:.18 ,Partition:.9 ,Name:.8 ,UserName:.8 ,StateCompact:.2 ,TimeUsed:.10 ,NumNodes:.6 ,ReasonList:45 ,tres-alloc:80"</code>. The billing value will be part of the <tt>tres-alloc</tt> string for each job.
* <code>ReqNodeNotAvail</code> - None of the nodes that could run your job (based on requested partition/resources) currently have the resources to fit your job.  Alternatively, if you also see <code>Reserved for maintenance</code>, there is a reservation in place (often for a [[MonthlyMaintenanceWindow | maintenance window]]).  You can see the current reservations by running <code>scontrol show reservation</code>.  Often the culprit is that you have requested a TimeLimit that will conflict with the reservation.  You can either lower your TimeLimit such that the job will complete before the reservation begins, or leave your job to wait until the reservation completes.
 
SLURM's full list of reasons/explanations can be found [https://slurm.schedmd.com/job_reason_codes.html here].

Latest revision as of 19:12, 22 August 2024

Job Status

SLURM offers a variety of tools to check the status of your jobs before, during, and after execution. When you first submit your job, SLURM should give you a job ID which represents the resources allocated to your job. Individual calls to srun will spawn job steps which can also be queried individually.

squeue

The squeue command shows job status in the queue. Helpful flags:

  • -u username to show only your jobs (replace username with your UMIACS username)
  • --start to estimate start time for a job that has not yet started and the reason why it is waiting
  • -s to show the status of individual job steps for a job (e.g. batch jobs)

Examples:

[username@nexusclip00 ~]$ squeue -u username
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               162      tron helloWor username  R       0:03      2 tron[00-01]
[username@nexusclip00 ~]$ squeue --start -u username
             JOBID PARTITION     NAME     USER ST          START_TIME  NODES SCHEDNODES           NODELIST(REASON)
               163      tron helloWo2 username PD 2020-05-11T18:36:49      1 tron02               (Priority)
[username@nexusclip00 ~]$ squeue -s -u username
         STEPID     NAME PARTITION     USER      TIME NODELIST
          162.0    sleep      tron username      0:05 tron00
          162.1    sleep      tron username      0:05 tron01
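
If you want to pull job IDs out of this output for scripting, one approach is to filter on the ST column. The following is a minimal sketch, not part of UMIACS tooling: the function name jobs_in_state is hypothetical, and it assumes the default squeue column layout shown above with a job name that contains no spaces.

```shell
#!/bin/bash
# jobs_in_state STATE
# Reads default `squeue` output on stdin and prints the IDs of jobs whose
# ST column (5th field) equals STATE, for example R or PD.
# The header row is skipped naturally because its 5th field is "ST".
# Assumption: the job name contains no spaces, so the columns line up.
jobs_in_state() {
    awk -v state="$1" '$5 == state { print $1 }'
}
```

For example, `squeue -u username | jobs_in_state PD` would print one pending job ID per line.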

sstat

The sstat command shows metrics from currently running job steps. If you don't specify a job step, the lowest job step is displayed.

sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize <$JOBID>.<$JOBSTEP>
[username@nexusclip00 ~]$ sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 171
       JobID   NTasks             Nodelist     MaxRSS  MaxVMSize     AveRSS  AveVMSize 
------------ -------- -------------------- ---------- ---------- ---------- ---------- 
171.0               1               tron00          0    186060K          0    107900K 
[username@nexusclip00 ~]$ sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 171.1
       JobID   NTasks             Nodelist     MaxRSS  MaxVMSize     AveRSS  AveVMSize 
------------ -------- -------------------- ---------- ---------- ---------- ---------- 
171.1               1               tron01          0    186060K          0    107900K 

Note that if your job has no running job steps, sstat will return an error.

[username@nexusclip00 ~]$ sstat --format JobID,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize 172
       JobID   NTasks             Nodelist     MaxRSS  MaxVMSize     AveRSS  AveVMSize 
------------ -------- -------------------- ---------- ---------- ---------- ----------
sstat: error: no steps running for job 172

If you do not run any srun commands, you will not create any job steps and metrics will not be available for your job. Your batch scripts should follow this format:

#!/bin/bash
#SBATCH ...
#SBATCH ...
# set environment up
module load ...

# launch job steps
srun <command to run> # this becomes job step 0
srun <command to run> # this becomes job step 1
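
Before submitting, you may want a quick check that a batch script actually launches something through srun. The sketch below is a rough heuristic under stated assumptions: the function name has_job_steps is hypothetical, and it only looks for lines that begin with srun, so an srun call placed later on a line (for example after `do` in a one-line loop) would be missed.

```shell
#!/bin/bash
# has_job_steps SCRIPT
# Exits 0 if SCRIPT has at least one line that starts with an srun call
# (leading whitespace allowed). Commented-out srun lines do not match.
# This is a text heuristic only; it does not run or parse the script.
has_job_steps() {
    grep -Eq '^[[:space:]]*srun([[:space:]]|$)' "$1"
}
```

A possible use is `has_job_steps myjob.sh || echo "no srun calls found; sstat will have no steps to report"`, where myjob.sh stands in for your own script.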

sacct

The sacct command shows metrics from past jobs.

[username@nexusclip00 ~]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
162          helloWorld       tron      nexus          2  COMPLETED      0:0 
162.batch         batch                 nexus          1  COMPLETED      0:0 
162.0             sleep                 nexus          1  COMPLETED      0:0 
162.1             sleep                 nexus          1  COMPLETED      0:0 
163          helloWorld       tron      nexus          2  COMPLETED      0:0 
163.batch         batch                 nexus          1  COMPLETED      0:0 
163.0             sleep                 nexus          1  COMPLETED      0:0 

To check one specific job, you can run something like the following (if you omit .<$JOBSTEP>, all job steps will be shown):

sacct  --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize,Elapsed -j <$JOBID>.<$JOBSTEP>
[username@nexusclip00 ~]$ sacct --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize,Elapsed -j 171
       JobID    JobName   NTasks        NodeList     MaxRSS  MaxVMSize     AveRSS  AveVMSize    Elapsed 
------------ ---------- -------- --------------- ---------- ---------- ---------- ---------- ---------- 
171          helloWorld              tron[00-01]                                               00:00:30 
171.batch         batch        1          tron00          0    119784K          0    113120K   00:00:30 
171.0             sleep        1          tron00          0    186060K          0    107900K   00:00:30 
171.1             sleep        1          tron01          0    186060K          0    107900K   00:00:30 
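
Memory columns such as MaxRSS and MaxVMSize are printed with a unit suffix (186060K above), which can be awkward to compare against a memory request. The following sketch converts such a value to mebibytes. The function name to_mib is hypothetical, and the code assumes the suffixes K, M, G, and T are powers of 1024 and that a value with no suffix is a byte count; confirm this against the sacct documentation for your SLURM version.

```shell
#!/bin/bash
# to_mib VALUE
# Converts a size as printed by sacct/sstat (e.g. 186060K, 512M, 2G, or a
# bare number) to mebibytes with one decimal place.
# Assumptions: suffixes are powers of 1024; no suffix means bytes.
to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)   # last character: suffix or a digit
        num = v + 0                      # awk keeps the leading numeric part
        if      (unit == "K") mib = num / 1024
        else if (unit == "M") mib = num
        else if (unit == "G") mib = num * 1024
        else if (unit == "T") mib = num * 1024 * 1024
        else                  mib = num / (1024 * 1024)
        printf "%.1f\n", mib
    }'
}
```

For example, `to_mib 186060K` prints 181.7, so the sleep steps above peaked at roughly 182 MiB of virtual memory.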

Job Codes

If you list your jobs and one of them is in the PD (Pending) state, SLURM shows the reason it is waiting in the NODELIST(REASON) column. You can use scontrol show job <jobid> to see all of the parameters for your job, which can help identify why it is not running.

[username@nexusclip00 ~]$ squeue -u username
JOBID  PARTITION     NAME     USER     ST       TIME  NODES NODELIST(REASON)
1           tron     bash     username PD       0:00      1 (AssocGrpGRES)
2           tron     bash     username PD       0:00      1 (Resources)
3           tron     bash     username PD       0:00      1 (Priority)
4           tron     bash     username PD       0:00      1 (QOSMaxGRESPerUser)
5           tron     bash     username PD       0:00      1 (ReqNodeNotAvail, Reserved for maintenance)

Some common reason codes are as follows:

  • Resources - The cluster does not currently have the resources to fit your job in your selected partition.
  • Priority - The cluster has reserved resources for higher priority jobs in your selected partition.
  • QOSMax*PerUser or QOSMax*PerUserLimit - The quality of service (QoS) your job is requesting to use has some limit per user (CPU, mem, GRES, etc.). Use show_qos and show_partition_qos to identify the limit(s) and then use scontrol show job <jobid> for each of your jobs running in that QoS to see the resources they are currently consuming.
  • AssocGrpBilling - The SLURM account you are using has a limit on the overall billing amount available in total for the account. Use sacctmgr show assoc account=<accountname> where user= to identify the limit, replacing <accountname> with the account you are submitting your job with. You can see all jobs running under the account and their billing values by running squeue -A <accountname> -O "JobId:.18 ,Partition:.9 ,Name:.8 ,UserName:.8 ,StateCompact:.2 ,TimeUsed:.10 ,NumNodes:.6 ,ReasonList:45 ,tres-alloc:80". The billing value will be part of the tres-alloc string for each job.
  • ReqNodeNotAvail - None of the nodes that could run your job (based on requested partition/resources) currently have the resources to fit your job. Alternatively, if you also see Reserved for maintenance, there is a reservation in place (often for a maintenance window). You can see the current reservations by running scontrol show reservation. Often the culprit is that you have requested a TimeLimit that will conflict with the reservation. You can either lower your TimeLimit such that the job will complete before the reservation begins, or leave your job to wait until the reservation completes.
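
When you have many pending jobs, it can help to see which reason codes dominate. The sketch below tallies reasons from a list with one reason per line. The function name tally_reasons is hypothetical; the squeue options mentioned afterwards (-h, -t, and -o with the %r field) are standard, but check `man squeue` on your system before relying on them.

```shell
#!/bin/bash
# tally_reasons
# Reads one pending reason per line on stdin and prints "COUNT REASON",
# most frequent first, with ties broken alphabetically. Blank lines are
# ignored.
tally_reasons() {
    awk 'NF { count[$0]++ } END { for (r in count) print count[r], r }' |
        sort -k1,1nr -k2
}
```

One way to feed it is `squeue -h -u username -t PD -o "%r" | tally_reasons`, which prints only the reason field for each of your pending jobs.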

SLURM's full list of reasons/explanations can be found at https://slurm.schedmd.com/job_reason_codes.html.