=Simple Linux Utility for Resource Management (SLURM)=
SLURM is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.


UMIACS is transitioning away from our Torque/Maui batch resource manager to Slurm, which is now broadly used in the regional and national supercomputing communities.
==Documentation==
:[[SLURM/JobSubmission | Submitting Jobs]]
:[[SLURM/JobStatus | Checking Job Status]]
:[[SLURM/ClusterStatus | Checking Cluster Status]]
:[http://slurm.schedmd.com/documentation.html Official Documentation]
:[http://slurm.schedmd.com/faq.html FAQ]


Terminology and command-line changes are the biggest differences when coming from Torque/Maui to Slurm. Most notably, Torque queues are called partitions in Slurm.
==Commands==
Below are some of the common commands used in Slurm. Further information on how to use these commands is found in the documentation linked above. To see all flags available for a command, please check the command's manual by using <code>man $COMMAND</code> on the command line.
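For example, either of the following will show the options that sbatch accepts:

<pre>
man sbatch       # full manual page
sbatch --help    # brief option summary
</pre>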


====srun====
srun runs a parallel job on a cluster managed by Slurm.  If necessary, it will first create a resource allocation in which to run the parallel job.

To run a simple command like hostname as four parallel tasks, with each line of output labeled by task number: '''srun -n4 -l hostname'''

To get an interactive session with 4GB of RAM for 8 hours with a bash shell: '''srun --pty --mem 4096 -t 8:00:00 bash'''


====salloc====
salloc allocates a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node).  When salloc successfully obtains the requested allocation, it then runs the command specified by the user.  Finally, when the user-specified command is complete, salloc relinquishes the job allocation.  If no command is specified, salloc runs the user's default shell.
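For example, the following sketch requests an interactive allocation, runs a command on the allocated nodes, and then releases the allocation (the node/task counts and time limit are only illustrative):

<pre>
salloc -N 2 -n 4 -t 30:00    # request 2 nodes / 4 tasks for 30 minutes; starts a shell
srun hostname                # launch a job step on the allocated nodes
exit                         # leave the shell and release the allocation
</pre>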


====sbatch====
sbatch submits a batch script to Slurm.  The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input.  The batch script may contain options preceded with "#SBATCH" before any executable commands in the script.
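As an illustration, a minimal batch script might look like the following sketch; the job name, partition, and resource requests are placeholders (the <code>test</code> partition matches the sinfo example below), and the SLURM environment variables it echoes are described in the tables at the bottom of this page:

<pre>
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=test
#SBATCH --ntasks=4
#SBATCH --mem=4096
#SBATCH --time=8:00:00
#SBATCH --output=myjob-%j.out

# run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR

echo "Job $SLURM_JOBID is running on: $SLURM_JOB_NODELIST"
srun hostname
</pre>

Submit the script with <code>sbatch myjob.sh</code>; sbatch prints the job ID assigned to the job.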


====sinfo====
To view partitions and nodes, use the '''sinfo''' command.  There are two partitions in the following example; note that sinfo breaks each partition down by the state (availability) of its nodes.  The '''*''' character in the PARTITION column signifies the default partition for jobs.

<pre>
# sinfo
PARTITION AVAIL  TIMELIMIT  NODES STATE NODELIST
test*        up  infinite      5  idle shado[00-04]
test2        up  infinite      3  idle shado[00-02]
</pre>

====squeue====
squeue views job and job step information for jobs managed by Slurm.  The '''squeue''' command shows submitted jobs in partitions; by default, it shows all jobs in all partitions.  There are a number of filtering and output options that are documented in the man page for squeue.

<pre>
# squeue
JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
65646    batch  chem  mike  R 24:19    2 adev[7-8]
65647    batch  bio  joan  R  0:09    1 adev14
65648    batch  math  phil PD  0:00    6 (Resources)
</pre>
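For example, the following are common ways to narrow the listing (the username is a placeholder; the job ID and partition names are taken from the examples above):

<pre>
squeue --user=username                       # show only one user's jobs
squeue --job 65646                           # show a single job
squeue --partition=test --states=PENDING     # pending jobs in one partition
</pre>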
 
====scancel====
scancel signals or cancels jobs, job arrays, or job steps. An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and/or job step IDs.

To cancel a job, you can call '''scancel''' with a job number.
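For example (the job ID comes from the squeue output above; the username is a placeholder):

<pre>
scancel 65646                               # cancel a single job by ID
scancel --user=username --state=PENDING     # cancel all of a user's pending jobs
</pre>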


====sacct====
sacct displays job accounting data stored in the job accounting log file or Slurm database in a variety of forms for your analysis.  The sacct command displays information on jobs, job steps, status, and exitcodes by default.  You can tailor the output with the use of the --format= option to specify the fields to be shown.
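For example, to review a completed job with a custom set of fields (the job ID is a placeholder):

<pre>
sacct -j 65646 --format=JobID,JobName,Partition,State,ExitCode,Elapsed,MaxRSS
</pre>
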
====sstat====
sstat displays job status information for your analysis.  The sstat command displays information pertaining to CPU, Task, Node, Resident Set Size (RSS) and Virtual Memory (VM).  You can tailor the output with the use of the --fields= option to specify the fields to be shown.
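For example, to check the resource usage of a job that is currently running (the job ID is a placeholder; you may need to name a specific job step, such as 65646.batch):

<pre>
sstat --fields=JobID,MaxRSS,MaxVMSize,AveCPU -j 65646
</pre>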


====scontrol====
You can receive more thorough information on both nodes and partitions through the '''scontrol''' command.

To show more about partitions you can run '''scontrol show partition'''
<pre>
# scontrol show partition
PartitionName=test
  AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
  AllocNodes=ALL Default=YES
  DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
  MaxNodes=1 MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
  Nodes=shado[00-04]
  Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
  State=UP TotalCPUs=10 TotalNodes=5 SelectTypeParameters=N/A
  DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=test2
  AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
  AllocNodes=ALL Default=NO
  DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
  MaxNodes=2 MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
  Nodes=shado[00-02]
  Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
  State=UP TotalCPUs=6 TotalNodes=3 SelectTypeParameters=N/A
  DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
</pre>

To show more about nodes you can run '''scontrol show nodes'''
<pre>
# scontrol show nodes
NodeName=shado00 Arch=x86_64 CoresPerSocket=1
  CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=1.01 Features=(null)
  Gres=(null)
  NodeAddr=shado00 NodeHostName=shado00 Version=14.11
  OS=Linux RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
  State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1
  BootTime=2015-07-23T21:13:22 SlurmdStartTime=2015-07-30T11:21:49
  CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
  ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=shado01 Arch=x86_64 CoresPerSocket=1
  CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.94 Features=(null)
  Gres=(null)
  NodeAddr=shado01 NodeHostName=shado01 Version=14.11
  OS=Linux RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
  State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1
  BootTime=2015-07-23T21:13:22 SlurmdStartTime=2015-07-30T11:23:23
  CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
  ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=shado02 Arch=x86_64 CoresPerSocket=1
  CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.95 Features=(null)
  Gres=(null)
  NodeAddr=shado02 NodeHostName=shado02 Version=14.11
  OS=Linux RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
  State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1
  BootTime=2015-07-23T21:13:23 SlurmdStartTime=2015-07-30T11:23:50
  CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
  ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
</pre>

==Modules==
If you are trying to use [[Modules | GNU Modules]] in a Slurm job, please read the section of our [[Modules]] documentation on [[Modules#Modules_in_Non-Interactive_Shell_Sessions | non-interactive shell sessions]].

=Quick Guide to translate PBS/Torque to SLURM=
{| class="wikitable"
|+User commands
|-
!
!PBS/Torque
!SLURM
|-
!Job submission
|qsub [filename]
|sbatch [filename]
|-
!Job deletion
|qdel [job_id]
|scancel [job_id]
|-
!Job status (by job)
|qstat [job_id]
|squeue --job [job_id]
|-
!Full job status (by job)
|qstat -f [job_id]
|scontrol show job [job_id]
|-
!Job status (by user)
|qstat -u [username]
|squeue --user=[username]
|}

{| class="wikitable"
|+Environment variables
|-
!
!PBS/Torque
!SLURM
|-
!Job ID
|$PBS_JOBID
|$SLURM_JOBID
|-
!Submit Directory
|$PBS_O_WORKDIR
|$SLURM_SUBMIT_DIR
|-
!Node List
|$PBS_NODEFILE
|$SLURM_JOB_NODELIST
|}

{| class="wikitable"
|+Job specification
|-
!
!PBS/Torque
!SLURM
|-
!Script directive
|#PBS
|#SBATCH
|-
!Job Name
| -N [name]
| --job-name=[name] OR -J [name]
|-
!Node Count
| -l nodes=[count]
| --nodes=[min[-max]] OR -N [min[-max]]
|-
!CPU Count
| -l ppn=[count]
| --ntasks-per-node=[count]
|-
!CPUs Per Task
|
| --cpus-per-task=[count]
|-
!Memory Size
| -l mem=[MB]
| --mem=[MB] OR --mem-per-cpu=[MB]
|-
!Wall Clock Limit
| -l walltime=[hh:mm:ss]
| --time=[min] OR --time=[days-hh:mm:ss]
|-
!Node Properties
| -l nodes=4:ppn=8:[property]
| --constraint=[list]
|-
!Standard Output File
| -o [file_name]
| --output=[file_name] OR -o [file_name]
|-
!Standard Error File
| -e [file_name]
| --error=[file_name] OR -e [file_name]
|-
!Combine stdout/stderr
| -j oe (both to stdout)
|(Default if you don't specify --error)
|-
!Job Arrays
| -t [array_spec]
| --array=[array_spec] OR -a [array_spec]
|-
!Delay Job Start
| -a [time]
| --begin=[time]
|}
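To put the job specification table into practice, here is a sketch of the same simple job expressed first as a PBS/Torque script header and then as its SLURM equivalent (the job name and resource values are placeholders):

<pre>
# PBS/Torque version
#PBS -N myjob
#PBS -l nodes=2:ppn=8
#PBS -l walltime=08:00:00
#PBS -o myjob.out

# SLURM version
#SBATCH --job-name=myjob
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=08:00:00
#SBATCH --output=myjob.out
</pre>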
 
