Difference between revisions of "SLURM"

From UMIACS
Jump to navigation Jump to search
Line 1: Line 1:
 
=Simple Linux Utility for Resource Management (SLURM)=
 
=Simple Linux Utility for Resource Management (SLURM)=
 
 
SLURM is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First it allocates exclusive and/or non-exclusive access to resources (computer nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
 
SLURM is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First it allocates exclusive and/or non-exclusive access to resources (computer nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
  
Line 12: Line 11:
 
==Commands==
 
==Commands==
 
Below are some of the common commands used in slurm. Further information on how to use these commands is found in the documentation linked above. To see all flags available for each command please check the command manual by using the command <code>man $COMMAND</code> on the command line.
 
Below are some of the common commands used in slurm. Further information on how to use these commands is found in the documentation linked above. To see all flags available for each command please check the command manual by using the command <code>man $COMMAND</code> on the command line.
====sbatch====
+
 
sbatch  submits  a batch script to Slurm.  The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input. The batch script may contain options preceded with "#SBATCH" before any executable commands in the script.
+
====srun====
 +
srun runs a parallel job on a cluster managed by Slurm.  If necessary, it will first create a resource allocation in which to run the parallel job.
  
 
====salloc====
 
====salloc====
salloc is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node).  When salloc successfully obtains the requested allocation, it then runs the command specified by the user.  Finally, when the user specified command is complete, salloc relinquishes the job allocation. If no command is specified salloc run's the user's default shell.
+
salloc is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node).  When salloc successfully obtains the requested allocation, it then runs the command specified by the user.  Finally, when the user specified command is complete, salloc relinquishes the job allocation. If no command is specified, salloc runs the user's default shell.
  
====srun====
+
====sbatch====
Run a parallel job on cluster managed by Slurm.  If necessary, srun will first create a resource allocation in which to run the parallel job.
+
sbatch submits a batch script to Slurm.  The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input. The batch script may contain options preceded with "#SBATCH" before any executable commands in the script.
  
 
====squeue====
 
====squeue====
Line 25: Line 25:
  
 
====scancel====
 
====scancel====
scancel is used to signal or cancel jobs, job arrays or job steps.  An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and/or job step IDs.
+
scancel is used to signal or cancel jobs, job arrays, or job steps.  An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and/or job step IDs.
  
 
====sacct====
 
====sacct====
The  sacct command displays job accounting data stored in the job accounting log file or Slurm database in a variety of forms for your analysis.  The sacct command displays information on jobs, job steps, status, and exitcodes by default.  You can tailor the output with the use of the --format= option to specify the fields to be shown.
+
sacct displays job accounting data stored in the job accounting log file or Slurm database in a variety of forms for your analysis.  The sacct command displays information on jobs, job steps, status, and exitcodes by default.  You can tailor the output with the use of the --format= option to specify the fields to be shown.
  
 
====sstat====
 
====sstat====
The  sstat command  displays job status information for your analysis.  The sstat command displays information pertaining to CPU, Task, Node, Resident Set Size (RSS) and Virtual Memory (VM).  You can tailor the output with the use of the --fields= option to specify the fields to be shown.
+
sstat displays job status information for your analysis.  The sstat command displays information pertaining to CPU, Task, Node, Resident Set Size (RSS) and Virtual Memory (VM).  You can tailor the output with the use of the --fields= option to specify the fields to be shown.
  
 
==Modules==
 
==Modules==
If you are trying to use [[Modules | GNU Modules]] in a slurm job, please read the section of our [[Modules]] documentation on [[Modules#Modules_in_Non-Interactive_Shell_Sessions | non-interactive shell sessions]]
+
If you are trying to use [[Modules | GNU Modules]] in a Slurm job, please read the section of our [[Modules]] documentation on [[Modules#Modules_in_Non-Interactive_Shell_Sessions | non-interactive shell sessions]].

Revision as of 18:10, 12 July 2016

Simple Linux Utility for Resource Management (SLURM)

SLURM is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First it allocates exclusive and/or non-exclusive access to resources (computer nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

Documentation

Submitting Jobs
Checking Job Status
Checking Cluster Status
Official Documentation
FAQ

Commands

Below are some of the common commands used in slurm. Further information on how to use these commands is found in the documentation linked above. To see all flags available for each command please check the command manual by using the command man $COMMAND on the command line.

srun

srun runs a parallel job on a cluster managed by Slurm. If necessary, it will first create a resource allocation in which to run the parallel job.

salloc

salloc is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node). When salloc successfully obtains the requested allocation, it then runs the command specified by the user. Finally, when the user specified command is complete, salloc relinquishes the job allocation. If no command is specified, salloc runs the user's default shell.

sbatch

sbatch submits a batch script to Slurm. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input. The batch script may contain options preceded with "#SBATCH" before any executable commands in the script.

squeue

squeue is used to view job and job step information for jobs managed by Slurm.

scancel

scancel is used to signal or cancel jobs, job arrays, or job steps. An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and/or job step IDs.

sacct

sacct displays job accounting data stored in the job accounting log file or Slurm database in a variety of forms for your analysis. The sacct command displays information on jobs, job steps, status, and exitcodes by default. You can tailor the output with the use of the --format= option to specify the fields to be shown.

sstat

sstat displays job status information for your analysis. The sstat command displays information pertaining to CPU, Task, Node, Resident Set Size (RSS) and Virtual Memory (VM). You can tailor the output with the use of the --fields= option to specify the fields to be shown.

Modules

If you are trying to use GNU Modules in a Slurm job, please read the section of our Modules documentation on non-interactive shell sessions.