SLURM: Difference between revisions
Revision as of 17:06, 30 July 2015
Simple Linux Utility for Resource Management
UMIACS is transitioning from our Torque/Maui batch resource manager to Slurm. Slurm is now widely used in the regional and national supercomputing communities.
Terminology and command line changes are the biggest differences when coming from Torque/Maui to Slurm.
- Torque queues are now called partitions in Slurm
Commands
sinfo
To view partitions and nodes, use the sinfo command. The following example has two partitions. In this view, sinfo prints one line per partition and node state, so a partition whose nodes are in different states appears on several lines. The * character in the PARTITION column marks the default partition for jobs.
# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test*     up    infinite  5     idle  shado[00-04]
test2     up    infinite  3     idle  shado[00-02]
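If you want to use this listing from a script, the columns can be split on whitespace. The following is a minimal sketch rather than an official interface: it assumes the default sinfo column layout shown above, the sample text is copied from that example, and the function name parse_sinfo is made up for illustration.

```python
# Sample text copied from the sinfo example above. On a real cluster this
# text would come from running `sinfo` (for instance via subprocess).
SAMPLE = """\
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test*     up    infinite  5     idle  shado[00-04]
test2     up    infinite  3     idle  shado[00-02]
"""


def parse_sinfo(text):
    """Turn default-format sinfo output into a list of dicts, one per row.

    Hypothetical helper: assumes no column value contains whitespace.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        # A trailing * marks the default partition; record it, then strip it.
        row["DEFAULT"] = row["PARTITION"].endswith("*")
        row["PARTITION"] = row["PARTITION"].rstrip("*")
        row["NODES"] = int(row["NODES"])
        rows.append(row)
    return rows


rows = parse_sinfo(SAMPLE)
default_partitions = [r["PARTITION"] for r in rows if r["DEFAULT"]]
print(default_partitions)
```

With the sample above, the only partition flagged as default is test.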
squeue
The squeue command shows submitted jobs in partitions. By default, it shows all jobs in all partitions. The man page for squeue documents a number of filtering and output options.
# squeue
JOBID PARTITION NAME USER ST  TIME NODES NODELIST(REASON)
65646 batch     chem mike R  24:19     2 adev[7-8]
65647 batch     bio  joan R   0:09     1 adev14
65648 batch     math phil PD  0:00     6 (Resources)
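The ST column holds the job state (R for running, PD for pending), and for a pending job the last column gives the reason in parentheses instead of a node list. As a rough sketch of how a script might pick out pending jobs, the code below parses the sample output above. It assumes the default squeue columns and job names without spaces; parse_squeue is a hypothetical helper name.

```python
# Sample text copied from the squeue example above.
SAMPLE = """\
JOBID PARTITION NAME USER ST  TIME NODES NODELIST(REASON)
65646 batch     chem mike R  24:19     2 adev[7-8]
65647 batch     bio  joan R   0:09     1 adev14
65648 batch     math phil PD  0:00     6 (Resources)
"""


def parse_squeue(text):
    """Parse default-format squeue output into a list of dicts.

    Hypothetical helper: assumes job names contain no whitespace.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Limit the number of splits so the last column stays in one piece.
        fields = line.strip().split(None, len(header) - 1)
        job = dict(zip(header, fields))
        job["NODES"] = int(job["NODES"])
        jobs.append(job)
    return jobs


jobs = parse_squeue(SAMPLE)
pending = [(j["JOBID"], j["NODELIST(REASON)"]) for j in jobs if j["ST"] == "PD"]
running_nodes = sum(j["NODES"] for j in jobs if j["ST"] == "R")
print(pending)
print(running_nodes)
```

For the sample, the only pending job is 65648, which is waiting on (Resources), and the two running jobs hold 3 nodes between them.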
srun
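To run a simple command like hostname as 4 tasks (-n4), with each line of output labeled by its task number (-l):
srun -n4 -l hostname
Note that -n sets the number of tasks, which Slurm may place on fewer than 4 nodes; use -N4 to request 4 nodes.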
To get an interactive session with 4GB of RAM for 8 hours with a bash shell:
srun --pty --mem 4096 -t 8:00:00 bash
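The --mem value is given in megabytes and -t takes a time limit in hours:minutes:seconds form, which is where 4096 and 8:00:00 come from. The sketch below shows that arithmetic by building the same command line from a size in gigabytes and a whole number of hours. The helper name interactive_srun is hypothetical, and the code only assembles the arguments; it does not run srun.

```python
import shlex


def interactive_srun(mem_gb, hours, shell="bash"):
    """Build the argument list for an interactive srun session.

    Hypothetical helper: assumes whole hours and a memory size in gigabytes.
    """
    mem_mb = int(mem_gb * 1024)          # --mem is interpreted as megabytes
    time_limit = f"{int(hours)}:00:00"   # hours:minutes:seconds
    return ["srun", "--pty", "--mem", str(mem_mb), "-t", time_limit, shell]


args = interactive_srun(mem_gb=4, hours=8)
command_line = shlex.join(args)
print(command_line)
```

For 4 GB and 8 hours this reproduces the command shown above. The resulting list could be passed to subprocess.run on a machine where srun is installed.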
scancel
To cancel a job, run scancel with the job ID (the JOBID column in the squeue output).
scontrol
The scontrol command gives more detailed information on both nodes and partitions.
To show more about partitions you can run scontrol show partition
# scontrol show partition
PartitionName=test
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=shado[00-04]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=10 TotalNodes=5 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=test2
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=2 MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=shado[00-02]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=6 TotalNodes=3 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
To show more about nodes you can run scontrol show nodes
# scontrol show nodes
NodeName=shado00 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=1.01 Features=(null)
   Gres=(null)
   NodeAddr=shado00 NodeHostName=shado00 Version=14.11
   OS=Linux RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1
   BootTime=2015-07-23T21:13:22 SlurmdStartTime=2015-07-30T11:21:49
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=shado01 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.94 Features=(null)
   Gres=(null)
   NodeAddr=shado01 NodeHostName=shado01 Version=14.11
   OS=Linux RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1
   BootTime=2015-07-23T21:13:22 SlurmdStartTime=2015-07-30T11:23:23
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=shado02 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.95 Features=(null)
   Gres=(null)
   NodeAddr=shado02 NodeHostName=shado02 Version=14.11
   OS=Linux RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1
   BootTime=2015-07-23T21:13:23 SlurmdStartTime=2015-07-30T11:23:50
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
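The scontrol output is a sequence of Key=Value pairs, with each record starting at NodeName= (or PartitionName= for partitions). The following is a minimal sketch of reading those pairs into dictionaries, using an abbreviated copy of the node output above. The helper name parse_scontrol is hypothetical, and the sketch assumes no value contains whitespace, which holds for this sample but not for every field scontrol can print.

```python
# Abbreviated copy of the `scontrol show nodes` output above (two nodes,
# a subset of the fields).
SAMPLE = """\
NodeName=shado00 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=1.01 Features=(null)
   RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1

NodeName=shado01 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.94 Features=(null)
   RealMemory=7823 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1
"""


def parse_scontrol(text, record_key):
    """Split scontrol Key=Value output into one dict per record.

    Hypothetical helper: a new record starts whenever record_key is seen.
    Assumes values contain no whitespace.
    """
    records = []
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip anything that is not a Key=Value pair
        if key == record_key:
            records.append({})
        if records:
            records[-1][key] = value
    return records


nodes = parse_scontrol(SAMPLE, "NodeName")
free_cpus = sum(
    int(n["CPUTot"]) - int(n["CPUAlloc"]) for n in nodes if n["State"] == "IDLE"
)
print([n["NodeName"] for n in nodes])
print(free_cpus)
```

The same function can be applied to the partition output by passing "PartitionName" as the record key. On a live system, scontrol's -o (--oneliner) option prints one record per line, which may be easier to split.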