Difference between revisions of "SLURM/ClusterStatus"

From UMIACS

Revision as of 18:10, 12 July 2016

Cluster Status

Slurm offers a variety of tools to check the general status of nodes/partitions in a cluster.

sinfo

The sinfo command shows the status of each partition in the cluster. Passing the -N flag lists each node individually instead.

tgray26@opensub00:sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
dpart*       up   infinite      8   idle openlab[00-07]
gpu          up   infinite      2   idle openlab[08-09]
tgray26@opensub00:sinfo -N
NODELIST   NODES PARTITION STATE
openlab00      1    dpart* idle
openlab01      1    dpart* idle
openlab02      1    dpart* idle
openlab03      1    dpart* idle
openlab04      1    dpart* idle
openlab05      1    dpart* idle
openlab06      1    dpart* idle
openlab07      1    dpart* idle
openlab08      1       gpu idle
openlab09      1       gpu idle
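
For scripting, the per-node listing can be turned into records. The following is a minimal sketch, not an official tool: it assumes the default four-column `sinfo -N` layout shown above, and the names `SINFO_SAMPLE`, `parse_sinfo_nodes`, and `count_by_partition_and_state` are hypothetical helpers introduced only for illustration. The sample text is copied from the output above so the sketch runs without a cluster.

```python
from collections import Counter

# Sample copied from the `sinfo -N` output above. On a real cluster the text
# would come from running the command yourself (assumption: default columns).
SINFO_SAMPLE = """\
NODELIST   NODES PARTITION STATE
openlab00      1    dpart* idle
openlab01      1    dpart* idle
openlab02      1    dpart* idle
openlab03      1    dpart* idle
openlab04      1    dpart* idle
openlab05      1    dpart* idle
openlab06      1    dpart* idle
openlab07      1    dpart* idle
openlab08      1       gpu idle
openlab09      1       gpu idle
"""


def parse_sinfo_nodes(text):
    """Return one dict per node line, keyed by the header column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip lines that do not match the header layout
        row = dict(zip(header, fields))
        # A trailing "*" marks the default partition; record it separately.
        row["DEFAULT"] = row["PARTITION"].endswith("*")
        row["PARTITION"] = row["PARTITION"].rstrip("*")
        rows.append(row)
    return rows


def count_by_partition_and_state(rows):
    """Count nodes for each (partition, state) pair."""
    return Counter((r["PARTITION"], r["STATE"]) for r in rows)


if __name__ == "__main__":
    rows = parse_sinfo_nodes(SINFO_SAMPLE)
    counts = count_by_partition_and_state(rows)
    for (partition, state), n in sorted(counts.items()):
        print(f"{partition:<8} {state:<6} {n}")
```

Splitting on whitespace keeps the sketch short, but it only works while no column value contains a space, which holds for the default columns used here.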

scontrol

The scontrol command can be used to view the status and configuration of the nodes in the cluster. If you pass one or more node names, only information about those nodes is displayed; otherwise, all nodes are listed. To specify multiple nodes, separate the node names with commas (no spaces).

tgray26@opensub00:scontrol show nodes openlab00,openlab01
NodeName=openlab00 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.02
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=openlab00 NodeHostName=openlab00 Version=16.05
   OS=Linux RealMemory=7822 AllocMem=0 FreeMem=5842 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2016-07-11T16:40:45 SlurmdStartTime=2016-07-11T23:47:24
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


NodeName=openlab01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=openlab01 NodeHostName=openlab01 Version=16.05
   OS=Linux RealMemory=7822 AllocMem=0 FreeMem=5865 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2016-07-11T16:40:59 SlurmdStartTime=2016-07-11T23:48:25
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
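
The `Key=Value` layout above is straightforward to read from a script. The sketch below is one possible approach, not part of Slurm: `SCONTROL_SAMPLE` and `parse_scontrol_nodes` are hypothetical names, the sample is abbreviated from the output above, and the parser assumes that no value contains whitespace (true for the fields shown here, but not guaranteed for every field scontrol can print).

```python
# Abbreviated from the `scontrol show nodes openlab00,openlab01` output above.
SCONTROL_SAMPLE = """\
NodeName=openlab00 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.02
   OS=Linux RealMemory=7822 AllocMem=0 FreeMem=5842 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A

NodeName=openlab01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.01
   OS=Linux RealMemory=7822 AllocMem=0 FreeMem=5865 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A
"""


def parse_scontrol_nodes(text):
    """Split the output into one dict of string fields per node.

    Assumption: every token is Key=Value and values contain no whitespace.
    """
    nodes = []
    current = None
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # ignore anything that is not a Key=Value token
        if key == "NodeName":
            # Each NodeName token starts a new node record.
            current = {}
            nodes.append(current)
        if current is not None:
            current[key] = value
    return nodes


if __name__ == "__main__":
    for node in parse_scontrol_nodes(SCONTROL_SAMPLE):
        # Values stay as strings; convert only the ones you need.
        free_mb = int(node["FreeMem"])
        print(node["NodeName"], node["State"], free_mb)
```

Keeping every value as a string and converting on demand avoids guessing at types for fields such as `CapWatts=n/a` that are not always numeric.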