SLURM/ClusterStatus: Difference between revisions
Revision as of 00:22, 20 January 2017
Cluster Status
SLURM offers a variety of tools to check the general status of nodes/partitions in a cluster.
sinfo
The sinfo command will show you the status of partitions in the cluster. Passing the -N flag will show each node individually.
tgray26@opensub00:sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
dpart*       up   infinite      8   idle openlab[00-07]
gpu          up   infinite      2   idle openlab[08-09]
tgray26@opensub00:sinfo -N
NODELIST   NODES PARTITION STATE
openlab00      1    dpart* idle
openlab01      1    dpart* idle
openlab02      1    dpart* idle
openlab03      1    dpart* idle
openlab04      1    dpart* idle
openlab05      1    dpart* idle
openlab06      1    dpart* idle
openlab07      1    dpart* idle
openlab08      1       gpu idle
openlab09      1       gpu idle
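The per-node listing is easy to summarize in a script. The following is a minimal sketch, assuming the four-column layout shown in the `sinfo -N` output above; the function name `count_nodes_by_partition` and the embedded sample text are illustrative only and not part of SLURM.

```python
from collections import Counter

# Sample text copied from the `sinfo -N` output above (shortened to four nodes).
SAMPLE = """\
NODELIST   NODES PARTITION STATE
openlab00      1    dpart* idle
openlab01      1    dpart* idle
openlab08      1       gpu idle
openlab09      1       gpu idle
"""


def count_nodes_by_partition(text):
    """Count nodes per (partition, state) in `sinfo -N` style output."""
    counts = Counter()
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore lines that do not match the four-column layout
        _node, _count, partition, state = fields
        # sinfo marks the default partition with a trailing '*'
        counts[(partition.rstrip("*"), state)] += 1
    return counts


if __name__ == "__main__":
    for (partition, state), n in sorted(count_nodes_by_partition(SAMPLE).items()):
        print(partition, state, n)
```

On a live cluster the text would come from running `sinfo -N` (for example through Python's `subprocess.run`) instead of the embedded sample.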
scontrol
The scontrol command can be used to view the status and configuration of the nodes in the cluster. If you pass specific node names, only information about those nodes is displayed; otherwise all nodes are listed. To specify multiple nodes, separate the node names with commas (no spaces).
tgray26@opensub00:scontrol show nodes openlab00,openlab01
NodeName=openlab00 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.02
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=openlab00 NodeHostName=openlab00 Version=16.05
   OS=Linux RealMemory=7822 AllocMem=0 FreeMem=5842 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2016-07-11T16:40:45 SlurmdStartTime=2016-07-11T23:47:24
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=openlab01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=openlab01 NodeHostName=openlab01 Version=16.05
   OS=Linux RealMemory=7822 AllocMem=0 FreeMem=5865 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2016-07-11T16:40:59 SlurmdStartTime=2016-07-11T23:48:25
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
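Because each node record is a run of `Key=Value` pairs that starts with `NodeName=`, the output can be turned into one dictionary per node. This is a rough sketch under that assumption; `parse_scontrol_nodes` is a hypothetical helper name, and the sample is an abbreviated copy of the output above. It splits on whitespace, so it would mis-parse any field whose value contains spaces.

```python
# Abbreviated copy of the `scontrol show nodes` output above.
SAMPLE = """\
NodeName=openlab00 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.02
   RealMemory=7822 AllocMem=0 FreeMem=5842 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A
NodeName=openlab01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.01
   RealMemory=7822 AllocMem=0 FreeMem=5865 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=49975 Weight=1 Owner=N/A MCS_label=N/A
"""


def parse_scontrol_nodes(text):
    """Return a list of dicts, one per node, from Key=Value node records."""
    nodes = []
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip anything that is not a Key=Value pair
        if key == "NodeName":
            nodes.append({})  # a NodeName field starts a new node record
        if nodes:
            nodes[-1][key] = value  # values are kept as strings
    return nodes


if __name__ == "__main__":
    for node in parse_scontrol_nodes(SAMPLE):
        free_cpus = int(node["CPUTot"]) - int(node["CPUAlloc"])
        print(node["NodeName"], node["State"], "free CPUs:", free_cpus)
```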
sacctmgr
The sacctmgr command shows cluster accounting information. One helpful use is listing the available QOS (Quality of Service) levels, which correspond to queues in systems like PBS/Torque.
$ sacctmgr list qos format=Name,Priority,MaxWall,MaxJobsPU
      Name   Priority     MaxWall MaxJobsPU
---------- ---------- ----------- ---------
    normal          0
     dpart          0  2-00:00:00         8
       gpu          0    08:00:00         2
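The MaxWall column uses a days-hours:minutes:seconds notation, so `2-00:00:00` is two days and `08:00:00` is eight hours. As a small sketch of how such values could be compared in a script, the hypothetical helper below converts them to seconds. It handles only the two forms that appear in the output above and treats a blank value, as shown for the normal QOS, as meaning that no limit is set (an assumption).

```python
def maxwall_to_seconds(value):
    """Convert '[D-]HH:MM:SS' to seconds; return None for a blank value."""
    value = value.strip()
    if not value:
        return None  # assumption: blank means no MaxWall limit is set
    days = 0
    if "-" in value:
        # A leading 'D-' part gives the number of whole days
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    for name, maxwall in [("normal", ""), ("dpart", "2-00:00:00"), ("gpu", "08:00:00")]:
        print(name, maxwall_to_seconds(maxwall))
```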