Slurm

CBCB SLURM Cluster

Two submission nodes are available for data management and for submitting jobs to the cluster.

  • cbcbsub00.umiacs.umd.edu
  • cbcbsub01.umiacs.umd.edu

Please do not run CPU-intensive jobs on these nodes; any such jobs will be terminated.

The cluster currently has a number of nodes available, scheduled with the SLURM resource manager. Two aliases, show_nodes and show_qos, are injected into your environment at the system level. If your local shell configuration has overridden these, make sure you source the correct SLURM shell initialization file: /etc/profile.d/slurm.csh for tcsh/csh or /etc/profile.d/slurm.sh for bash/sh.
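
For example, to pull these aliases into an already running bash session, you can source the system file directly (use the .csh file under tcsh/csh):

$ source /etc/profile.d/slurm.sh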

$ show_nodes
NODELIST             CPUS       MEMORY     AVAIL_FEATURES            GRES                      STATE
crow01               8          7822       Xeon,x5460                (null)                    idle
crow02               8          7822       Xeon,x5460                (null)                    idle
crow03               8          7822       Xeon,x5460                (null)                    idle
gum                  64         515797     Opteron,6380              (null)                    mix
heron00              32         128742     Opteron,6378              (null)                    mix
ibis02               12         48138      Opteron,2453he            (null)                    mix
ibis03               12         48138      Opteron,2453he            (null)                    mix
ibis04               12         48138      Opteron,2453he            (null)                    mix
ibis05               12         48138      Opteron,2453he            (null)                    mix
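
The show_nodes alias presumably wraps SLURM's own reporting tools; if it is not available in your shell, a roughly equivalent native command (an approximation, not necessarily the exact alias definition) is:

$ sinfo -N --Format=nodelist,cpus,memory,features,gres,statelong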

You will need to specify a QOS (quality of service, via the --qos option) when you submit a job that requires more resources than the default QOS provides.

$ show_qos
                Name     MaxWall MaxJobsPU       MaxTRES     MaxTRESPU
-------------------- ----------- --------- ------------- -------------
             default    01:00:00        16        mem=4G       mem=64G
          throughput    18:00:00       125       mem=36G
     high_throughput    08:00:00       300        mem=8G
               large 11-00:00:00         5      mem=128G
              xlarge 21-00:00:00         1      mem=512G
                long  7-00:00:00        16       mem=12G
         workstation  7-00:00:00         4       mem=48G
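
For example, a job that needs roughly 24 GB of memory and 12 hours of wall time will not fit in the default QOS but does fit within the throughput limits above; the script name and resource values here are only placeholders:

$ sbatch --qos=throughput --mem=24G --time=12:00:00 myjob.sh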

Getting Started

Quick Guide to Translating PBS/Torque to SLURM

User commands
                          PBS/Torque              SLURM
Job submission            qsub [filename]         sbatch [filename]
Job deletion              qdel [job_id]           scancel [job_id]
Job status (by job)       qstat [job_id]          squeue --job [job_id]
Full job status (by job)  qstat -f [job_id]       scontrol show job [job_id]
Job status (by user)      qstat -u [username]     squeue --user=[username]
Interactive shell         qsub -I                 srun --pty bash
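
For example, to get an interactive shell on a compute node, the srun equivalent of qsub -I looks like this (the QOS, memory, and time values are only illustrative):

$ srun --pty --qos=default --mem=2G --time=01:00:00 bash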

Environment variables
                          PBS/Torque              SLURM
Job ID                    $PBS_JOBID              $SLURM_JOBID
Submit Directory          $PBS_O_WORKDIR          $SLURM_SUBMIT_DIR
Node List                 $PBS_NODEFILE           $SLURM_JOB_NODELIST

For more information on available variables in SLURM, see the OUTPUT ENVIRONMENT VARIABLES section of the sbatch man page or the online documentation.
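
As an illustration, a batch script can use these variables to record where and what it is running; this line is only a sketch:

echo "Job $SLURM_JOBID was submitted from $SLURM_SUBMIT_DIR and is running on $SLURM_JOB_NODELIST"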

Job specification
                          PBS/Torque                    SLURM
Script directive          #PBS                          #SBATCH
Job Name                  -N [name]                     --job-name=[name] OR -J [name]
Node Count                -l nodes=[count]              --nodes=[min[-max]] OR -N [min[-max]]
CPU Count                 -l ppn=[count]                --ntasks-per-node=[count]
CPUs Per Task                                           --cpus-per-task=[count]
Memory Size               -l mem=[MB]                   --mem=[MB] OR --mem-per-cpu=[MB]
Wall Clock Limit          -l walltime=[hh:mm:ss]        --time=[min] OR --time=[days-hh:mm:ss]
Node Properties           -l nodes=4:ppn=8:[property]   --constraint=[list]
Standard Output File      -o [file_name]                --output=[file_name] OR -o [file_name]
Standard Error File       -e [file_name]                --error=[file_name] OR -e [file_name]
Combine stdout/stderr     -j oe (both to stdout)        (default if --error is not specified)
Job Arrays                -t [array_spec]               --array=[array_spec] OR -a [array_spec]
Delay Job Start           -a [time]                     --begin=[time]
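
Putting the directives above together, a minimal batch script might look like the following; the job name, QOS, resource values, and the final command are placeholders to replace with your own:

#!/bin/bash
#SBATCH --job-name=example_job           # -J example_job also works
#SBATCH --qos=throughput                 # pick a QOS whose limits cover your request
#SBATCH --ntasks-per-node=4
#SBATCH --mem=16G
#SBATCH --time=06:00:00
#SBATCH --output=example_job.%j.out      # %j expands to the job ID
#SBATCH --error=example_job.%j.err

cd "$SLURM_SUBMIT_DIR"
echo "Job $SLURM_JOBID running on $SLURM_JOB_NODELIST"
./my_program                             # placeholder for your actual command

Submit the script with sbatch followed by the script name, and check on it with squeue --user=[username].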