Using Slurm Examples

This page documents some common operations that you might use with Slurm on the Rogues Gallery testbed.

If you have not yet read the main Slurm page, which lists the queues and links to other training material, please read that page first.

Current Slurm Status

At this time, we generally do not support the use of the account flag, -A <GTusername>.

Slurm Interactive Jobs

On the Rogues Gallery testbed cluster, interactive jobs can be run with Slurm so that you can test and debug your code. This is especially important for heterogeneous resources like Arm clusters.

Here is an example that allocates one node (octavius1) on the A64FX cluster in the “rg-arm-debug” queue for 1 hour using salloc:

$ salloc -p rg-arm-debug --nodes=1 --ntasks-per-node=1 --nodelist octavius1 --time=01:00:00
salloc: Granted job allocation 382

Note that for the “account” parameter for salloc, you should use your GT user account for the Newell cluster. Also note that after running salloc, you should see output confirming the job allocation.

You can verify that the resources have been allocated using squeue:

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               382     debug interact gburdell3  R       0:04      1 octavius1
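
If other users' jobs are also listed, you can limit squeue to just your own jobs with the -u flag (a minimal sketch, shown with the example username from the output above):

$ squeue -u gburdell3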

Finally, you can access the interactive job with a bash shell using srun:

$ srun --jobid=<JOB_ID_ALLOCATED> --pty bash -i

Typically you do not need to include the “jobid” parameter for srun after using salloc, but it is included here for illustration.
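
For example, if you run srun from the same shell where you ran salloc, the allocation is picked up from the environment and the command can be shortened (a minimal sketch of the same command without the jobid parameter):

$ srun --pty bash -i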

Interactive jobs can also be run with Slurm using just one srun command:

$ srun -p debug --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
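
When you are finished with an interactive session, type exit to leave the shell. If the allocation from salloc is still active, you can also release it explicitly with scancel (a minimal sketch using the job ID from the salloc example above):

$ scancel 382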

For more information on salloc, please go here: Slurm salloc.

For more information on srun, please go here: Slurm srun.

Slurm Batch Jobs

Batch jobs can also be run with Slurm on the Rogues Gallery testbed.

Here is a simple example that you can run on the Newell cluster; it runs the hostname command and writes the output to a file. Create a text file named “batch-job-example.batch” with the following content:

#!/bin/bash

# Partition for the job:
#SBATCH -p debug

# Account to run the job:
#SBATCH --account=<NAME_OF_MY_ACCOUNT>

# Multithreaded (SMP) job: must run on one node
#SBATCH --nodes=1

# The name of the job:
#SBATCH --job-name="batch-job-example"

# Maximum number of tasks/CPU cores used by the job:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# The amount of memory in megabytes allocated per node for the job:
#SBATCH --mem=32768

# The maximum running time of the job in days-hours:mins:sec
#SBATCH --time=0-1:0:00

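# The name of the output file (%j expands to the job ID):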
#SBATCH -o batch-job-example-output-%j

# Run hostname command
hostname

Then run the example with sbatch:

$ sbatch batch-job-example.batch
Submitted batch job 383
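
While the job is waiting or running, you can check its status with squeue using the job ID reported by sbatch (a minimal sketch with the job ID from the output above):

$ squeue -j 383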

This should generate an output file named “batch-job-example-output-383” in the directory from which you submitted the job. For this example, the output should be the following:

$ more batch-job-example-output-383
newell1.cc.gatech.edu

For more information on sbatch, please go here: Slurm sbatch.

Slurm Batch Jobs with MPI

Batch jobs that use MPI (Message Passing Interface) can also be run with Slurm on the Rogues Gallery testbed.

Here is a simple example using Open MPI that you can run on the Newell cluster; it compiles and runs a simple MPI “hello world” program.

First, download the C code from here: mpi-hello-world.c, or create a file with the code shown below:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  // Initialize the MPI environment. The two arguments to MPI Init are not
  // currently used by MPI implementations, but are there in case future
  // implementations might need the arguments.
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n",
         processor_name, world_rank, world_size);

  // Finalize the MPI environment. No more MPI calls can be made after this
  MPI_Finalize();
}
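
If you would like to make sure the code compiles before submitting a batch job, you can load the same Open MPI module used in the batch file below and build it by hand (a minimal sketch; the module name is taken from the batch file and may differ on your system):

$ module load openmpi/4.4.1
$ mpicc mpi-hello-world.c -o mpi-hello-world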

Create a text file named “mpi-batch-job-example.batch” with the following content:

#!/bin/bash

# Partition for the job:
#SBATCH -p debug

# Account to run the job:
#SBATCH --account=<NAME_OF_MY_ACCOUNT>

# Multi-node MPI job: run on two nodes
#SBATCH --nodes=2
#SBATCH --nodelist=newell1,newell2

# The name of the job:
#SBATCH --job-name="mpi-batch-job-example"

# Maximum number of tasks/CPU cores used by the job:
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=8

# The amount of memory in megabytes allocated per node for the job:
#SBATCH --mem=32768

# The maximum running time of the job in days-hours:mins:sec
#SBATCH --time=0-1:0:00

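# The name of the output file (%j expands to the job ID):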
#SBATCH -o mpi-batch-job-example-output-%j

# Source .bashrc file
source ~/.bashrc

# Clear loaded modules and load the Open MPI (4.4.1) module
module purge
module load openmpi/4.4.1

# Run the mpi-hello-world example from mpi-batch-job-examples directory
cd $HOME/mpi-batch-job-examples
mpicc mpi-hello-world.c -o mpi-hello-world
mpirun ./mpi-hello-world

Be sure to change the “account” parameter to your GT user account.

Note that the 2 nodes used in the example (newell1 and newell2) are specified in the batch file using the “nodelist” parameter.
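
If you are not sure which nodes belong to a partition before filling in the “nodelist” parameter, you can list them with sinfo (a minimal sketch using the “debug” partition from the batch file above):

$ sinfo -p debug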

Also note that Open MPI (version 4.4.1) is loaded using module in this example.
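
To check which Open MPI versions are installed as modules on the system you are using, you can query the module system first (a sketch; the versions available may differ from the one in the batch file):

$ module avail openmpi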

Then run the MPI example with sbatch:

$ sbatch mpi-batch-job-example.batch
Submitted batch job 384
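
After the job completes, you can confirm its final state with sacct, assuming job accounting is enabled on the cluster (a minimal sketch using the job ID from the output above):

$ sacct -j 384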

This should generate an output file named “mpi-batch-job-example-output-384” in the directory from which you submitted the job (here, the mpi-batch-job-examples directory). For this example, the output should be the following:

$ more mpi-batch-job-example-output-384
Hello world from processor newell1.cc.gatech.edu, rank 0 out of 2 processors
Hello world from processor newell2.cc.gatech.edu, rank 1 out of 2 processors

For more information on Open MPI, please go here: Open MPI.