Submitting Jobs to Clusters#

Note

This section contains multiple example job scripts. You can download the companion examples from GitHub to your home (~) directory and run:

cd amii-doc-examples/src/sbatch

sbatch#

The sbatch command is the primary method for submitting jobs to a Slurm cluster. When you provide it with a shell script, Slurm manages the entire job lifecycle: waiting for resource availability, executing your code, and capturing the output.

When you execute sbatch script.sh, the command parses the top of your file for #SBATCH directives to determine your required resources. Once the job is successfully queued, sbatch exits immediately and displays a Job ID.

Tip

Take note of your Job ID; you will need it to track progress, view logs, or cancel the job later.

By default, both standard output (stdout) and standard error (stderr) are redirected to a file named slurm-%j.out, where %j is automatically replaced by the Job ID.
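As an illustration, a minimal job script might look like the following sketch. The filename hello.sh and the values shown are illustrative placeholders, not one of the companion examples:

    #!/bin/bash
    #SBATCH --job-name=hello           # name shown in squeue (illustrative)
    #SBATCH --output=hello-%j.out      # %j is replaced by the Job ID
    #SBATCH --time=00:05:00            # maximum runtime
    #SBATCH --mem=1G                   # memory per node

    echo "Running on $(hostname)"

Submitting it with sbatch hello.sh prints a line such as "Submitted batch job 12345"; once the job runs, its output appears in hello-12345.out.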

Understanding Resource Allocation#

When your requested resources are granted, Slurm launches a single instance of your job script on the first of the allocated nodes. At this stage, your script executes exclusively on that node: it has access to the resources granted on that specific node, but commands run directly in the script will NOT use the rest of the allocated nodes.

To distribute work and leverage a full multi-node allocation, you must use the srun command (covered in the next section) to launch tasks across all reserved nodes.

Important

Slurm does NOT automatically parallelize your application across multiple nodes. As the author of the job script, you are responsible for defining how and when your application runs in parallel (typically using srun).
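To see the distinction concretely, consider the following hedged sketch of a two-node job (an illustration, not the content of the print-hello-* companion scripts): the plain command runs only on the first allocated node, while srun launches one task on each node.

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=00:05:00

    # Runs once, on the first allocated node only.
    hostname

    # srun launches one task per allocated node, so two hostnames are printed.
    srun hostname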

Tip

Try it yourself!

Submit these examples to see the difference in execution:

sbatch print-hello-single.sh
sbatch print-hello-srun.sh

Can you explain what you observed in the output files? You can also check the resources accessible to your job script by running:

sbatch check-job-script-resources.sh

While sbatch supports many command-line options, defining resources via #SBATCH directives inside the script is the best practice for reproducibility. The complete list of options is documented in the sbatch manual (man sbatch).

#SBATCH Directives#

Directives are specialized instructions placed at the very beginning of a submission script to communicate with Slurm. They are always prefixed with the #SBATCH string.

While a standard # in a shell script indicates a comment, Slurm scans the script specifically for #SBATCH to configure the environment before any code is executed. Slurm stops parsing these directives as soon as it encounters the first “executable” (non-comment/non-blank) line.

The primary purpose of #SBATCH directives is to request job-level resources. When you submit a script, Slurm calculates the total resources required and reserves those specific resources on each allocated node for the duration of your job.
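The placement rule has practical consequences. In the hypothetical fragment below, the final directive is silently ignored because it appears after the first executable line:

    #!/bin/bash
    #SBATCH --job-name=placement-demo   # parsed: appears before any executable line
    #SBATCH --time=00:10:00             # parsed

    echo "starting"                     # first executable line; directive parsing stops here

    #SBATCH --mem=8G                    # IGNORED: treated as an ordinary shell comment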

Important

Reservation vs. Enforcement

Although some #SBATCH directives (like --cpus-per-task) use “task-level” terminology, they do not automatically distribute reserved node resources across tasks.

Slurm uses #SBATCH purely to determine the total resource footprint to reserve on each node. The actual enforcement and distribution of resources for individual tasks/processes (task-level allocation) is handled by the srun command for flexibility.

Job Identity & Logistics#

These manage how you track your job and where the output files are stored.

Directive           Description                      Usage Example
--job-name / -J     The name visible in squeue.      #SBATCH --job-name=my_job
--output / -o       File for standard output.        #SBATCH --output=my_job.out
--error / -e        File for error messages.         #SBATCH --error=my_job_err.log

Filenames can contain replacement symbols to help distinguish outputs:

  • %j: Job ID.

  • %N: Short hostname.

  • %n: Node identifier relative to the job (e.g., “0” is the first node).

  • %t: Task identifier (rank), always 0 for job scripts.

  • %u: User name.

  • %x: Job name.

Example: #SBATCH --output=my_job-%N-%j.out

Resource Request Directives#

Note

In Slurm documentation, a CPU usually refers to a so-called "usable CPU", which is essentially a logical CPU (a hardware thread or core). If Hyper-Threading is enabled, one physical core presents two or more logical CPUs. On Vulcan, Hyper-Threading is disabled for performance.

Directive               Description                                                     Usage Example
--nodes / -N            Number of separate physical nodes requested.                    #SBATCH --nodes=2
--ntasks-per-node       Advised maximum number of tasks to be invoked on each node.     #SBATCH --ntasks-per-node=4
--ntasks / -n           Advised total number of tasks (processes) to run.               #SBATCH --ntasks=16
--cpus-per-task / -c    Advised number of CPUs per task.                                #SBATCH --cpus-per-task=4
--mem                   Total memory requested per node.                                #SBATCH --mem=32G
--mem-per-cpu           Memory requested per usable CPU.                                #SBATCH --mem-per-cpu=4G
--gpus / -G             Total GPUs requested for the entire job.                        #SBATCH --gpus=4
--gpus-per-task         GPUs required per task.                                         #SBATCH --gpus-per-task=2
--time / -t             Maximum runtime limit (D-HH:MM:SS).                             #SBATCH --time=7-04:00:00

sbatch does NOT launch tasks itself. The directives advise Slurm to reserve sufficient resources on each node for the tasks that srun later launches from inside the job script.

Slurm directives interact dynamically. To ensure your job is scheduled correctly and uses resources efficiently, keep the following logic in mind:

  • Per-node Resource Sharing: Resources requested on a per-node basis, such as memory requested via --mem, create a shared pool for all tasks assigned to that specific node. If the combined usage of all tasks on a node exceeds the requested --mem value, Slurm will terminate the job with an out-of-memory (OOM) error.

  • Inference Logic: Directives are interdependent, and Slurm will attempt to calculate missing values based on your input. For example:
    • If you specify both --ntasks and --ntasks-per-node, the total task count (--ntasks) takes precedence. In this case, --ntasks-per-node acts as a maximum limit per node, and Slurm automatically calculates the required number of nodes.

    • If you specify --nodes and --ntasks-per-node but omit --ntasks, Slurm infers the total tasks as: Nodes × Tasks-per-node.

  • Logical Consistency: Your resource math must be physically possible.
    • Invalid: Requesting --nodes=2 --ntasks=16 --ntasks-per-node=2 is a contradiction. Two nodes with a maximum of two tasks each can only support four tasks total, not 16. This will result in a submission error.

    • Valid: Requesting --nodes=3 --ntasks=4 --ntasks-per-node=2 is acceptable. Slurm can distribute the four tasks unevenly (e.g., a 2-1-1 placement) across the three nodes because no single node exceeds the two-task limit.

    • Modified: Requesting --nodes=4 --ntasks=3 --ntasks-per-node=2 is accepted, but Slurm reduces --nodes to 3, since 3 tasks cannot be spread across 4 nodes (at least one node would be left with no task to run).

  • Hardware Physical Constraints: A request must not exceed the physical capacity of the hardware you are targeting.
    • On the Vulcan cluster, each node has 64 CPUs. A request for --nodes=1 --ntasks=8 --cpus-per-task=16 (which totals 128 CPUs) will fail because Slurm cannot allocate more CPUs than a single node physically possesses.

    Important

    Always review the cluster hardware specifications before submitting jobs. Requesting resources that are unavailable or misaligned with the cluster configuration may result in indefinite queue times and can negatively impact other users.

    The hardware specifications for Amii-managed clusters are available in the Useful Resources chapter under Hardware Specifications. Hardware specifications for other DRAC clusters can be found at DRAC wiki in the Resources section.

Task placement can be further controlled using the --distribution directive, though the default settings are usually optimal for most users.
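Putting the rules above together, a consistent multi-node request might look like the following sketch. The values are illustrative and should be adapted to the hardware of the target cluster; my_parallel_app is a placeholder for your own program.

    #!/bin/bash
    #SBATCH --nodes=2                 # 2 nodes
    #SBATCH --ntasks-per-node=4       # 4 tasks per node -> 8 tasks total (inferred)
    #SBATCH --cpus-per-task=4         # 4 CPUs reserved per task (16 CPUs per node)
    #SBATCH --mem=32G                 # 32 GB per node, shared by the 4 tasks on that node
    #SBATCH --time=02:00:00

    # srun starts the 8 tasks across both nodes; each task is advised 4 CPUs.
    srun ./my_parallel_app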

Notification Directives#

Directive       Description                             Usage Example
--mail-user     The email address for notifications.    #SBATCH --mail-user=user@example.com
--mail-type     Trigger (BEGIN, END, FAIL, ALL).        #SBATCH --mail-type=END,FAIL

Advanced Control Directives#

Directive            Description                                      Usage Example
--exclusive          The job will not share nodes with other jobs.    #SBATCH --exclusive
--dependency / -d    Wait for another Job ID to finish/succeed.       #SBATCH -d afterok:98765
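For example, a job chain can be built by capturing the first Job ID and passing it to the second submission. In this sketch, preprocess.sh and train.sh are placeholder script names; sbatch --parsable prints only the Job ID, which makes it easy to capture in a shell variable.

    # Submit the first job and capture its Job ID.
    jobid=$(sbatch --parsable preprocess.sh)

    # The second job starts only if the first one completes successfully.
    sbatch --dependency=afterok:"$jobid" train.sh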

Common sbatch Pitfalls#

  1. Typos in Directives: Slurm ignores misspelled directives without warning. For example, #SATCH (missing the ‘B’) is treated as a plain comment. Your job will then run with default settings, often leading to immediate failure or resource starvation.

  2. Inferred Node Counts: If --nodes is not specified, Slurm infers it based on your other requests and cluster availability.

    Try running the following example with different resource requests to see how the allocation changes.

    sbatch no-nodes.sh
    
  3. Memory Allocation Units: The --mem directive requests memory per node (with a unit suffix such as 200M or 32G), not per job or per task.

    Try running the following examples and explain what happened.

    sbatch test-mem-200m.sh
    sbatch test-mem-300m.sh
    sbatch test-mem-400m-2-nodes.sh
    
  4. GPU Distribution Ambiguity: Unlike memory, --gpus refers to the total GPUs for the job. Slurm may not distribute these evenly. For example, a 4-GPU request on 2 nodes might result in 3 GPUs on one node and 1 on the other.

    We strongly recommend pairing --gpus with --gpus-per-task or --gpus-per-node for precise control (see the sketch after this list).

    You can test those directives with:

    sbatch test-1-gpus-per-task-2-nodes.sh
    sbatch test-2-gpus-per-node-2-nodes.sh
    
  5. Legacy GRES Syntax: You may see --gres=gpu:N in older documentation. While still functional, this requests GPUs per node. In contrast, the modern --gpus flag requests them for the entire job.

    You can observe the difference with:

    sbatch test-gres-gpu-2-nodes.sh
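As a hedged illustration of pitfalls 3-5 (the values and the script body are illustrative, not the contents of the test-* companion scripts), the request below pins GPUs per node explicitly and highlights that --mem applies to each node separately:

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1
    #SBATCH --gpus-per-node=2        # 2 GPUs reserved on EACH node (4 GPUs total)
    #SBATCH --mem=16G                # 16 GB on EACH node, not 16 GB for the whole job
    #SBATCH --time=01:00:00

    # Each task reports the GPUs visible on its own node. Slurm typically exports
    # CUDA_VISIBLE_DEVICES for the allocated GPUs, though this depends on cluster configuration.
    srun bash -c 'echo "$(hostname): ${CUDA_VISIBLE_DEVICES:-no GPUs visible}"'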