
Monitoring and Managing Jobs

Check Your Jobs

View your running and pending jobs:

squeue --me

View jobs on a specific partition:

squeue -p rtx-batch

Detailed Job Information

Inspect a running or pending job:

scontrol show job JOBID

This shows the full job configuration: allocated nodes, resources, time limits, working directory, and more.
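When scripting, it can help to pull a single field out of this output. The sketch below assumes the usual whitespace-separated Key=Value layout of scontrol show job and that the value itself contains no spaces. job_field is a hypothetical helper name, not a Slurm command.

```shell
#!/usr/bin/env bash
# job_field KEY: read `scontrol show job` output on stdin and print the value of KEY.
# Assumption: fields are whitespace-separated Key=Value pairs and the value has no spaces.
job_field() {
  tr -s ' ' '\n' | awk -v key="$1" -F= '
    !found && $1 == key {
      sub(/^[^=]*=/, "")   # drop everything up to and including the first "="
      print
      found = 1            # only report the first match
    }'
}

# Example on a cluster (JOBID is a placeholder):
#   scontrol show job JOBID | job_field TimeLimit
```

Fields whose values contain spaces (a command line with arguments, for instance) would need a more careful parser.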

Job History and Resource Usage

After a job completes, use sacct to see what resources it actually used:

sacct -j JOBID --format=JobID,JobName,Partition,Elapsed,MaxRSS,State

Check your recent job history:

sacct -u $USER --starttime=now-7days --format=JobID,JobName,Partition,Elapsed,State,ExitCode
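To get a quick summary of how recent jobs ended, the history can be reduced to a count per state. This is a minimal sketch: count_states is a hypothetical helper that expects sacct's pipe-separated output (--parsable2) with JobID and State as the first two columns.

```shell
#!/usr/bin/env bash
# count_states: read "JobID|State" lines (sacct --parsable2 layout) on stdin
# and print one "STATE COUNT" line per state, sorted by state name.
count_states() {
  awk -F'|' '
    NF < 2    { next }                    # ignore blank or malformed lines
    $1 ~ /\./ { next }                    # skip job steps such as 123.batch
    { split($2, w, " "); n[w[1]]++ }      # "CANCELLED by 1000" counts as CANCELLED
    END { for (s in n) print s, n[s] }
  ' | sort
}

# Example on a cluster:
#   sacct -u "$USER" --starttime=now-7days --noheader --parsable2 \
#         --format=JobID,State | count_states
```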

Tip

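Use sacct to check whether your jobs actually use the resources you request. MaxRSS is reported on the job-step lines (for example JOBID.batch), not on the top-level job line. If it is consistently much lower than your --mem request, request less memory in future jobs to improve your fairshare.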
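The comparison between MaxRSS and the requested memory can be sketched in shell. The helpers below are hypothetical names, not Slurm commands. They assume memory values carry a K, M, G, or T suffix, as sacct prints them by default, and that the request is a per-node total.

```shell
#!/usr/bin/env bash
# to_mib VALUE: convert a Slurm memory string such as 2048K, 512M or 4G to whole MiB.
# Assumption: the value carries a K, M, G or T suffix; anything else is rejected.
to_mib() {
  awk -v v="$1" 'BEGIN {
    sub(/[nc]$/, "", v)              # older Slurm releases append n or c to ReqMem
    unit = substr(v, length(v), 1)
    num = v + 0                      # numeric prefix of the string
    if      (unit == "K") num /= 1024
    else if (unit == "M") num *= 1
    else if (unit == "G") num *= 1024
    else if (unit == "T") num *= 1024 * 1024
    else exit 1                      # empty or unrecognised value
    printf "%d\n", num
  }'
}

# mem_used_pct MAXRSS REQMEM: peak usage as a whole percentage of the request.
mem_used_pct() {
  local used req
  used=$(to_mib "$1") || return 1
  req=$(to_mib "$2") || return 1
  [ "$req" -gt 0 ] || return 1
  echo $(( used * 100 / req ))
}

# Example on a cluster: list the values, then compare one step by hand.
#   sacct -j JOBID --noheader --parsable2 --format=JobID,ReqMem,MaxRSS
#   mem_used_pct 1048576K 4G
```

A per-CPU request (the c suffix in older releases) would have to be multiplied by the CPU count before comparing; the sketch does not do that.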

Useful sacct Format Fields

Field       Description
JobID       Job identifier
JobName     Job name
Partition   Partition used
Elapsed     Actual wall time
MaxRSS      Peak memory usage
State       Final job state (COMPLETED, FAILED, TIMEOUT, etc.)
ExitCode    Exit code and signal, shown as code:signal (0:0 = success)
AllocCPUS   CPUs allocated
AllocTRES   All resources allocated (CPUs, memory, GPUs)

Partition and Node Status

sinfo

See which nodes are available, allocated, or down. Useful for checking whether your target partition has capacity.

sinfo -p rtx-batch --format="%N %T %G %m"

This shows node names (%N), state (%T), generic resources such as GPUs (%G), and memory per node in MB (%m) for a specific partition.
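To check for capacity from a script, the same four-column output can be filtered for nodes that still have room. This is a rough sketch: free_nodes is a hypothetical helper, and it treats any state beginning with idle or mixed as having capacity, which may not match every site's policy.

```shell
#!/usr/bin/env bash
# free_nodes: read "NODE STATE GRES MEMORY" lines on stdin and print the names of
# nodes whose state starts with idle or mixed (mixed = partly allocated).
free_nodes() {
  awk '$2 ~ /^(idle|mixed)/ { print $1 }'
}

# Example on a cluster (-N prints one line per node, -h drops the header):
#   sinfo -N -h -p rtx-batch --format="%N %T %G %m" | free_nodes
```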

Cancel a Job

scancel JOBID

Cancel all your jobs:

scancel -u $USER

Cancel a specific array task:

scancel JOBID_TASKID
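To cancel a group of jobs that share a name prefix, one option is to build the list of job IDs first and review it before cancelling. The sketch below assumes job names contain no spaces. jobs_with_prefix and the sweep_ prefix are hypothetical, and xargs -r (skip the command when the list is empty) assumes GNU xargs.

```shell
#!/usr/bin/env bash
# jobs_with_prefix PREFIX: read "JOBID NAME" lines on stdin and print the IDs of
# jobs whose name starts with PREFIX.
jobs_with_prefix() {
  awk -v p="$1" 'index($2, p) == 1 { print $1 }'
}

# Example on a cluster: review the list first, then cancel.
#   squeue --me -h -o "%i %j" | jobs_with_prefix sweep_
#   squeue --me -h -o "%i %j" | jobs_with_prefix sweep_ | xargs -r scancel
```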

Checking GPU Utilization in a Running Job

If you have a running GPU job, check GPU utilization by connecting to the node:

squeue -u $USER                    # find your node name
ssh NODE_NAME nvidia-smi           # check GPU usage (requires active job on that node)

Note

You can only SSH to compute nodes where you have an active Slurm job.
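For a scriptable check, nvidia-smi can print utilization as CSV, which is easy to filter. The sketch below flags GPUs below a utilization threshold. low_util_gpus and the threshold of 10 percent are illustrative choices, not site recommendations.

```shell
#!/usr/bin/env bash
# low_util_gpus THRESHOLD: read "index, utilization" CSV lines on stdin and print
# the index of every GPU whose utilization (percent) is below THRESHOLD.
low_util_gpus() {
  awk -F', *' -v t="$1" '($2 + 0) < (t + 0) { print $1 }'
}

# Example on a cluster (NODE_NAME as found with squeue):
#   ssh NODE_NAME nvidia-smi --query-gpu=index,utilization.gpu \
#       --format=csv,noheader,nounits | low_util_gpus 10
```

A single reading is only a snapshot; a GPU that is idle during data loading may be busy a moment later, so it is worth sampling a few times before drawing conclusions.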

See Also