Monitoring and Managing Jobs
Check Your Jobs
View your running and pending jobs:
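The command for this step is missing from the text. A minimal sketch, assuming the standard Slurm `squeue` client:

```shell
# List only your own jobs; the ST column shows R (running) or PD (pending).
squeue -u "$USER"
```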
View jobs on a specific partition:
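A likely form of this command is sketched here. The partition name `gpu` is a placeholder, not something the original text specifies.

```shell
# "gpu" is a placeholder; substitute a partition name from your cluster.
PARTITION=gpu

# Restrict the listing to your jobs on that partition.
squeue -u "$USER" -p "$PARTITION"
```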
Detailed Job Information
Inspect a running or pending job:
This shows the full job configuration: allocated nodes, resources, time limits, working directory, and more.
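Output of this kind normally comes from `scontrol show job`. A sketch, with `12345` as a placeholder job ID:

```shell
# 12345 is a placeholder; take the real ID from the JOBID column of squeue.
JOBID=12345

# Print the full record for one running or pending job.
scontrol show job "$JOBID"
```

The record is generally only available while the job is queued or running and for a short time afterwards. For older jobs, `sacct` is the tool to use.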
Job History and Resource Usage
After a job completes, use sacct to see what resources it actually used:
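A sketch of such a query follows. The job ID is a placeholder and the field list is one reasonable choice, not necessarily the one the original page used.

```shell
# 12345 is a placeholder job ID.
JOBID=12345

# Report accounting data for one job with an explicit set of columns.
sacct -j "$JOBID" --format=JobID,JobName,Partition,Elapsed,MaxRSS,State,ExitCode
```

MaxRSS is recorded per job step, so it usually appears on the step rows (for example `12345.batch`) rather than on the top-level job row.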
Check your recent job history:
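One way to do this, assuming a seven-day window counts as "recent" (the window is an arbitrary choice for illustration):

```shell
# -X shows one row per job (allocations only) instead of one row per step.
# --starttime accepts relative times such as now-7days.
sacct -u "$USER" -X --starttime now-7days \
      --format=JobID,JobName,Partition,State,Elapsed,ExitCode
```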
Tip
Use sacct to check whether your jobs are using the resources you requested. If MaxRSS is much lower than your --mem request, reduce memory in future jobs to improve your fairshare.
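As a rough illustration of that comparison, the sketch below converts sacct-style sizes to MiB so that MaxRSS and the memory request can be read side by side. `to_mib` is a hypothetical helper, the sample values are invented, and the conversion assumes 1024-based units. Older Slurm releases append `n` or `c` to ReqMem (for example `8Gn`), which this helper does not handle.

```shell
# to_mib: hypothetical helper that converts a size such as 1536000K, 4000M
# or 8G to whole MiB. A value with no recognized suffix is treated as bytes.
to_mib() {
    awk -v v="$1" 'BEGIN {
        n = v + 0                                  # leading numeric part
        u = toupper(substr(v, length(v), 1))       # last character = unit
        if      (u == "K") n = n / 1024
        else if (u == "M") n = n
        else if (u == "G") n = n * 1024
        else if (u == "T") n = n * 1024 * 1024
        else               n = n / (1024 * 1024)
        printf "%d\n", n
    }'
}

# Illustrative values only: peak usage from sacct and the --mem request.
max_rss=1536000K
requested=8G
echo "used $(to_mib "$max_rss") MiB of $(to_mib "$requested") MiB requested"
```

With these sample values the job peaked at about 1500 MiB against an 8192 MiB request, so a much smaller `--mem` value would have been enough.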
Useful sacct Format Fields
| Field | Description |
|---|---|
| JobID | Job identifier |
| JobName | Job name |
| Partition | Partition used |
| Elapsed | Actual wall time |
| MaxRSS | Peak memory usage |
| State | Final job state (COMPLETED, FAILED, TIMEOUT, etc.) |
| ExitCode | Exit code (0 = success) |
| AllocCPUS | CPUs allocated |
| AllocTRES | All resources allocated (CPUs, memory, GPUs) |
Partition and Node Status
See which nodes are available, allocated, or down. Useful for checking whether your target partition has capacity.
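The command is missing from the text. Assuming the standard Slurm `sinfo` client, the cluster-wide overview is:

```shell
# One row per partition and node state:
# PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST.
sinfo
```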
This shows node names, state, GPUs, and memory for a specific partition.
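A command matching that description could look like the sketch below. The partition name and the exact format string are assumptions; the original command is not preserved in the text.

```shell
# "gpu" is a placeholder partition name.
PARTITION=gpu

# -N prints one line per node. Format codes:
#   %N node name, %T node state, %G generic resources (GPUs), %m memory in MB.
sinfo -p "$PARTITION" -N -o "%N %T %G %m"
```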
Cancel a Job
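Cancelling a single job takes its job ID. A sketch with a placeholder ID:

```shell
# 12345 is a placeholder; use the JOBID shown by squeue.
JOBID=12345

# Cancel that one job, whether it is pending or running.
scancel "$JOBID"
```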
Cancel all your jobs:
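A sketch assuming the standard `scancel` client. This affects every job you own, running and pending, so check `squeue` first.

```shell
# Cancel all jobs belonging to the current user.
scancel -u "$USER"

# Variant: cancel only jobs that have not started yet.
# scancel -u "$USER" --state=PENDING
```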
Cancel a specific array task:
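Array tasks are addressed as `JOBID_TASKID`. In this sketch both numbers are placeholders:

```shell
# Placeholder array job ID and task index.
JOBID=12345
TASK=7

# Cancel only task 7 of array job 12345; the other tasks keep running.
scancel "${JOBID}_${TASK}"
```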
Checking GPU Utilization in a Running Job
If you have a running GPU job, check GPU utilization by connecting to the node:
squeue -u $USER # find your node name
ssh NODE_NAME nvidia-smi # check GPU usage (requires active job on that node)
Note
You can only SSH to compute nodes where you have an active Slurm job.
See Also
- Slurm Basics — submitting jobs