Getting Help

AICR support is provided through the research computing teams at each participating institution. Contact your institution's team for help with accounts, jobs, software, and general AICR questions.

Institutional Support Teams

| Institution | Support Team | Contact |
| --- | --- | --- |
| Boston University | Research Computing Services (RCS) | help@rcs.bu.edu |
| Harvard | University Research Computing and Data (RCD) Services | urcds-aicr@harvard.edu |
| MIT | Office of Research Computing and Data (ORCD) | orcd-help-aicr@mit.edu |
| Northeastern | Research Computing (RC) | rchelp@northeastern.edu |
| UMass | Unity Research Computing Platform | hpc@umass.edu |
| URI | Unity Research Computing Platform | hpc@umass.edu |
| Yale | Yale Center for Research Computing (YCRC) | research.computing@yale.edu |

Tip

Your institutional team is the best first point of contact. They know the AICR system and can also help with institution-specific policies, accounts, and allocations.

How to Write a Good Help Request

A clear, detailed help request gets you a faster answer. Include the following information when contacting support:

Essential Information

Your username and institution. Support staff need this to look up your account and jobs.
What you were trying to do. A brief description of your goal (e.g., "run a multi-GPU PyTorch training job on the B200 partition").
What happened. Describe the error or unexpected behavior. Include the exact error message, not a paraphrase.
Job ID. If the problem involves a Slurm job, include the job ID; support staff can look up everything else from it. To view a summary of the job yourself, run:
```
$ sacct -j JOBID --format=JobID,JobName,Partition,State,ExitCode,Elapsed
```
Steps to reproduce. List the commands you ran, in order. If possible, provide a minimal example that reproduces the problem.
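
The essentials above can be gathered in one step. The following is a minimal sketch, assuming a Bash-compatible shell; the function name `job_summary` and its output layout are illustrative and not part of AICR. It prints your username and the job ID, then runs `sacct` with the fields listed above, and skips the Slurm lookup when `sacct` is not on the PATH.

```shell
# Hypothetical helper (not an AICR tool): print the essentials for a help request.
# Usage: job_summary JOBID
job_summary() {
    local jobid="${1:-}"
    if [ -z "$jobid" ]; then
        echo "usage: job_summary JOBID" >&2
        return 1
    fi
    echo "Username: ${USER:-$(id -un)}"
    echo "Job ID: ${jobid}"
    # Same sacct fields as the command above; skipped when Slurm is unavailable.
    if command -v sacct >/dev/null 2>&1; then
        sacct -j "$jobid" --format=JobID,JobName,Partition,State,ExitCode,Elapsed
    else
        echo "sacct not found; run this on a cluster login node" >&2
    fi
}
```

Calling `job_summary 456789` on a login node prints text you can paste into the request. Add your institution, the exact error message, and the steps to reproduce by hand.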

Helpful Extras

Job script. Attach or paste the full Slurm job script (#SBATCH directives and all commands).
Output and error files. Attach the .out and .err files from the job.
Software environment. Include the output of `module list` and of either `conda list` or `pip freeze`.
What you already tried. Mention any troubleshooting steps you took so support does not duplicate effort.
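
One way to capture the software environment for attaching is sketched below, assuming a Bash-compatible shell. The function name `collect_env` and the default file name `env-report.txt` are hypothetical. It records the output of each tool that is installed and notes any that are missing; `module list` commonly writes to standard error, so both streams are captured.

```shell
# Hypothetical helper (not an AICR tool): save the software environment to one file.
# Usage: collect_env [OUTPUT_FILE]
collect_env() {
    local out="${1:-env-report.txt}"
    local cmd tool
    {
        for cmd in "module list" "conda list" "pip freeze"; do
            echo "== $cmd =="
            tool="${cmd%% *}"   # first word, e.g. "module"
            if command -v "$tool" >/dev/null 2>&1; then
                # Unquoted on purpose so the string splits into command and argument.
                $cmd 2>&1 || echo "($cmd failed)"
            else
                echo "($tool not available)"
            fi
        done
    } > "$out"
    echo "Wrote $out"
}
```

Run it in the same shell session, with the same modules and environment loaded, that you use to submit the failing job. Otherwise the report may not match what the job sees.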

Example Help Request

Subject: Job 456789 fails with CUDA out of memory on rtx-batch

Hi, I am < USERNAME > from < INSTITUTION >. I am trying to fine-tune a language model on the rtx-batch partition, but my job fails after about 30 minutes with:

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB

Job ID: 456789
Partition: rtx-batch
GPUs requested: 1
Memory requested: 32G

I have tried reducing the batch size from 32 to 16 but the error persists. My job script and output file are attached.

Before Contacting Support

Try these self-service steps first:

Check this documentation. Many common issues are covered in the docs.
Check job output files. Read your .out and .err files for error messages.
Check your quota. Jobs can fail when your storage quota is full.

Check job status. See why a job is pending or failed:

```
$ squeue -u $USER
$ sacct -j JOBID --format=JobID,State,ExitCode,Reason
```

Search for error messages. Copy the exact error text and search online. Many CUDA, PyTorch, and Slurm errors have well-documented solutions.
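
To find the exact error text quickly, you can scan the job's output files for lines that look like failures. This is a rough sketch, assuming a Bash-compatible shell; the function name `scan_logs` and the pattern list are illustrative guesses at common failure keywords, not a complete or official list.

```shell
# Hypothetical helper (not an AICR tool): show lines in job output files that look like errors.
# Usage: scan_logs FILE...
scan_logs() {
    # -n: print line numbers, -i: ignore case, -H: always print the file name,
    # -E: extended regex so | separates the alternatives.
    grep -n -i -H -E 'error|traceback|out of memory|killed|cancelled' "$@"
}
```

For example, `scan_logs myjob.out myjob.err` (file names are placeholders) prints each matching line with its file name and line number. The function returns a non-zero status when nothing matches, which does not by itself mean the job succeeded.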