Getting Help
AICR support is provided through the research computing teams at each participating institution. Contact your institution's team for help with accounts, jobs, software, and general AICR questions.
Institutional Support Teams
| Institution | Support Team | Contact |
|---|---|---|
| Boston University | Research Computing Services (RCS) | help@rcs.bu.edu |
| Harvard | TBD | TBD |
| MIT | Office of Research Computing and Data (ORCD) | orcd-help-aicr@mit.edu |
| Northeastern | Research Computing (RC) | rchelp@northeastern.edu |
| UMass | Unity Research Computing Platform | hpc@umass.edu |
| URI | Unity Research Computing Platform | hpc@umass.edu |
| Yale | Yale Center for Research Computing (YCRC) | research.computing@yale.edu |
Tip
Your institutional team is the best first point of contact. They know the AICR system and can also help with institution-specific policies, accounts, and allocations.
How to Write a Good Help Request
A clear, detailed help request gets you a faster answer. Include the following information when contacting support:
Essential Information
- Your username and institution. Support staff need this to look up your account and jobs.
- What you were trying to do. A brief description of your goal (e.g., "run a multi-GPU PyTorch training job on the B200 partition").
- What happened. Describe the error or unexpected behavior. Include the exact error message, not a paraphrase.
- Job ID. If the problem involves a Slurm job, include the job ID. Support staff can look up everything else from there.
- Steps to reproduce. List the commands you ran, in order. If possible, provide a minimal example that reproduces the problem.
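The job details worth quoting in a request can usually be pulled from Slurm accounting. The following is a minimal sketch, assuming `sacct` is on your `PATH` and job accounting is enabled on the cluster; the function name `job_summary` and the job ID `456789` are placeholders for illustration, not AICR tools.

```shell
# Hypothetical helper (not an AICR tool): summarize one Slurm job for a help request.
job_summary() {
    jobid="$1"
    # Reject empty input and anything containing characters other than
    # digits or underscores (array tasks look like 456789_3).
    case "$jobid" in
        ''|*[!0-9_]*)
            echo "usage: job_summary <jobid>" >&2
            return 2
            ;;
    esac
    if command -v sacct >/dev/null 2>&1; then
        # Standard sacct fields: job, name, partition, final state, exit code, run time.
        sacct -j "$jobid" --format=JobID,JobName,Partition,State,ExitCode,Elapsed
    else
        echo "sacct not found; run this on a cluster login node" >&2
        return 1
    fi
}
```

Calling `job_summary 456789` and pasting the output into the request gives support the job state and exit code along with the ID.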
Helpful Extras
- Job script. Attach or paste the full Slurm job script (`#SBATCH` directives and all commands).
- Output and error files. Attach the `.out` and `.err` files from the job.
- Software environment. Include the output of `module list` and `conda list` or `pip freeze`.
- What you already tried. Mention any troubleshooting steps you took so support does not duplicate effort.
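The software-environment details listed above can be gathered into a single attachment. This is a rough sketch, assuming a POSIX shell; the function name `collect_env` and the default file name `env-report.txt` are hypothetical, and any tool that is not installed is noted in the report instead of causing a failure.

```shell
# Hypothetical helper: write software-environment details to one file
# that can be attached to a help request.
collect_env() {
    report="${1:-env-report.txt}"
    {
        echo "== module list =="
        # module is usually a shell function set up by the cluster's login scripts.
        module list 2>&1 || echo "(module command not available)"

        echo "== conda list =="
        if command -v conda >/dev/null 2>&1; then
            conda list 2>&1 || echo "(conda list failed)"
        else
            echo "(conda not available)"
        fi

        echo "== pip freeze =="
        if command -v pip >/dev/null 2>&1; then
            pip freeze 2>&1 || echo "(pip freeze failed)"
        else
            echo "(pip not available)"
        fi
    } > "$report"
    echo "wrote $report"
}
```

Run `collect_env` in the same shell session, with the same modules and environment activated, that you used to submit the job, so the report matches what the job actually saw.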
Example Help Request
Subject: Job 456789 fails with CUDA out of memory on rtx-batch
Hi, I am < USERNAME > from < INSTITUTION >. I am trying to fine-tune a language model on the rtx-batch partition, but my job fails after about 30 minutes with:
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB

Job ID: 456789
Partition: rtx-batch
GPUs requested: 1
Memory requested: 32G
I have tried reducing the batch size from 32 to 16 but the error persists. My job script and output file are attached.
Before Contacting Support
Try these self-service steps first:
- Check this documentation. Many common issues are covered in the docs.
- Check job output files. Read your
.outand.errfiles for error messages. - Check your quota. Jobs fail when storage is full
-
Check job status. See why a job is pending or failed:
-
Search for error messages. Copy the exact error text and search online. Many CUDA, PyTorch, and Slurm errors have well-documented solutions.
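The log, job-status, and storage checks above can be sketched with standard tools. This is an illustrative sketch only: the function names are hypothetical, `squeue` and `sacct` require Slurm, and `df` is a generic stand-in because the cluster may provide its own quota command.

```shell
# Hypothetical helpers (names are illustrative, not AICR tools).

# Print numbered lines from job output files that look like errors.
# grep exits with status 1 when nothing matches.
scan_job_logs() {
    grep -n -i -E 'error|traceback|killed|out of memory' "$@"
}

# Show queue and accounting state for one job (requires Slurm).
job_status() {
    # For a pending job, the last squeue column gives the reason.
    squeue -j "$1"
    # sacct also covers jobs that have already finished or failed.
    sacct -j "$1" --format=JobID,State,ExitCode
}

# Generic disk-usage check for the home directory.
storage_usage() {
    df -h "$HOME"
}
```

For example, `scan_job_logs myjob.out myjob.err` followed by `job_status 456789` (placeholder file names and job ID) covers the log and status checks in one pass.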