Parallel Processing with Hydra

You will have home space for small files, e.g. scripts and configuration files, and a larger working area in /pool/sao/username or /scratch/sao/username. Create and run your jobs from the working area. Note that files in /pool are scrubbed after 180 days, while files in /scratch are scrubbed after 90 days.
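
Given the scrubbing limits, it can be useful to check which files are getting old. The following is a minimal sketch, assuming that file age is judged by modification time (the scrubber may use a different timestamp, so check the HPC documentation). `list_old_files` is a hypothetical helper name, not a Hydra tool.

```shell
# list_old_files DIR DAYS
# Print regular files under DIR that were last modified more than DAYS days ago.
# Assumption: age is measured by modification time (find -mtime).
list_old_files() {
    find "$1" -type f -mtime +"$2"
}

# Example: files in /pool that are within about 30 days of the 180-day limit.
# list_old_files /pool/sao/username 150
```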

You can copy files to and from Hydra in the usual way with rsync or scp.
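
As an illustration, the sketch below wraps rsync in two small helpers. The login host name `hydra-login01.si.edu`, the `username` placeholder, and the helper names `hydra_push` and `hydra_pull` are assumptions made for this example; substitute the values for your own account.

```shell
# Assumed values: substitute your own login host and username.
HYDRA_HOST="hydra-login01.si.edu"
HYDRA_USER="username"

# hydra_push LOCAL_DIR REMOTE_DIR: copy the contents of LOCAL_DIR to Hydra.
# Set DRY_RUN=1 to print the rsync command instead of running it.
hydra_push() {
    ${DRY_RUN:+echo} rsync -av "$1/" "${HYDRA_USER}@${HYDRA_HOST}:$2/"
}

# hydra_pull REMOTE_DIR LOCAL_DIR: copy the contents of REMOTE_DIR back from Hydra.
hydra_pull() {
    ${DRY_RUN:+echo} rsync -av "${HYDRA_USER}@${HYDRA_HOST}:$1/" "$2/"
}

# Example (prints the command only):
# DRY_RUN=1 hydra_push ./myproject /pool/sao/username/myproject
```

Printing the command first is a cheap way to check the source and destination paths before any data is moved.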

■ Creating a job script

The job script can be generated using the QSub Generator, or written manually.
You can find some template scripts for running CASA jobs below.

High-memory job (contains detailed comments)
Low-memory script (job script only)
Serial script (job script only)
Parallel + SSD request script (job script only)
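
For orientation, here is a minimal sketch of a hand-written job script. It uses only generic Grid Engine directives (-cwd, -j y, -N, -o). The job name `myjob`, the log file name, and the commented-out CASA call are assumptions for illustration, and a real Hydra job will usually also need queue and resource requests, which the templates listed above and the QSub Generator cover. The block writes the script to myscript.job using a here-document.

```shell
# Write a minimal job script to myscript.job.
cat > myscript.job <<'EOF'
#!/bin/sh
# Lines starting with "#$" are read by qsub as embedded options:
#   -cwd   run the job in the directory it was submitted from
#   -j y   merge stderr into stdout
#   -N     job name
#   -o     file that receives the job output
#$ -cwd
#$ -j y
#$ -N myjob
#$ -o myjob.log

# JOB_ID and NSLOTS are set by Grid Engine; the defaults apply when run by hand.
echo "Job ${JOB_ID:-none} started on $(uname -n) with ${NSLOTS:-1} slot(s)"

# Replace this line with the real work, for example (assumed CASA invocation):
# casa --nogui --nologger -c myscript.py

echo "Job ${JOB_ID:-none} finished"
EOF
```

Because the "#$" lines are ordinary comments to the shell, the script can also be run by hand with `sh myscript.job` as a quick syntax check before it is submitted.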

Find more information on all the queue options at HPC: Available Queues

■ Running a job

You can submit your job to the queue from the command line in your current working directory. If the -cwd flag is set, all output files will be written there.

$ qsub myscript.job
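
If you want to keep the job ID for later monitoring, one option is the -terse flag, which makes qsub print only the job ID. The sketch below is a minimal example; `submit_job` is a hypothetical helper name, and -cwd is passed explicitly in case the job script does not set it.

```shell
# submit_job SCRIPT: submit SCRIPT and print only its job ID.
# -terse makes qsub print just the job ID; -cwd runs the job in the current directory.
submit_job() {
    qsub -terse -cwd "$1"
}

# Usage on the cluster:
# jobid=$(submit_job myscript.job)
# echo "Submitted job $jobid"
```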

qsub sends the job from the login node (/pool/sao) to the compute node assigned to you. You can see a list of the compute nodes here, and you can select a particular one to run your job on during submission. However, there may be a long wait if all the slots on that node are in use.

$ qsub -q 'mThM.q@compute-9-*' myscript.job

Find more information on submitting jobs at HPC: Submitting Jobs

■ Monitoring progress

Your first check can be with qstat. This will confirm your job has been submitted successfully and is either running (r) or queued and waiting (qw).

$ qstat -u username
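
For a quick count of running versus waiting jobs, the qstat listing can be summarized with awk. This is a sketch that assumes the default qstat layout, with two header lines and the job state in the fifth column; `summarize_states` is a hypothetical helper name.

```shell
# summarize_states: read qstat output on stdin and count jobs per state.
# Assumes two header lines and the state (r, qw, ...) in column 5.
summarize_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on the cluster:
# qstat -u username | summarize_states
```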

For more options, you must first load the tools/local module:

$ module load tools/local

For different views based on qstat, try these:

$ q+ +a%
$ q+ +rr%

To plot memory usage, find the job number from qstat or q+ and pass it to plot-qmemuse.pl. This will open an X window.

$ plot-qmemuse.pl -x jobnumber

You can see an overview of the whole cluster at Hydra Status. You can follow links to display metrics for individual users.

■ Killing a job

You can kill all your jobs by using your username.

$ qdel -u username

Alternatively, you can kill specific jobs using the job ID reported by qstat.

$ qdel 123456
