Disk space on Sauron

Sauron has a substantial amount of disk space available for data output.

sdutta@sauron:~/bin> df -h /raid*
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              1.7T  935G  804G  54% /raid1
/dev/sdd1             1.7T  448G  1.3T  26% /raid2
/dev/sde1             1.7T  109G  1.6T   7% /raid3
/dev/sdg1             1.7T  353G  1.4T  21% /raid4
/dev/sdh1             1.7T  223G  1.5T  13% /raid5
sdutta@sauron:~/bin>

All of these filesystems are available on every node, so a program can write its output directly to them. Note, however, that they are mounted on the nodes over NFS. Although the network connection is Gigabit, the bandwidth to these filesystems is shared by all the nodes. If your application is I/O-heavy, consider writing your data to the local /scratch directory of the node and copying the files back at the end of the run. You can do this by adding a few lines to your qsub script.

Example 7. Running commands on the /scratch directory

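#!/bin/bash

#PBS -N Test_queue
#PBS -m abe
#PBS -l nodes=4:ppn=2
#PBS -M user@cfa.harvard.edu

# Create a working directory on the local scratch disk and copy the
# executable into it. The job script runs on the first node assigned
# to the job, so this directory is created in that node's /scratch.
mkdir -p /scratch/user/Test_queue
cp /home/user/Test_queue/test /scratch/user/Test_queue
cd /scratch/user/Test_queue

# Keep a copy of the list of nodes assigned to this job
cp "$PBS_NODEFILE" machines

# 4 nodes x 2 processors per node = 8 MPI processes
mpirun -np 8 ./test

# Copy the results back to your home directory, then free the scratch space
cp -r /scratch/user/Test_queue /home/user/Test_queue_output
rm -rf /scratch/user/Test_queue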
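The paths in Example 7 are fixed, so two runs of the same job landing on one node at the same time would share a scratch directory. One way around this is to put the job ID in the path. The sketch below assumes the batch system sets PBS_JOBID inside a job (Torque/PBS does); the function name scratch_dir_for_job is hypothetical, and outside a job it falls back to the shell's process ID.

```shell
#!/bin/bash
# Hypothetical helper: print a per-job scratch path of the form
#   BASE/USER/JOBID
# so that different jobs (or users) never share a scratch directory.
# Assumes PBS_JOBID is set by the batch system; falls back to the
# shell's PID ($$) when run outside a job.
scratch_dir_for_job() {
    local base="${1:-/scratch}"
    local user="${USER:-$(id -un)}"
    local job="${PBS_JOBID:-$$}"
    printf '%s/%s/%s\n' "$base" "$user" "$job"
}
```

Inside a job you might then set SCRATCH=$(scratch_dir_for_job /scratch) and use "$SCRATCH" in place of /scratch/user/Test_queue in the mkdir, cp, cd and rm lines.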

Please note that space on the /scratch directories is limited (about 25 GB). Always free the scratch space you used so that it is available to the next user, by ending your script with rm -rf /scratch/user/Test_queue.
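Because scratch space is small, it can be worth checking that a node has room before staging files there. A minimal sketch, using a hypothetical helper has_free_space that reads the "Available" column of POSIX-format df output in kilobytes:

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) only if the filesystem
# holding DIR has at least NEED_KB kilobytes available.
# df -P guarantees one line per filesystem and -k reports 1024-byte
# blocks, so "Available" is always field 4 on line 2.
has_free_space() {
    local dir="$1" need_kb="$2" avail_kb
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    # An empty value means df failed (e.g. DIR does not exist)
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, need=$(du -sk /home/user/Test_queue | awk '{print $1}') gives the size of the input directory, and has_free_space /scratch "$need" || exit 1 stops the job before anything is copied. The input size is only a lower bound if your program also writes large output files.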

If you are worried about losing your data, for example if the cp command at the end of your job fails for some reason, you can protect yourself by rewriting the last two lines of the script as follows:

cp -r /scratch/user/Test_queue /home/user/Test_queue_output && \
rm -rf /scratch/user/Test_queue
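If you would also like the job to tell you when the copy fails, the same guard can be written as an explicit if/else. This is a sketch with a hypothetical function name, copy_then_clean; on failure it reports the node name (from uname -n) and the path where the data was left.

```shell
#!/bin/bash
# Hypothetical helper: copy SRC to DEST and delete SRC only if the
# copy succeeded. On failure SRC is left in place and a message on
# standard error says which node still holds it, so it can be
# recovered and cleaned up by hand.
copy_then_clean() {
    local src="$1" dest="$2"
    if cp -r "$src" "$dest"; then
        rm -rf "$src"
    else
        echo "copy of $src to $dest failed; data left on $(uname -n):$src" >&2
        return 1
    fi
}
```

In Example 7 the last two lines would then become copy_then_clean /scratch/user/Test_queue /home/user/Test_queue_output. The message goes to standard error, which PBS normally returns to you in the job's error file.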

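With &&, rm runs only if cp succeeds, so a failed copy leaves your data in the node's /scratch directory. You are still responsible for clearing that space afterwards, so recover your data and delete the directory by hand. The line cp $PBS_NODEFILE machines will help you identify the nodes your job is running on. The file machines lists one line per processor allocated to the job, so each node appears once for every processor it provides.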

Example 8. Sample machines file for a job running on 8 nodes with 2 processors each

node25
node25
node24
node24
node23
node23
node22
node22
node21
node21
node20
node20
node19
node19
node18
node18
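Because the node file repeats each host once per processor, a listing like Example 8 is easier to read when collapsed to one line per node. A small sketch, with a hypothetical function name summarise_nodes, that keeps nodes in the order they first appear and counts their processor slots:

```shell
#!/bin/bash
# Hypothetical helper: summarise a PBS node file (such as the
# "machines" copy of $PBS_NODEFILE) as "NODE SLOTS", one line per node.
# Blank lines are skipped; nodes are printed in order of first appearance.
summarise_nodes() {
    awk 'NF { if (!($1 in n)) order[++k] = $1; n[$1]++ }
         END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }' "$1"
}
```

Run as summarise_nodes machines on the listing above, this prints node25 2, node24 2, and so on down to node18 2, one per line. Since each node has its own /scratch, this is also the list of nodes to check for leftover scratch directories.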