HPC Primer Introduction
- The cluster, hydra, is made of
  - login nodes,
  - a front-end node,
  - a queue manager/scheduler (SGE),
  - a slew of compute nodes.
- You access the cluster by logging in on one of the login nodes.
- From either login node you submit and monitor your job(s) via the queue manager/scheduler:
- the queue manager/scheduler is SGE, or simply grid engine,
- SGE was formerly known as the Sun Grid Engine; it is now OGE, for Open Grid Engine (the free version), or the Oracle Grid Engine,
- on hydra, we run the open version of the grid engine,
- the front-end node (hydra-2.si.edu) runs the queue manager/scheduler; it should not be used as a login node.
- All the nodes (login, front-end and compute nodes) are interconnected via Ethernet (1Gbps).
- Some of the compute nodes and the front-end node are also interconnected via InfiniBand (40Gbps).
Hydra cluster schematic (figure).
- You can only log on the cluster from a trusted IP address, so if your desktop is not managed by the CF, you must
- either log in on the CF gateway,
- or authenticate your desktop/laptop via VPN (check the CF's help page on how to connect to the CfA's VPN).
- The login nodes are for normal interactive use (editing, compiling, script writing, etc.) and for submitting jobs.
- They are not compute nodes, nor is the front-end node; hence they should not be used for actual computations, except for short interactive debugging sessions or short ancillary computations.
- The compute nodes are the hosts on which you run your computations, by submitting a job to the queue system via the qsub command from a login node.
Do not run jobs on the head node, and do not run jobs out of band; this means:
- do not log on a compute node to manually start a computation; always use qsub,
- do not run scripts/programs that spawn additional tasks, unless you have requested the corresponding resources (assuming you know how to),
- if you run something in the background (you really shouldn't), use wait so that your job terminates only when all the associated processes have finished,
- If you run parallel jobs (MPI), read the relevant primer(s) and follow the instructions (you don't start MPI jobs on the cluster the way you do it on your workstation or laptop).
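If you do use background processes inside a job, the wait pattern can be sketched as follows (the sleep commands stand in for real tasks):

```shell
#!/bin/sh
# Sketch: launch two placeholder background tasks, then use `wait`
# so the script (and hence the job) only exits once both have finished.
sleep 2 &
sleep 1 &
# `wait` with no arguments blocks until every child process has exited.
wait
echo "all background tasks finished"
```

Without the wait, the script would exit immediately and the scheduler would consider the job done while its processes were still running.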
- you probably should optimize your executables with the appropriate compilation flags for production runs,
- you probably need to write a script to submit a job,
- you probably need to specify multiple options when qsub'ing your script,
- things don't always scale up: when you submit a lot of jobs that will run concurrently, ask yourself:
- is there a name space conflict? (do all the jobs write to the same file, have the same job name, ...)
- what will be the resulting I/O load? (all read the same file, all write a lot of useless stuff)
- how much will I use/abuse disk space? (fill up shared public disk space, heavy I/O load compared to CPU load)
- you are not the sole user.
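Several of the points above (job name, unique output files, qsub options) are usually handled by embedding the options as directives in the job script itself. A minimal sketch follows; the script and program names are hypothetical, and the exact options valid on hydra are covered in the job-submission sections of this primer:

```shell
#!/bin/sh
# myjob.job -- hypothetical job script, submitted from a login node with: qsub myjob.job
#$ -N myjob              # job name
#$ -cwd                  # run in the directory the job was submitted from
#$ -j y                  # merge stderr into stdout
#$ -o myjob.$JOB_ID.log  # per-job log file, using the job's id
#
# Run the (hypothetical) program; writing output to a per-job file
# avoids name space conflicts when many copies run concurrently.
./mycode > result.$JOB_ID.dat
```

Submit it with qsub myjob.job and monitor it with qstat; the JOB_ID variable is set by the grid engine for each job, which keeps output file names unique across concurrent jobs.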
- Accounts on the cluster (passwords and home directories) are separate and distinct from CF or HEA unix accounts.
- To use hydra you need to request a separate account.
- The cluster is managed by D.J. Ding (DingDJ@si.edu), the system administrator in Herndon, VA. Do not contact the CF or HEA support sys-admins.
- Additional support for SAO is provided by SAO's HPC analyst (hpc@cfa). This role is currently assumed by Sylvain Korzennik (at 25% FTE).
Sylvain is not the sys-admin: contact D.J. for problems; contact Sylvain for advice & suggestions.
A mailing list is available for the cluster users:
- The mailing list (HPCC-L@si-listserv.si.edu) is read by the cluster sysadmin and the HPC analyst as well as by other cluster users.
- Use this mailing list to communicate with other cluster users, share ideas, ask for help with cluster use problems, offer advice and solutions to problems, etc.
- To email that list you must log in to the listserv and post your message.
- Replies to these messages are by default broadcast to the list.
- You will need to set up a password the first time you use it (upper right, under "Options").
- As for any unix system, you must properly configure your account to access all the system resources.
- The configuration on the cluster is different from the CF or HEA-managed machines, so your shell startup files need to be adjusted accordingly.
You can look in ~hpc for examples (with ls -la ~hpc).
- GNU compilers
- Portland Group (PGI) compilers (v12.5) and their Cluster Dev Kit (debugger, profiler, etc...)
- Intel compilers (v12.0) and their Cluster Studio (debugger, profiler, etc...)
- MPI, for GNU, PGI and Intel compilers, w/ IB support
- math libraries that come with compilers
- AMD math libraries
- S/W Packages
- IDL, including 128 run-time licenses (for batch processing)
- IRAF (v1.7)
More can be installed upon request.
To properly configure access to these, refer to the respective entries in this primer.
- 23 Jan 2012