Table of Contents
- Introduction
- Access
- Support
- HPC Wiki
- Configuration
Current cluster status is available here (@cf.harvard.edu) or here (@si.edu).
Introduction
SAO has access to a Linux compute cluster known as Hydra.
This cluster started at SAO a while back and was then managed by the CF. It
has since been moved to the Smithsonian's Data Center in Herndon, VA
(near Washington, D.C.), is now managed by SI's Office of Research Computing
(ORC), part of SI's Office of the Chief Information Officer (OCIO), and
has become an SI-wide resource.
Access
Accounts on the cluster (passwords and home directories) are separate and
distinct from CF or HEA unix accounts.
To use Hydra, you need to request an account as follows (CfA/SAO users only):
- To request an account, use the account request page (do not email CF or HEA
support; you must be on a trusted machine or use CfA's VPN).
- All cluster users, regardless of their status, must read and sign
SD 931 and be entered in the SDF.
- Passwords on Hydra expire and must be changed every 180 days (per SD 931).
To log in to Hydra, use ssh from a trusted machine.
A trusted machine is
- any CF or HEA managed computer,
- any machine connected to CfA via VPN.
Use either of the two login nodes, i.e.,
% ssh hydra-login01.si.edu
or
% ssh hydra-login02.si.edu
Both machines are identical.
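If you log in often, you can shorten the command above with a Host alias in your ~/.ssh/config. This is a convenience sketch; replace the user name with your own Hydra account name, which is separate from your CF/HEA one:

```
Host hydra
    HostName hydra-login01.si.edu
    User your_hydra_username
```

With this entry in place, "ssh hydra" connects you to the first login node.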
Support
The cluster is managed by DJ Ding
(DingDJ@si.edu), the system administrator in Herndon, VA.
Do not contact the CF or HEA support sys-admins for issues regarding Hydra.
Support for SAO is provided by SAO's
HPC analyst, a role currently filled by Sylvain Korzennik.
Sylvain is not the sys-admin, so contact DJ Ding at
SI-HPC-Admin@si.edu
for critical problems; contact Sylvain for application support, advice, help & suggestions.
A mailing list is available for the cluster users. Messages posted to this
list are read by the cluster sysadmin, the HPC analyst, and all the other
cluster users.
Use this mailing list to communicate with other cluster users, share ideas,
ask for help with cluster use problems, offer advice and solutions to
problems, etc.
To post messages to the list, or to read past messages, you must log in to
the listserv.
You will need to choose a password the first time you use it (upper right,
under "Options").
Because of the way this listserv is set up and managed at SI,
emailing directly from your machine to HPCC-L@si-listserv.si.edu is
likely to fail silently (with no warning or error message), so use the
web portal instead.
Office Hours
Rather than dropping in during preset office hours, feel free to book a time
slot for a Zoom-based one-on-one session using this page. You will receive confirmation
and a Zoom link via email.
By default, these will take place on Thursdays between 2 and 5 PM, on a
first-come, first-served basis.
The target audience is primarily SAO users, since the bioinformatics support
group already offers Bioinformatics Brown Bags on Wednesdays at noon EST. This
should offer a convenient way to get help in running your tasks on Hydra.
That said, non-SAO users are welcome to book a session if the problems
they want to resolve relate to job scheduling, scripting, etc., rather than
being specific to bioinformatics applications.
Tutorials & Past Presentations
The plan is to create tutorials on various aspects of scientific computing and
HPC. This is a placeholder for future links to these tutorials.
You can view slides of past presentations.
HPC Wiki
Information on how to use the system is at SI's
HPC Wiki.
Contact SAO's HPC analyst if you think that other topics or questions should be
covered.
Configuration
Hardware
The cluster consists of some 100 compute nodes, for a total of around 5,000
compute cores (CPUs). All the nodes are interconnected via a 10 GbE Ethernet
switch and via a 100 Gbps-capable InfiniBand (IB) fabric, although older
nodes have a 40 Gbps IB interface. There is some 3 PB of disk space, broken
down into a three-tier architecture (NetApp/NFS, GPFS, and FreeNAS), some of
it public and some project-specific.
Software
The cluster is a Linux-based distributed cluster; we use Bright Cluster
Manager to manage the OS and the Univa Grid Engine as the job scheduler.
Access to the cluster is via two login nodes, while the cluster queuing
system runs on a separate front-end node. To run on the cluster, you log on to
one of the login nodes and submit jobs using the queuing system (in batch
mode, with the qsub command). You do not start interactive jobs on the
compute nodes; instead, you use a job script or request access to one of the
interactive nodes. The login nodes are for normal interactive use,
like editing, compiling, and script writing. Computations are not run on these
nodes either.
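As a sketch of the batch workflow, a minimal Grid Engine job script and its submission look like the following. The job name and echo command are illustrative; Hydra-specific queue names and resource options are omitted on purpose, so consult the HPC Wiki for the queues and options actually defined on the cluster:

```shell
# Create a minimal Grid Engine job script (lines starting with "#$"
# are directives read by qsub, not by the shell).
cat > hello.job <<'EOF'
#$ -S /bin/sh          # interpret the script with /bin/sh
#$ -N hello            # job name, as shown by qstat
#$ -cwd                # run in the directory you submitted from
#$ -j y                # merge stderr into the stdout file
echo "running on $(hostname)"
EOF

# On a login node you would then submit and monitor the job with:
#   qsub hello.job
#   qstat -u $USER
# Here we only sanity-check the script's shell syntax:
sh -n hello.job && echo "hello.job parses OK"
```

The scheduler writes the job's output to a file such as hello.oNNNNN in the submission directory once the job runs on a compute node.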
The following software is available on the cluster:
- Compilers
- Libraries
- MPI for all compilers, including IB support
- the libraries that come with the compilers
- GSL, BLAS, LAPACK, etc
- Packages
- IDL, including 128 run-time licenses, GDL
- MATLAB (runtime only)
- Python, R, Java, etc.
If you need some specific software, or believe that it would benefit the
user community to have some additional software available on the
cluster, contact SAO's HPC analyst.
SAO HPC analyst - Sylvain Korzennik
(hpc@cfa.harvard.edu)
Last modified Wednesday, 27-Sep-2023 13:22:05 EDT