CF: Services: High Performance Computing
 

Table of Contents

  1. Introduction
  2. Access
  3. Support
  4. HPC Wiki
  5. Configuration
  6. Presentations
  7. Current cluster status is available here (@cf.harvard.edu) or here (@si.edu).
  8. Past cluster use is available here (@cf.harvard.edu) or here (@si.edu).

Introduction

SAO has access to a Linux-based Beowulf cluster known as Hydra.
The cluster originated at SAO, where it was managed by the CF. It has since been moved to the Smithsonian's Data Center in Herndon, VA (near Washington, D.C.), is managed by SI's Office of Research Computing (ORIS), part of SI's Office of the Chief Information Officer (OCIO), and has become an SI-wide resource.


Access

Accounts on the cluster (passwords and home directories) are separate and distinct from CF or HEA unix accounts.
To use Hydra, you need to request an account as follows (CfA/SAO users only):
  • To request an account, use the account request page (do not email CF or HEA support).
  • All cluster users, regardless of their status, must read and sign SD 931 and be entered in the SDF.
  • Passwords on Hydra expire and must be changed every 90 days (per SD 931).
To log in to Hydra, use ssh from a trusted machine. A trusted machine is
  • any CF- or HEA-managed computer, including:
    • login.cfa.harvard.edu (CF users), or
    • pogoN.cfa.harvard.edu (HEA users), where N = 1, 2, ..., 6;
  • or any machine connected to CfA via VPN.
Use either of the two login nodes, i.e.,
 % ssh hydra-login01.si.edu
or
 % ssh hydra-login02.si.edu 
Both machines are identical.
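
If you are outside the CfA network and not on the VPN, one convenient option is to hop through a trusted machine using ssh's ProxyJump feature. The following is a minimal sketch, assuming a reasonably recent OpenSSH client (the -J option); the usernames are placeholders for your own CF/HEA and Hydra account names:

 % ssh -J your_cf_username@login.cfa.harvard.edu your_hydra_username@hydra-login01.si.edu

The same can be made permanent with an entry in your ~/.ssh/config (again, the usernames are placeholders):

 Host hydra
     HostName hydra-login01.si.edu
     User your_hydra_username
     ProxyJump your_cf_username@login.cfa.harvard.edu

after which a plain "% ssh hydra" is enough.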


Support

The cluster is managed by DJ Ding (DingDJ@si.edu), the system administrator in Herndon, VA.
Do not contact the CF or HEA support sys-admins for issues regarding Hydra.

Support for SAO is provided by SAO's HPC analyst, a role currently filled by Sylvain Korzennik.

Sylvain is not the sys-admin: contact the sys-admin at SI-HPC-Admin@si.edu for critical problems, and contact Sylvain for application support, advice, help, and suggestions.

A mailing list is available for the cluster users. Messages posted to this list are read by the cluster sys-admin, the HPC analyst, and all the other cluster users.

Use this mailing list to communicate with other cluster users, share ideas, ask for help with cluster use problems, offer advice and solutions to problems, etc.

To post messages to the list, or to read past messages, you must log in to the listserv.
You will need to choose a password the first time you use it (upper right, under "Options").
Because of the way this listserv is set up and managed at SI, emailing directly from your machine to HPCC-L@si-listserv.si.edu is likely to fail (with no warning or error message), so use the web portal.


HPC Wiki

Information on how to use the system is at SI's HPC Wiki.

Contact SAO's HPC analyst if you think other topics or questions should be covered.


Configuration

Hardware

As of Sep 2017, the cluster consists of some 110 compute nodes, for a total of around 4,000 compute cores (CPUs). Most of the nodes are interconnected via a 1 GbE Ethernet switch, although the most recent nodes are on 10 GbE. All the nodes are connected to a 100 Gbps capable InfiniBand (IB) fabric, although the older nodes have a 40 Gbps IB interface. In addition, there will soon be nearly 350 TB of public disk space plus nearly 150 TB of user-specific disk space.


Software

The cluster is a Linux-based distributed system, running the ROCKS distribution and using the Sun Grid Engine queuing system (v6.2u4, a.k.a. SGE, OGE, or GE).

Access to the cluster is via two login nodes, while the queuing system runs on a separate front-end node. To run on the cluster, you log on to one of the login nodes and submit jobs using the queuing system (in batch mode, with the qsub command). You do not start jobs interactively on the compute nodes; instead, you use a job script, as sketched below. The login nodes are for normal interactive use such as editing, compiling, and script writing; computations are not run on these nodes either.
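
As an illustration, a minimal job script might look like the sketch below. The "#$" lines are standard Grid Engine directives; the program name is hypothetical, and any queue, parallel-environment, or resource requests you may need are cluster specific, so consult the HPC Wiki for the values used on Hydra.

 # hello.job -- a minimal SGE job script (my_program is a hypothetical executable)
 #$ -N hello          # job name
 #$ -cwd              # run in the directory the job was submitted from
 #$ -j y              # merge stdout and stderr
 #$ -o hello.log      # write the job's output to this file
 echo "running on $HOSTNAME"
 ./my_program

Submit it from a login node, and check on it, with:

 % qsub hello.job
 % qstat -u $USER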

The following software is available on the cluster (a brief compilation example follows the list):

  • Compilers
    • GCC compilers
    • Portland Group (PGI) compilers, including the cluster dev kit
    • Intel compilers including the Cluster Studio
  • Libraries
    • MPI for all compilers, including IB support
    • the libraries that come with the compilers
    • GSL, BLAS, LAPACK, etc
  • Packages
    • IDL (including 128 run-time licenses) and GDL
    • MATLAB (runtime only)
    • Python, R, Java, etc
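
As a simple illustration of using the compilers and MPI libraries, the commands below compile a serial C program with gcc and an MPI program with the mpicc wrapper. The source file names are hypothetical, and which compiler/MPI combination the wrappers pick up depends on how your environment is set up, so check the HPC Wiki for the recommended way to select compilers and MPI on Hydra:

 % gcc -O2 -o my_serial my_serial.c
 % mpicc -O2 -o my_mpi my_mpi.c

The resulting executables should then be run on the compute nodes through the queuing system (via a job script and qsub, as sketched above), not interactively on the login nodes.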

If you need some specific software, or believe that it would benefit the user community to have some additional software available on the cluster, contact SAO's HPC analyst.



SAO HPC analyst - Sylvain Korzennik  (hpc@cfa.harvard.edu)
Last modified Wednesday, 14-Aug-2019 15:10:32 EDT
 
 
