CF: Services: High Performance Computing
 

Table of Contents

  1. Introduction
  2. Access
  3. Support
  4. HPC Wiki
  5. Configuration
  6. Presentations
  7. Current cluster status is available here (@cf.harvard.edu) or here (@si.edu).
  8. Past cluster use is available here (@cf.harvard.edu) or here (@si.edu).

Introduction

SAO has access to a Linux-based Beowulf cluster known as Hydra.
The cluster originated at SAO, where it was managed by the CF. It has since been moved to the Smithsonian's Data Center in Herndon, VA (near Washington, D.C.), is managed by SI's Office of Research Computing (ORIS), part of SI's Office of the Chief Information Officer (OCIO), and has become an SI-wide resource.


Access

Accounts on the cluster (passwords and home directories) are separate and distinct from CF or HEA unix accounts.
To use Hydra, you need to request an account as follows (CfA/SAO users only):
  • To request an account, use the account request page (do not email CF or HEA support).
  • All cluster users, regardless of their status, must read and sign SD 931 and be entered in the SDF.
  • Passwords on Hydra expire and must be changed every 90 days (per SD 931).
To log in to Hydra, use ssh from a trusted machine. A trusted machine is
  • any CF- or HEA-managed computer, including:
    • login.cfa.harvard.edu (CF users), or
    • pogoN.cfa.harvard.edu (HEA users), where N = 1, 2, ..., 6;
  • or any machine connected to CfA via VPN.
Use either of the two login nodes, i.e.,
 % ssh hydra-login01.si.edu
or
 % ssh hydra-login02.si.edu 
Both machines are identical.
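
If you are outside the CfA network and not on the VPN, one convenient option is to hop through a trusted machine using ssh's ProxyJump feature. The following is a minimal sketch, assuming a reasonably recent OpenSSH client (the -J option); the usernames are placeholders for your own CF/HEA and Hydra account names:

 % ssh -J your_cf_username@login.cfa.harvard.edu your_hydra_username@hydra-login01.si.edu

The same can be made permanent with an entry in your ~/.ssh/config (again, the usernames are placeholders):

 Host hydra
     HostName hydra-login01.si.edu
     User your_hydra_username
     ProxyJump your_cf_username@login.cfa.harvard.edu

after which a plain "% ssh hydra" is enough.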


Support

The cluster is managed by DJ Ding (DingDJ@si.edu), the system administrator in Herndon, VA.
Do not contact the CF or HEA support sys-admins for issues regarding Hydra.

Support for SAO is provided by SAO's HPC analyst, a role currently filled by Sylvain Korzennik.

Sylvain is not the sys-admin: contact the sys-admin at SI-HPC-Admin@si.edu for critical problems, and contact Sylvain for application support, advice, help, and suggestions.

A mailing list is available for the cluster users. Messages posted to this list are read by the cluster sys-admin, the HPC analyst, and all the other cluster users.

Use this mailing list to communicate with other cluster users, share ideas, ask for help with cluster use problems, offer advice and solutions to problems, etc.

To post messages to the list, or to read past messages, you must log in to the listserv.
You will need to choose a password the first time you use it (upper right, under "Options").
Because of the way this listserv is set up and managed at SI, emailing directly from your machine to HPCC-L@si-listserv.si.edu is likely to fail (with no warning or error message), so use the web portal.


HPC Wiki

Information on how to use the system is at SI's HPC Wiki.

Contact SAO's HPC analyst if you think other topics or questions should be covered.


Configuration

Hardware

As of Sep 2017, the cluster consists of some 110 compute nodes, for a total of around 4,000 compute cores (CPUs). Most of the nodes are interconnected via a 1 GbE Ethernet switch, although the most recent nodes are on 10 GbE. All the nodes are connected to a 100 Gbps capable InfiniBand (IB) fabric, although the older nodes have a 40 Gbps IB interface. In addition, there will soon be nearly 350 TB of public disk space plus nearly 150 TB of user-specific disk space.


Software

The cluster is a Linux-based distributed system, running the ROCKS distribution and using the Sun Grid Engine queuing system (v6.2u4, a.k.a. SGE, OGE, or GE).

Access to the cluster is via two login nodes, while the queuing system runs on a separate front-end node. To run on the cluster, you log on to one of the login nodes and submit jobs using the queuing system (in batch mode, with the qsub command). You do not start jobs interactively on the compute nodes; instead, you use a job script, as sketched below. The login nodes are for normal interactive use such as editing, compiling, and script writing; computations are not run on these nodes either.
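
As an illustration, a minimal job script might look like the sketch below. The "#$" lines are standard Grid Engine directives; the program name is hypothetical, and any queue, parallel-environment, or resource requests you may need are cluster specific, so consult the HPC Wiki for the values used on Hydra.

 # hello.job -- a minimal SGE job script (my_program is a hypothetical executable)
 #$ -N hello          # job name
 #$ -cwd              # run in the directory the job was submitted from
 #$ -j y              # merge stdout and stderr
 #$ -o hello.log      # write the job's output to this file
 echo "running on $HOSTNAME"
 ./my_program

Submit it from a login node, and check on it, with:

 % qsub hello.job
 % qstat -u $USER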

The following software is available on the cluster (a brief compilation example follows the list):

  • Compilers
    • GCC compilers
    • Portland Group (PGI) compilers, including the cluster dev kit
    • Intel compilers including the Cluster Studio
  • Libraries
    • MPI for all compilers, including IB support
    • the libraries that come with the compilers
    • GSL, BLAS, LAPACK, etc
  • Packages
    • IDL (including 128 run-time licenses) and GDL
    • MATLAB (runtime only)
    • Python, R, Java, etc
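
As a simple illustration of using the compilers and MPI libraries, the commands below compile a serial C program with gcc and an MPI program with the mpicc wrapper. The source file names are hypothetical, and which compiler/MPI combination the wrappers pick up depends on how your environment is set up, so check the HPC Wiki for the recommended way to select compilers and MPI on Hydra:

 % gcc -O2 -o my_serial my_serial.c
 % mpicc -O2 -o my_mpi my_mpi.c

The resulting executables should then be run on the compute nodes through the queuing system (via a job script and qsub, as sketched above), not interactively on the login nodes.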

If you need some specific software, or believe that it would benefit the user community to have some additional software available on the cluster, contact SAO's HPC analyst.



SAO HPC analyst - Sylvain Korzennik  (hpc@cfa.harvard.edu)
Last modified Wednesday, 14-Aug-2019 15:10:32 EDT
 
 
