
How to copy files to/from hydra and what disk(s) to use

How to copy files to/from hydra

  • You can copy files to/from hydra using scp or sftp:
    • to hydra from trusted hosts (SI or SAO/CfA IPs),
    • from hydra to hosts that allow external ssh/scp/sftp connections (see the note below).
  • For large transfers, we ask users to use rsync and to limit the bandwidth to 1 MB/s (3.5 GB/h) with --bwlimit=1000.
  • Remember that rm, mv and cp can also create a high I/O load on the NFS servers (both the NetApps and the other servers),
    so limit your concurrent I/Os and serialize them as much as possible.

Use scp or sftp for small files; use rsync --bwlimit=1000 for large transfers.
Limit concurrent I/Os when possible; serialize them, or limit the number of high-I/O-load jobs.
Stop your cp, rm, rsync or mv process(es) right away if the load on the head node exceeds 6 (check with uptime or top).
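For example (the host name, user name and paths below are placeholders; substitute your own), a small file can be pushed to hydra with scp, and a large directory tree with a bandwidth-limited rsync:

   # small file, from a trusted host to hydra (host name is illustrative)
   scp myscript.job myuser@hydra.si.edu:/pool/cluster3/myuser/

   # large transfer, capped at ~1 MB/s
   rsync -av --bwlimit=1000 mydata/ myuser@hydra.si.edu:/pool/cluster3/myuser/mydata/

   # on hydra, check the load on the head node before/while transferring
   uptime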

NOTE:

Access to SAO/CfA hosts is limited to the border control hosts (login.cfa.harvard.edu or pogo?.cfa.harvard.edu).
For SAO/CfA users, tunneling via these border control hosts is explained on the CF's SSH Remote Access page
and on the HEAD Systems Group's SSH FAQ page.
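The following is only an illustration of such a tunnel (the CF and HEAD pages above are the authoritative references; the user name and destination host below are placeholders). With OpenSSH you can hop through a border control host using a ProxyCommand entry in ~/.ssh/config on hydra:

   # ~/.ssh/config on hydra (illustrative; adjust user and host names)
   Host mydesktop
       HostName mydesktop.cfa.harvard.edu
       User myuser
       ProxyCommand ssh -W %h:%p myuser@login.cfa.harvard.edu

   # then copy results back through the tunnel
   scp results.tgz mydesktop:/data/myuser/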

What disk(s) to use

There are currently more than a dozen distinct scratch file systems, besides your home directory (/home); most (but not all) are on the NetApps:

Filesystem                   Size       Mounted on
NetApp.4:....../cluster      1.0 TB     /pool/cluster
NetApp.5:...../cluster1      5.2 TB     /pool/cluster1
NetApp.5:...../cluster2      4.0 TB     /pool/cluster2
NetApp.5:...../cluster3      4.0 TB     /pool/cluster3
NetApp.5:...../cluster4      9.0 TB     /pool/cluster4
NetApp.5:...../cluster5      4.7 TB     /pool/cluster5
Srvr.250:./pool-cluster6     7.2 TB     /pool/cluster6
NetApp.5:...../cluster7      2.9 TB     /pool/cluster7
Total:                      37.9 TB     Public scratch storage (to be consolidated into fewer partitions)
NetApp.5:........../sao      5.0 TB     /pool/sao        (all SAO users)
NetApp.5:..../sao_atmos     20.0 TB     /pool/sao_atmos  (ATMOS group)
NetApp.5:...../sao_rtdc      5.0 TB     /pool/sao_rtdc   (RTDC group)
Total:                      30.0 TB     SAO storage
NetApp.5:....../siomics      2.000 TB   /pool/siomics    (genomics)
NetApp.5:........./nasm      2.000 TB   /pool/nasm       (NASM)
NetApp.5:........./nmnh      1.000 TB   /pool/nmnh       (NMNH)
Total:                       5.000 TB   SI storage
Server.101:......./temp1     1.134 TB   /pool/temp1
Server.250:../pool-temp2     7.163 TB   /pool/temp2
Total:                       8.297 TB   Short-term storage (files older than 14 days are scrubbed)

  • Use the scratch space (/pool), not your home directory, for the large data storage your computations need.
  • This scratch space is for temporary storage, on a first-come, first-served basis.
  • This is a shared resource; use it responsibly.
  • None of these file systems are backed up (but the ones on the NetApps have .snapshot enabled).
  • There are no scrubbers running on the /pool/cluster* file systems, so be considerate of others and regularly delete what you no longer use or need (see the example after this list).
  • /pool/cluster2 is now its own separate volume.
  • We will consolidate /pool/cluster* into fewer partitions (disks).
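For example, to see how much scratch space you are using on the public partitions (this assumes you keep your files under a subdirectory named after your username, as described further down):

   # summarize your own usage on the public scratch partitions
   du -sh /pool/cluster*/$USER 2> /dev/null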

  • Disks fill up by running out of either (i) disk space or (ii) inodes (inodes are used to keep track of file names and directories).
    • If you produce a lot of small files, you will run out of inodes before filling the disk space.
      Avoid producing, or leaving behind, lots of small files, and monitor both disk space use (e.g., df /pool/sao) and inode use (e.g., df -i /pool/sao).
    • If IUse% exceeds Use%, you need to reduce the number of files you are using, either by consolidating your small files into a single .tgz or .zip file, or by reorganizing your disk space to use fewer but larger files (see the sketch after this list).
    • You can use the command ~hpc/sbin/check-disks.pl +i to check disk use; the +i option adds the IUse% after the Use% value.
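For instance (the directory and archive names below are placeholders), you can compare space and inode use on a partition and consolidate a directory full of small files into a single archive:

   # check disk space use and inode use on a /pool partition
   df -h /pool/sao
   df -i /pool/sao

   # consolidate many small files into one archive, then remove the originals
   tar czf many_small_files.tgz many_small_files/ && rm -rf many_small_files/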

  • Pick a file system, create a subdirectory named after your username, and store your files under that subdirectory.
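For example (assuming you picked /pool/cluster3; any of the scratch partitions listed above works the same way):

   # create your own subdirectory on a scratch partition and work under it
   mkdir -p /pool/cluster3/$USER
   cd /pool/cluster3/$USER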

The NetApp filer was recently (02/2012) upgraded, and I/O performance has improved.

Some users have invested in buying their own NetApp disks. Contact me (at hpc@cfa) if you want to do so.

Local Disk Space

  • There is some local disk space on each compute node; the size of the local disk varies greatly from node to node.

  • We discourage its use, unless you run jobs that have heavy I/O needs.
    Remember that you don't know on which node your job(s) will run, nor how much free local disk space is available there.

  • If you do use the local disks, purge them when you are done, including anything left over by crashed, terminated or killed jobs (see the sketch below).
    We do not scrub these disks, nor check them for stale content.
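A minimal job-script sketch, assuming the local disk is reachable under /tmp on the compute node (the actual location and free space vary from node to node; the application and file names are placeholders):

   # create a private work directory on the node's local disk
   WORKDIR=/tmp/$USER.$$
   mkdir -p "$WORKDIR"

   # purge it when the script exits (a hard kill can still leave files behind, so check afterwards)
   trap 'rm -rf "$WORKDIR"' EXIT

   # run the I/O-heavy step locally, then copy the results back to /pool
   cd "$WORKDIR"
   my_io_heavy_app > output.dat
   cp output.dat /pool/cluster3/$USER/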

If you have heavy I/O needs, please contact me (at hpc@cfa) so I can look at how to streamline your I/O use and, if needed, offset some of it onto these local disks.

FYI: High Performance File System

  • We hope to have a high performance file system (Lustre or PVFS2) soon, once we have the InfiniBand fabric up and running.
  • Once such a file system is available, there will be no local disk space left on the compute nodes.

-- SylvainKorzennikHPCAnalyst - 30 Jan 2012
