Reducing the Size of your Data


SMARechunker

Warning (12 Feb 2019): A bug has been found in SMARechunker. If a range of channels is selected (option 2 below), the central velocity of the band is not properly recalculated. Note that in CASA the central velocity of the band is an important parameter for tasks involving resampling and velocity corrections. This does not affect data where the full spectral window is kept. Updates will be posted here.

SMARechunker is a dedicated SMA script that can be used to:

(1) Rechunk (rebin) raw SMA data.
(2) Extract a channel range from the spectra of raw SMA data.

The SWARM correlator has a fixed bandwidth and resolution, so file sizes in excess of 100 GB are now common (see these plots). Users often wish to shrink their data set to lower the memory requirements during reduction. SMARechunker gives users the option to rechunk (rebin) the full bandwidth, and/or to extract a small window of the spectra by specifying a channel range.

Where do I find it?

Internal users can find it installed on all machines.

  • RTDC - simply type SMARechunker
  • CF - find it at /sma/bin/SMARechunker
  • SMA Hilo - find it at /application/bin/SMARechunker

External users can find SMARechunker on GitHub.

The git repository contains a Makefile, so you should only need to type make to build the executable. Note that this program has only been tested on 64-bit Linux distributions and is not expected to work on a 32-bit distribution.


General Usage

The required flags are -i (input) and -o (output), along with the rebin information. If your track contains any ASIC data, you will also need to include the -A flag.

Example 1) Rebinning a SWARM dataset by a constant factor of 8

$ SMARechunker -i /sma/data/science/mir_data/170101_01:02:03 -o 170101_rebin8 -r 8

This example rebins all chunks/spectral bands by the same factor using the -r flag.

Example 2) Selective rebinning based on chunk and channel number

This example rebins chunk 1 by a factor of 4 between channels 2500 and 3500, and by a factor of 64 across the whole chunk.

$ SMARechunker -d -i /sma/data/science/mir_data/170101_01:02:03 -o 170101_bin4_64 1:2500:3500:4 1:0:16385:64

The -d option means that the rebinned sections are not extracted but are appended to the input file. As a result, the output file, 170101_bin4_64, will have a total of 6 chunks: s1-s4 = raw unbinned chunks; s5 = the channels rebinned by 4; s6 = the channels rebinned by 64.
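The selection arguments in Example 2 appear to follow the pattern chunk:first_channel:last_channel:factor. As a rough illustration, the shell sketch below splits such an argument and estimates how many channels the rebinned section would contain. The function name describe_spec is hypothetical and is not part of SMARechunker. Whether the last channel is inclusive is an assumption here; with integer division, both readings give the same counts for the two arguments used above.

```shell
# Hypothetical helper (not part of SMARechunker): describe one
# chunk:first:last:factor argument and estimate the rebinned channel count.
describe_spec() {
    # Split the argument on colons into four fields.
    IFS=: read -r chunk first last factor <<EOF
$1
EOF
    # Integer division; leftover channels that do not fill a bin are not counted.
    echo "chunk $chunk: channels $first-$last rebinned by $factor -> about $(( (last - first) / factor )) output channels"
}

describe_spec 1:2500:3500:4    # about 250 output channels
describe_spec 1:0:16385:64     # about 256 output channels
```

The estimate is only meant as a sanity check before running a long rebinning job; the authoritative channel counts are those in the output file.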

You can also select based on time by using the -f and -l flags for first and last scan numbers.


Detailed instructions

SMARechunker offers many options for tailoring your rebinning. You can find detailed instructions at Using SMARechunker.


Alternatives

File size is critical if you plan to use MIR for calibration. MIR typically requires 2.5-3 times the file size in memory. If you do not use SMARechunker, you can reduce the size of the data being read into MIR by selecting sub-sections of the dataset. You can choose to read in just the data from a single receiver, a single sideband, or even a single chunk. This may be appealing for Galactic targets with narrow emission lines.
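The 2.5-3x rule of thumb can be turned into a quick check before loading a track into MIR. The sketch below is only an illustration: the function names mir_memory_range_gb and dataset_size_gb are hypothetical, and the only figure taken from this page is the 2.5-3x factor.

```shell
# Hypothetical helper: print the low and high memory estimates (in GB)
# for a dataset of the given size in GB, using the 2.5-3x rule of thumb.
mir_memory_range_gb() {
    awk -v size="$1" 'BEGIN { printf "%.0f %.0f\n", 2.5 * size, 3 * size }'
}

# Hypothetical helper: report the on-disk size of a dataset directory in GB.
# du -sk prints the total size in kilobytes.
dataset_size_gb() {
    du -sk "$1" | awk '{ printf "%.1f\n", $1 / 1024 / 1024 }'
}

mir_memory_range_gb 100    # prints "250 300"
```

If the upper figure exceeds the memory of your machine, that is a sign that you should rebin with SMARechunker or read in only part of the dataset.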

CENTER FOR ASTROPHYSICS | HARVARD & SMITHSONIAN
60 GARDEN STREET, CAMBRIDGE, MA 02138