
Converting SMA Data to a CASA Compatible Format


Method     Details                                                            Public availability   Status
pyuvdata   Python interface to convert between interferometric data formats   Conda or GitHub       Finalizing SMA compatibility
mir2ms     MIR/IDL command to convert raw SMA data to CASA MS                 MIR GitHub            Working
autofits   MIR/IDL command to convert calibrated SMA data to UVFITS           MIR GitHub            Working


pyuvdata

🟊 Visit the pyuvdata SMA issues page to see current known issues and report any new ones related to SMA data.

pyuvdata is a Python interface to interferometric datasets. It converts datasets between several supported formats. Here we show how to convert SMA data from the raw MIR format to UVFITS and CASA measurement sets.

  • Retrieval
    The package can be installed with pip or conda. If you are working on the RTDC, it is already installed.

    $ conda install -c conda-forge pyuvdata
    $ pip install pyuvdata

  • Status [Apr 2023]
    In the final stages of testing. It now supports dual receivers. Note that polarization data are not supported. Polarization data can still be used as total-intensity data, but CASA does not support the cross-term information from SMA data.

    It does have memory limitations. The currently installed version has a memory limit of 22 GB. This is being worked on; in future versions the memory requirement is projected to be about 3x the data size. For example, rechunking a 120 GB file by a factor of 8 reduces it to ~15 GB, so processing it needs ~45 GB.

  • Instructions

    1. Follow the template below to create an executable script. The read_mir method lets you extract a subset of the original data; examples are given in the template.

      #! /opt/anaconda3/envs/pyuvdata/bin/python

      import os
      from pyuvdata import UVData

      # Raw MIR data directory to convert
      mir_file = "/sma/data/flux/mir_data/210808_13:37:36"

      # Get path to current working directory (outputs are written here)
      cwd = os.getcwd()

      UV = UVData()

      ####### SELECT WHICH DATA YOU WANT TO EXTRACT #######
      # Keep exactly ONE of the read_mir calls below uncommented;
      # a later read overwrites the data loaded by an earlier one.

      # Read in all sources, receivers, sidebands and chunks.
      UV.read_mir(mir_file)

      # Read in only the USB 400GHz receiver data. All sources. All chunks.
      # The irec code chooses the receiver where 0=230, 1=345, 2=400 and 3=240.
      # UV.read_mir(mir_file, irec=2, isb=[1])

      # Read in the fourth source 230GHz LSB chunk 3 data.
      # On the RTDC you can quickly list the sources using whatishere
      # UV.read_mir(mir_file, isource=[3], irec=0, isb=[0], corrchunk=[2])

      ####### SELECT THE OUTPUT FORMAT #######

      # Write out to measurement set
      UV.write_ms(os.path.join(cwd, "210808_133736.ms"))

      # Write out to uvfits
      UV.write_uvfits(os.path.join(cwd, "210808_133736.uvfits"), spoof_nonessential=True)

      The script above writes the output file/directory to your current working directory. Omit the cwd elements if you want to give an explicit path for the output.

      The full signatures of read_mir, write_ms, and write_uvfits are given below. See the pyuvdata documentation for a full description of each option.

      read_mir

      read_mir(filepath, isource=None, irec=None, isb=None, corrchunk=None, pseudo_cont=False, run_check=True, check_extra=True, run_check_acceptability=True, strict_uvw_antpos_check=False, allow_flex_pol=True, check_autos=True, fix_autos=True, rechunk=None)

      (Here, None means select all.)

      write_ms

      write_ms(filename, force_phase=False, clobber=False, run_check=True, check_extra=True, run_check_acceptability=True, strict_uvw_antpos_check=False, check_autos=True, fix_autos=False)

      write_uvfits

      write_uvfits(filename, spoof_nonessential=False, write_lst=True, force_phase=False, run_check=True, check_extra=True, run_check_acceptability=True, strict_uvw_antpos_check=False, check_autos=True, fix_autos=False)

    2. Read the output into CASA

      UVFITS

      CASA: vis='210808_133736.vis'
      CASA: importuvfits(fitsfile='210808_133736.uvfits', vis=vis)
      CASA: ms.open(vis)

      Measurement set

      CASA: vis='210808_133736.ms'
      CASA: ms.open(vis)
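
The memory rule of thumb in the Status note above (about 3x the data size, with rechunking by a factor N shrinking the data N-fold) can be written as a small helper. This is a sketch of that arithmetic only: `estimate_pyuvdata_memory_gb` is a hypothetical name, and the linear scaling with the rechunk factor is an assumption inferred from the 120 GB to ~45 GB example.

```python
def estimate_pyuvdata_memory_gb(file_size_gb: float, rechunk: int = 1,
                                overhead: float = 3.0) -> float:
    """Rough memory estimate for converting an SMA file with pyuvdata.

    Assumes (hypothetically) that rechunking by a factor `rechunk`
    reduces the data volume by the same factor, and that the projected
    memory requirement is `overhead` times the rechunked data size.
    """
    if file_size_gb < 0:
        raise ValueError("file size must be non-negative")
    if rechunk < 1:
        raise ValueError("rechunk factor must be >= 1")
    rechunked_size_gb = file_size_gb / rechunk
    return overhead * rechunked_size_gb


if __name__ == "__main__":
    # Example from the Status note: 120 GB rechunked by 8 needs ~45 GB.
    print(estimate_pyuvdata_memory_gb(120, rechunk=8))
```

Compare the result with the 22 GB limit of the currently installed version before starting a large conversion.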

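The receiver and sideband codes used in the pyuvdata template (irec: 0=230, 1=345, 2=400, 3=240; isb: 0=LSB, 1=USB, as in the template's examples) can be wrapped in a lookup so that selections are written by frequency and sideband name. `mir_selection` is a hypothetical helper, not part of pyuvdata; it only builds a keyword dictionary that you would pass as `UV.read_mir(mir_file, **kwargs)`.

```python
# Receiver codes (irec) as listed in the pyuvdata template.
IREC_CODES = {230: 0, 345: 1, 400: 2, 240: 3}
# Sideband codes (isb) as used in the template: 0 = LSB, 1 = USB.
ISB_CODES = {"LSB": 0, "USB": 1}


def mir_selection(rx_ghz=None, sidebands=None, sources=None, chunks=None):
    """Build keyword arguments for UVData.read_mir (hypothetical helper).

    Any argument left as None is omitted, so read_mir selects all values.
    """
    kwargs = {}
    if rx_ghz is not None:
        if rx_ghz not in IREC_CODES:
            raise ValueError(f"unknown receiver {rx_ghz} GHz")
        kwargs["irec"] = IREC_CODES[rx_ghz]
    if sidebands is not None:
        kwargs["isb"] = [ISB_CODES[sb.upper()] for sb in sidebands]
    if sources is not None:
        kwargs["isource"] = list(sources)
    if chunks is not None:
        kwargs["corrchunk"] = list(chunks)
    return kwargs


if __name__ == "__main__":
    # Same selection as the "USB 400GHz receiver" example in the template.
    print(mir_selection(rx_ghz=400, sidebands=["USB"]))
```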

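The template names its outputs after the MIR data directory with the colons removed (210808_13:37:36 becomes 210808_133736) and writes them to the current working directory. A minimal sketch of that naming convention follows; `output_paths` is a hypothetical helper, and producing one .ms and one .uvfits path per input is an assumption matching the template.

```python
import os


def output_paths(mir_dir, out_dir=None):
    """Return (ms_path, uvfits_path) named after a MIR data directory.

    The base name drops the colons from the MIR timestamp, matching the
    template (210808_13:37:36 -> 210808_133736). Outputs go to out_dir,
    or to the current working directory if out_dir is None.
    """
    base = os.path.basename(os.path.normpath(mir_dir)).replace(":", "")
    out_dir = os.getcwd() if out_dir is None else out_dir
    return (os.path.join(out_dir, base + ".ms"),
            os.path.join(out_dir, base + ".uvfits"))


if __name__ == "__main__":
    ms_path, uvfits_path = output_paths(
        "/sma/data/flux/mir_data/210808_13:37:36", out_dir="/tmp/out")
    print(ms_path)      # /tmp/out/210808_133736.ms
    print(uvfits_path)  # /tmp/out/210808_133736.uvfits
```

Passing out_dir explicitly plays the same role as omitting the cwd elements in the template.
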
mir2ms

mir2ms is a MIR (IDL) task that converts a raw (uncalibrated) SMA dataset directly to a CASA measurement set. We encourage users to report any issues to Charlie Qi (cqi@cfa.harvard.edu).

  • Retrieval
    The script comes packaged with the June 2021 version of MIR, which is available on the RTDC or from the MIR GitHub page (sma-mir).

  • Status [Mar 2023]
    - Header information on the gunnLO is currently incompatible with mir2ms; a workaround is described below.
    - This works with both CASA 5 and CASA 6. You have the option to specify a particular version of CASA when you call the command; an example is shown below.
    - By default, mir2ms produces 2 output files, one for each receiver. You can request a single receiver and/or a single sideband, which may help if memory is an issue.

  • Instructions
    1. Create a .pro script in your current working directory; in this example it is named mymir.pro (the filename must match the procedure name given after 'pro' in the first line of the file). This file must perform two tasks: applying the Tsys correction and fixing the problem data header. You can optionally add instructions for some basic data cleaning (e.g. flagging pointing scans and spikes). The mymir.pro script should follow this template.

      As part of the routine, mir2ms uses the MIR autofits command (see 'autofits' below) to generate UVFITS files per source, sideband, and chunk. Unlike autofits, mir2ms then opens your default (or specified) version of CASA and imports and concatenates the UVFITS files automatically.

      WARNING: This routine creates over 200 temporary files and directories in your cwd, which require ~5x the disk space of the input data directory.

    2. Run mir2ms in MIR.

      IDL> mir2ms, casa='/opt/casa-release-5.8.0-109.el6/bin/casa', dir='210808_13:37:36', rx=230, /mymir, outname='210808_133736'

      If no receiver is selected, mir2ms produces two output measurement sets, one for each receiver.

      Depending on the size of the data, this may take several hours (roughly 1 hour per 10 GB). When the script has finished, exit IDL; the temporary files will have been deleted. You will be left with an outname_rx.ms directory in your cwd, along with the IDL and CASA log files.
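
Before running mir2ms it can help to check the disk and time budget quoted above: temporary files need roughly 5x the input directory size, and the run takes roughly 1 hour per 10 GB. The sketch below applies those two rules of thumb with the Python standard library only; `plan_mir2ms` and `directory_size_bytes` are hypothetical helpers, and the ratios are the figures from this section, not values read from mir2ms itself.

```python
import os
import shutil


def directory_size_bytes(path):
    """Total size of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def plan_mir2ms(input_dir, work_dir=".", disk_factor=5.0, gb_per_hour=10.0):
    """Estimate scratch space and runtime for mir2ms (rules of thumb only).

    disk_factor: temporary files need ~5x the input size (per this page).
    gb_per_hour: runtime scales as ~1 hour per 10 GB (per this page).
    """
    size_gb = directory_size_bytes(input_dir) / 1e9
    needed_gb = disk_factor * size_gb
    free_gb = shutil.disk_usage(work_dir).free / 1e9
    return {
        "input_gb": size_gb,
        "scratch_needed_gb": needed_gb,
        "free_gb": free_gb,
        "enough_space": free_gb >= needed_gb,
        "est_hours": size_gb / gb_per_hour,
    }


if __name__ == "__main__":
    plan = plan_mir2ms("210808_13:37:36")
    for key, value in plan.items():
        print(f"{key}: {value}")
```

Run it from the directory where mir2ms will write its temporary files, so that work_dir points at the right file system.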

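Because IDL finds a procedure by its file name, the first line of mymir.pro (for example `pro mymir`) must name the same procedure as the file. A small standard-library check of that rule is sketched below; it assumes the first non-blank, non-comment line is the `pro` declaration, and `check_pro_name` is a hypothetical helper rather than part of MIR.

```python
import os


def pro_name_from_source(text):
    """Return the procedure name declared on the first real line, or None.

    Blank lines and IDL comment lines (starting with ';') are skipped.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue
        parts = stripped.replace(",", " ").split()
        if parts[0].lower() == "pro" and len(parts) > 1:
            return parts[1]
        return None  # first real line is not a procedure declaration
    return None


def check_pro_name(path):
    """True if the .pro file name matches its declared procedure name."""
    with open(path) as handle:
        declared = pro_name_from_source(handle.read())
    expected = os.path.splitext(os.path.basename(path))[0]
    # IDL routine names are case-insensitive.
    return declared is not None and declared.lower() == expected.lower()


if __name__ == "__main__":
    print(pro_name_from_source("; my MIR script\npro mymir\nend\n"))
```

For the example in step 1, `check_pro_name("mymir.pro")` should return True.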

autofits

This option converts data already calibrated in MIR to CASA measurement set format. The data are written out in UVFITS format from MIR, and the provided script must then be used to import them into CASA.

  • Retrieval
    The CASA import script is MIRFITStoCASA.py. It is used in place of CASA's importuvfits task, which does not propagate the weights from MIR correctly.

  • Status [July 2022]
    A bug in fits_out (called inside the MIR autofits routine) has been fixed. It caused each chunk and sideband to have slightly different uv coordinates (in meters) in CASA, so users saw a narrower u-v coverage per baseline and u-v coordinates that were wrong by a few percent.

  • Instructions
    1. Create the UVFITS file in MIR by using the autofits routine. This will loop over all chunks, sidebands and receivers on a source-by-source basis. In this example our source is named orion.

      IDL> select,/p,/re
      IDL> autofits, source='orion'

      After providing your source name, autofits creates a separate file for each SWARM chunk (s1-s4), sideband, and receiver. The files are named with the convention SOURCE_SB_CHUNK_RX.UVFITS (e.g. ORION_L_S1_RX345.UVFITS, ORION_U_S4_RX230.UVFITS).

      For a typical SWARM data set you will get 16 spectral files, along with 4 extra files for the pseudo-continuum chunks, C1, which can be ignored.

    2. Next switch to CASA and use the provided script MIRFITStoCASA.py to import your data. This script loops over all the .UVFITS files and converts each of them to the measurement set (.ms) format.

      CASA: import sys 
      CASA: sys.path.append('/path/to/')  # directory that contains MIRFITStoCASA.py
      CASA: import MIRFITStoCASA 
      CASA: fullvis='allorion.ms'
      CASA: allNames = []
      CASA: for sou in ['ORION']:
      CASA:    for rx in ['345','240']:
      CASA:       for sb in ['L','U']:
      CASA:          for i in ['1','2','3','4']:
      CASA:             name = sou+"_"+sb+"_S"+i+"_RX"+rx
      CASA:             print("------converting "+name+" ....")
      CASA:             MIRFITStoCASA.MIRFITStoCASA(UVFITSname=name+'.UVFITS', MSname=name+'.ms')  
      CASA:             allNames.append(name+'.ms')
      

    3. Next, concatenate all the newly created measurement sets into a single file. The input here is allNames, the Python list of .ms files created in step 2. The name of the concatenated output file (fullvis) was also defined in step 2.

      CASA: concat(vis=allNames,concatvis=fullvis,timesort=True)

    4. Check the content of the final, concatenated measurement set:

      CASA: listobs(fullvis)

    5. Flag the noisy edge channels.

      CASA: flagdata(vis=fullvis, mode='manual', spw='*:0~(nflagedge-1);(ntotal-nflagedge)~(ntotal-1)')

      where nflagedge should be replaced by the integer number of channels you want to flag at each edge, and ntotal by the integer number of channels per chunk (spw in CASA). ntotal can be found from the output of the previous listobs command.

      Choose how many edge channels to flag by inspecting the amplitude as a function of frequency in each chunk, e.g. with CASA's plotms task. Note that this can be slow if many channels are present and/or many tracks have previously been combined:

      CASA: plotms(vis=fullvis, xaxis='channel', yaxis='amp', avgtime='1e20', avgscan=True, iteraxis='spw')

      This will show increased noise in the edge channels. We advise a conservative trim of about 8% (i.e. nflagedge ≈ 0.08 × ntotal).

    6. Repeat the loop to create a new measurement set for another source. Alternatively, add a second source name to the for sou loop in step 2.
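
The autofits naming convention in step 1 (SOURCE_SB_CHUNK_RX.UVFITS) also determines which files the step-2 loop must visit. The sketch below enumerates the expected names with itertools.product instead of nested loops. The receivers, the four SWARM chunks and the C1 pseudo-continuum label come from this page; the exact spelling of the pseudo-continuum file names is an assumption, and `expected_autofits_names` is a hypothetical helper.

```python
from itertools import product


def expected_autofits_names(source, receivers=("345", "240"),
                            sidebands=("L", "U"), chunks=("1", "2", "3", "4")):
    """Return (spectral, pseudo_continuum) lists of autofits base names.

    Names follow SOURCE_SB_CHUNK_RX, e.g. ORION_L_S1_RX345. The
    pseudo-continuum names assume the chunk label 'C1' (an assumption
    about the exact spelling); those files can be ignored.
    """
    src = source.upper()
    spectral = [f"{src}_{sb}_S{ch}_RX{rx}"
                for rx, sb, ch in product(receivers, sidebands, chunks)]
    continuum = [f"{src}_{sb}_C1_RX{rx}"
                 for rx, sb in product(receivers, sidebands)]
    return spectral, continuum


if __name__ == "__main__":
    spectral, continuum = expected_autofits_names("orion")
    print(len(spectral), len(continuum))  # 16 4
    print(spectral[0])                    # ORION_L_S1_RX345
```

Each spectral name can then be passed to MIRFITStoCASA with the '.UVFITS' and '.ms' suffixes, exactly as in the step-2 loop.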

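Instead of hard-coding receivers in the step-2 loop, you can also discover which spectral files autofits actually wrote. The sketch below uses Python's glob and re modules on a directory and skips the pseudo-continuum C1 files; it assumes the file names follow the SOURCE_SB_CHUNK_RX.UVFITS convention exactly, and `find_spectral_uvfits` is a hypothetical helper.

```python
import glob
import os
import re

# SOURCE_SB_CHUNK_RX.UVFITS with a SWARM chunk label S<n>, e.g. ORION_L_S1_RX345.UVFITS
SPECTRAL_RE = re.compile(
    r"^(?P<src>.+)_(?P<sb>[LU])_S(?P<chunk>\d+)_RX(?P<rx>\d+)\.UVFITS$")


def find_spectral_uvfits(source, directory="."):
    """Return sorted base names (without .UVFITS) of spectral chunk files."""
    src = source.upper()
    pattern = os.path.join(glob.escape(directory), f"{src}_*.UVFITS")
    names = []
    for path in glob.glob(pattern):
        filename = os.path.basename(path)
        match = SPECTRAL_RE.match(filename)
        # Keep only this source's S<n> chunks; C1 files do not match the regex.
        if match and match.group("src") == src:
            names.append(filename[: -len(".UVFITS")])
    return sorted(names)
```

In CASA, the returned names can replace the nested loops: append '.UVFITS' and '.ms' to each name and call MIRFITStoCASA as in step 2.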

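The edge-channel selection in step 5 can be generated from ntotal and the suggested ~8% trim. This sketch builds a CASA spw selection string that flags exactly nflagedge channels at each end of every spw; the 8% default comes from the advice in step 5, rounding to the nearest channel is an assumption, and `edge_flag_spw` is a hypothetical helper. Pass the result as the spw argument of the flagdata call in step 5.

```python
def edge_flag_spw(ntotal, fraction=0.08, nflagedge=None, spw="*"):
    """Build a CASA spw selection that flags the edge channels of each spw.

    Flags channels 0..nflagedge-1 and ntotal-nflagedge..ntotal-1, i.e.
    exactly nflagedge channels at each end. If nflagedge is not given it
    is taken as round(fraction * ntotal).
    """
    if nflagedge is None:
        nflagedge = round(fraction * ntotal)
    if nflagedge < 1:
        raise ValueError("nothing to flag: nflagedge must be >= 1")
    if 2 * nflagedge >= ntotal:
        raise ValueError("edge flags would cover the whole spw")
    low = f"0~{nflagedge - 1}"
    high = f"{ntotal - nflagedge}~{ntotal - 1}"
    return f"{spw}:{low};{high}"


if __name__ == "__main__":
    print(edge_flag_spw(1024))  # *:0~81;942~1023
```

Read ntotal from the listobs output rather than assuming a channel count, since it depends on how the data were chunked.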

CENTER FOR ASTROPHYSICS | HARVARD & SMITHSONIAN
60 GARDEN STREET, CAMBRIDGE, MA 02138