SMA CASA Website

SMA Data Reduction

Welcome to the SMA CASA Website.

The purpose of this site is to explain how SMA data can be processed with CASA. The information presented here covers how to get SMA data into the CASA package and SMA-specific CASA processing steps.

SMA data can be imported directly into CASA, without pre-processing by another package such as MIR or Miriad. The import has been tested with single and dual receiver data, and with data from the new SWARM (SMA Wideband Astronomical ROACH2 Machine) correlator. It cannot yet process polarization data or mosaic tracks.

CASA (Common Astronomy Software Applications) is fully documented at


The Tools to Make a CASA Measurement Set from SMA Data

Two Python scripts have been written that allow you to import SMA data into CASA:
   sma2casa: This script produces a set of FITS-IDI files. One such file is produced for each sideband of each spectral chunk in the data set. The script is run at the shell level and writes the FITS-IDI files into the current working directory. If you run it on one of the CF-managed Linux machines in Cambridge, MA, it should run without modification. To run the script elsewhere, you'll need to modify the script's first line to point to your own local Python distribution.
   By default, sma2casa calls a C language module called "makevis". The object code for that module must be present in the directory you are running in. This module does the low-level processing of the visibility data, and writing it in C rather than Python significantly speeds up the execution of sma2casa. Since makevis is compiled code, it is not as portable as Python code. You may need to recompile makevis.c if it does not work properly on your machine. The source code and a Makefile are included in the git repository. It is almost certain that the Makefile will work only on Linux machines. If you have trouble with makevis, and don't want to go through the hassle of figuring out how to compile it, you can use the -P option for sma2casa. If -P is specified, sma2casa will not use makevis and will run using Python only, which will be significantly (factor of ~2) slower, but more portable.
   smaImportFix: This script is run from the CASA ipython interpreter. It reads in the FITS-IDI files that sma2casa created, performs some calculations to make the SMA data more similar to ALMA data, and then concatenates everything into a single CASA Measurement Set (MS).

The SMA CASA python scripts are updated frequently (mostly bug fixes). Before processing data from a new track users should verify that they have the latest versions of the scripts.

The most recent versions of these scripts can be obtained from the github software repository by issuing the command:

ShellPrompt> git clone

git is free software for Linux and Mac machines. Free download is available at

Python is free software for Linux and Mac machines. Free download is available at

The SMA CASA scripts have been fully tested with Python 2.x; they have not been tested with Python 3.x.

To determine the version of Python running on your machine, type the following:

ShellPrompt>python -V

Python 2.7.3

There are several Python modules required in order to run the SMA CASA scripts:
numpy      casac
pyfits     tasks
astropy    taskinit

NOTE: The modules in the right-hand column (casac, tasks, taskinit) are part of a standard CASA installation.

To verify if these modules are installed, type the following commands:

ShellPrompt> python
>>> help()
help> modules
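
As an alternative to scanning the full module listing by eye, the check can be scripted. The following is only a sketch: the helper name missing_modules is hypothetical, and the module names are simply the ones listed above.

```python
# Sketch: report which of the required modules cannot be imported.
# The helper name is hypothetical; the module names come from the list above.
REQUIRED = ["numpy", "pyfits", "astropy", "casac", "tasks", "taskinit"]


def missing_modules(names):
    """Return the names in `names` that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing

# Example (inside the Python you intend to use):
#     print(missing_modules(REQUIRED))
```

An empty list means every module was found. Outside of CASA, the three CASA modules would be expected to show up as missing.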

If you are having trouble getting a self-consistent set of Python and Python modules together which will support sma2casa, you could consider installing the Anaconda Python distribution. That installation has everything needed, and sma2casa is tested using that distribution.

Public Computers at the CfA in Cambridge with CASA Installed

Running CASA on the R&G or CF Computers or the Hydra Cluster

There are six versions of CASA on the R&G public computers:

CASA version   Executable in /usr/local             Variable name
3.0.0          casapy-30.0.9860-001-64b             casapy30
3.1.0          casapy-31.0.13530-002-64b            casapy31
3.3.0          casapy-33.0.16856-002-64b            casapy33
3.4.0          casapy-34.0.19988-002-64b            casapy34
4.0.1          casapy-40.1.22889-003-64b            casapy40.1
4.1.0          casapy-stable-41.0.22971-001-64b     casapy41

To run version 4.0.1 on RTDC7, type:

  1. ssh -X rtdc7
  2. /usr/local/casapy-40.1.22889-003-64b/casapy

Alternatively, type:

  1. ssh -X rtdc7
  2. casapy40.1

NOTE: Casapy is updated weekly via rsync.

Additionally, there are three versions of CASA on the CF managed linux computers:

CASA version   Executable in /sma/ALMA
3.4.0          casa-34/casapy
4.1.0          casa-41/casapy
4.2.0          casa-42/casapy

To run version 4.1.0 on cfa0, type:

  1. ssh -X (if not on CF managed machine)
  2. ssh -Y cfa0
  3. /sma/ALMA/casa-41/casapy

In the near future, users will be able to run their time- or CPU-intensive jobs using parallelized CASA on SAO's HYDRA cluster.

Public Computers at the SMA in Hawaii with CASA Installed

Running CASA on the Hawaii Computer

There is one version of CASA in Hawaii, on hilodr2.

NOTE: To log on to hilodr2 you must obtain its IP address. Please contact the SMA CASA staff for assistance.

CASA version   Executable in /usr/local             Variable name
4.2.0          casapy-stable-42.0.26465-011-64b     casapy

To run, type:

  1. ssh ipaddress
  2. cd reduction
  3. casapy

Using sma2casa

sma2casa creates one FITS-IDI file for each sideband of each chunk in an SMA data set.

Once you have obtained the Python scripts needed to import SMA data (see the "Getting Started" section), you should be able to convert your SMA data into a CASA Measurement Set (MS). sma2casa does not directly create a CASA MS. Instead, it creates one FITS-IDI file for each sideband of each chunk in an SMA data set. The reason for this is that the CASA MS format can be changed by the CASA software group at any time, but the FITS-IDI format is a standard format which is unlikely to be changed (because that would break many software packages). So FITS-IDI provides a fixed target for sma2casa, and CASA itself provides a routine (which is called by smaImportFix) for reading FITS-IDI files.

sma2casa uses the Tsys information in the SMA data set to convert the SMA visibilities to "pseudo-Jansky" units, which should be fairly close (within ~20%) to the correct Jy values, unless there were significant problems with the track (for example, bad pointing). It also uses Tsys, along with integration time, to calculate weights for the visibilities. These calculations are performed before the FITS-IDI files are written.
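
The weight calculation can be illustrated with a short sketch. It assumes only the proportionality given in the smaImportFix section of this page, weight ∝ (integration time)/Tsys**2. The function name is hypothetical, the proportionality constant is arbitrarily taken as 1, and how the Tsys values of the two antennas of a baseline are combined is not specified here, so a single Tsys value is used.

```python
# Sketch only: relative visibility weight, assuming weight ~ t_int / Tsys**2.
# The constant of proportionality (taken as 1 here) is an assumption.
def relative_weight(integration_time, tsys):
    """Return a weight proportional to integration_time / tsys**2."""
    if tsys <= 0:
        raise ValueError("Tsys must be positive")
    return integration_time / float(tsys) ** 2
```

With this form, doubling Tsys reduces the weight of a visibility by a factor of four, while doubling the integration time doubles it.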

The only required parameter for the script is the path to the SMA data set. sma2casa accepts the following optional parameters:

Usage: path-to-SMA-data [options]
-c m,n,o [--chunks=m,n,o] Process only chunks m, n and o
-h [--help] Print this message and exit
-l [--lower] Process lower sideband only
-p [--percent] % to trim on band edge (default = 10)
-P [--PythonOnly] Do not use the C module "makevis"
-r [--receiver] Specify the receiver for multi-receiver tracks
-R [--RxFix] Force the data to be treated as single receiver
-s [--silent] Run silently unless an error occurs
-t [--trim] Set the amplitude at chunk edges to 0.0
-T n=m Use antenna m's Tsys for antenna n
-u [--upper] Process upper sideband only
NOTE: The options list above must come after the path argument.  

The -T option provides a crude way to handle bad Tsys information. If antenna n has noisy or garbage Tsys values, -T allows antenna m's Tsys to be used instead for that antenna. Dual receiver tracks must use the -r option, and each receiver must be processed separately.


ShellPrompt> /sma/SMAusers/taco/130408_17:20:01/

will process both sidebands of all chunks in the data set located at /sma/SMAusers/taco/130408_17:20:01/

NOTE: The command above will fail if that data set has data from more than one receiver. To reduce dual receiver tracks, the -r option must be used to select a receiver.

ShellPrompt> /sma/SMAusers/taco/130408_17:20:01/ -u -r 400

will process the data from the upper sideband of the 400 GHz Rx only.

ShellPrompt> /sma/SMAusers/taco/130408_17:20:01/ -l -c 3,5,20

will make a FITS-IDI file for the lower sideband of chunks s03, s05 and s20, if data exists for those chunks (i.e. if the number of channels has not been set to 0 in the restartCorrelator command).

ShellPrompt> /sma/SMAusers/taco/130408_17:20:01/ -l -c 40 -t

will make a FITS-IDI file for the lower sideband of chunk s40, and trim the highest and lowest 10% of the channels by setting their amplitudes to 0.0 (which will ultimately cause them to be flagged bad).

ShellPrompt> /sma/SMAusers/taco/130408_17:20:01/ -T 4=7 -T 5=7

will make a set of FITS-IDI files with the Tsys values for antennas 4 and 5 replaced by the Tsys values for antenna 7.
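
If the conversion is driven from another Python script, the argument list can be assembled programmatically. The sketch below is an illustration only: the function name build_args and the idea of passing the script's file name as a parameter are assumptions, and only options from the usage listing above are used. It places the data path first and the options after it, as the NOTE above requires.

```python
# Sketch: assemble an argument list for the conversion script.
# `script` is whatever file name/path the script has on your system (assumption).
def build_args(script, data_path, lower=False, upper=False, chunks=None,
               receiver=None, trim=False, python_only=False, tsys_swaps=None):
    """Return [script, data_path, options...] with the path before all options."""
    args = [script, data_path]
    if lower:
        args.append("-l")                     # lower sideband only
    if upper:
        args.append("-u")                     # upper sideband only
    if chunks:
        args += ["-c", ",".join(str(c) for c in chunks)]
    if receiver is not None:
        args += ["-r", str(receiver)]         # needed for dual receiver tracks
    if trim:
        args.append("-t")                     # zero the chunk edge amplitudes
    if python_only:
        args.append("-P")                     # do not use makevis
    for bad, good in (tsys_swaps or []):
        args += ["-T", "%d=%d" % (bad, good)]  # use antenna `good`'s Tsys for `bad`
    return args
```

The resulting list could then be handed to something like subprocess.call(); that step is left out here because it depends on the local installation.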

The Caveats

File/Object   Notes
makevis   The sma2casa script calls a C language module called makevis. The object code for that module must be present in the directory you are running in. This module does the low-level processing of the visibility data, and writing it in C rather than Python significantly speeds up the execution of sma2casa. Since makevis is compiled code, it is not as portable as Python code. You may need to recompile makevis.c if it does not work properly on your machine. The source code and a Makefile are included in the git repository. It is almost certain that the Makefile will work only on Linux machines. If you are having trouble with makevis, and don't want to bother getting it working on your machine, you can specify -P on your command line, which will make the script run using Python code only, and be a factor of ~2 slower.
Host Computer   sma2casa processes the visibilities by mapping the entire visibilities data file into RAM. The script is apt to run very slowly if the computer's available RAM is smaller than the size of the file sch_read.
Missing tsys_read File   If you run sma2casa on a very old SMA data set, it may immediately abort because the data set does not contain a tsys_read file. This file is required, and it can be built from data stored in the eng_read file. This process can't be done automatically, because the procedure depends on the receivers which were active in the track, and the state of the Bandwidth Doubler Assembly. Contact the SMA CASA staff if you need to have a tsys_read file made for your track.

Using smaImportFix

smaImportFix reads the FITS-IDI files into CASA.

The second Python script, smaImportFix, will read the FITS-IDI files into CASA, and produce a single MS with all available chunks, including the pseudo-continuum "chunk". smaImportFix should be executed from within CASA, and it assumes that the FITS-IDI files have already been created by sma2casa, and that those files are in the current working directory (pwd). The script does the following things:

  1. Makes a list of which FITS-IDI files are in the current directory, so that it knows which chunks should be processed.
  2. Reads each FITS-IDI file into a separate CASA MS.
  3. sma2casa indicates that a data value is bad by setting its amplitude to 0.0. smaImportFix runs the flagdata task on the newly created MS, in order to explicitly flag those data points bad within the MS. Chunk edge channels are also flagged bad in this step, if you passed arguments to sma2casa indicating that you wanted to trim edge channels.
  4. The FITS-IDI files are deleted.
  5. Fixes the weights. For all chunks except the pseudo-continuum chunk, the CASA importfitsidi task sets the data weights to 1.0. smaImportFix fixes this problem, and puts the proper weights, proportional to (integration time)/Tsys**2, in the weight table of the MS.
  6. Generates new scan numbers. In the raw SMA data sets, each timeslice of data stored by the correlator has a unique scan number. CASA MSs usually have scan numbers which change only when the source is changed, which can be helpful in controlling how calibration information is averaged and interpolated. So smaImportFix generates a new set of scan numbers which increment only when the source changes. This means that there will usually be several integrations which share the same scan number.
  7. The individual chunk MSs are concatenated into a single MS. There is one such concatenated MS made for each sideband, named MyDataLower and MyDataUpper.
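
The scan renumbering in step 6 can be sketched in a few lines. This is not the code used by the script, only an illustration of the rule described: the scan number increments when, and only when, the source changes. The function name and the choice of 1 as the first scan number are assumptions.

```python
# Sketch of the renumbering rule in step 6 (not the script's actual code).
def renumber_scans(sources, first_scan=1):
    """Given the source name of each integration, in time order, return scan
    numbers that increment only when the source changes."""
    scans = []
    scan = first_scan
    previous = None
    for i, source in enumerate(sources):
        if i > 0 and source != previous:
            scan += 1
        scans.append(scan)
        previous = source
    return scans
```

For a sequence such as 3c273, 3c273, 0851+202, cwleosma, cwleosma, 0851+202 this gives the scan numbers 1, 1, 2, 3, 3, 4, so consecutive integrations on the same source share a scan number.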

The Caveats

File/Object   Notes  
Dual-Rx   There is a strange, intermittent problem with importing Dual-Rx data into CASA. Occasionally the frequency scale gets set incorrectly. There is a work-around for this issue:
  1. Make an empty directory.
  2. Copy the FITS-IDI files into that new directory.
  3. cd to the new directory.
  4. (Re)start CASA in the new directory, and run smaImportFix.
  5. Copy the MSs wherever you want them to live.
Remember, this is only required for dual-Rx tracks.
System Temperature Table   The CASA SYSCAL table has the Tsys values stored in it, but that is probably only useful for plotting the Tsys data (via browsetable). The SYSCAL table is not in the format expected by CASA, so the CASA commands which use Tsys information will not work properly.

Notes


File/Object   Notes  
Chunk Names in CASA If you import all the spectral chunks, but not the pseudo-continuum "chunk" (by default the pseudo-continuum is imported) then chunk s01 will be spw0, chunk s02 will be spw1 etc. The pseudo-continuum chunk is not very useful in CASA, but the default behavior has the nice property that SMA chunk numbers and spectral window numbers are the same.
Antenna Names and pads The names of the antennas in CASA will be "SMA1", "SMA2", etc. The "Station" parameter for each antenna is set to the pad name for the pad the antenna was sitting on during the observation.
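
The chunk numbering rule above can be written as a small helper. This is a sketch with a hypothetical function name; it encodes only the two cases described in the table (pseudo-continuum imported as spw0, or not imported at all).

```python
# Sketch: map an SMA chunk number (1 for s01, 2 for s02, ...) to a CASA spw.
def spw_for_chunk(chunk, continuum_imported=True):
    """Return the spw number for a chunk.

    With the pseudo-continuum imported (the default behavior), spw0 is the
    pseudo-continuum and chunk sNN is spw NN. Without it, chunk s01 is spw0.
    """
    if chunk < 1:
        raise ValueError("chunk numbers start at 1 (s01)")
    return chunk if continuum_imported else chunk - 1
```

So chunk s12 is spw12 in the default case, and spw11 if the pseudo-continuum was left out.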

The sections below give step-by-step instructions for the calibration and imaging of several SMA data sets.
Official NRAO SMA+CASA Tutorial (A mosaic observation)

The NRAO site has an SMA Tutorials page explaining how to reduce SMA data. The instructions there were put together by actual CASA experts, and that's probably the best information available. However, some of the information on the NRAO site will not be applicable to data sets imported to CASA with sma2casa and smaImportFix, because the NRAO tutorial assumes that the SMA data will be processed through Miriad (yet another data reduction package) before it is imported to CASA. The information in the tutorial that deals with the CASA commands themselves, rather than importing the data to CASA, is fully applicable to data imported with the scripts.

Note that for the NRAO tutorial, the pseudo-continuum channel produced by the SMA realtime system ('c1', in MIR terminology) was not imported to CASA. That means that the tutorial's spw0 corresponds to SMA chunk s01. If you use sma2casa with its default settings, spw0 will be the pseudo-continuum channel, and spw1 will be SMA chunk s01. So in the NRAO tutorial, there are many places where spwmap is set to [0] during calibration steps; in most cases that should be [1] if sma2casa and smaImportFix were used. You can, of course, tell sma2casa to ignore the pseudo-continuum channel by using the -c option and listing the spectral chunks.

One thing to keep in mind is that the NRAO script only processes chunks s01 through s24. Most SMA data sets also contain chunks s25 through s48 (double bandwidth mode). It is usually best to calibrate these two sets of chunks separately, because they enter the correlator through different IF pathways, and their phases can (and at some level always do) drift relative to each other. So no single complex gain calibration is appropriate for both.
Reduction of an IRC+10216 line survey track (single receiver, 4 GHz mode)

This section is under construction
During 2007 and 2009, the SMA did a line survey of the well known carbon star IRC+10216 (CW Leo), in the subcompact configuration. The results of this survey were published in Patel et al., 2011, ApJ Supp volume 193. The discussion below shows the reduction of one track from that survey, which was observed on Feb. 2, 2009. The data set is 090202_07:19:01.

The image to the right shows the corrPlotter display for this track, which is a useful graphical summary of the track.

A script is available that reduces the lower sideband data via the steps described below. The data was collected in "Double Bandwidth" mode (or single receiver, 4 GHz mode), which is the most common SMA observing mode by far. Because the 4 GHz IF is sent to the ASIC correlator via two independent IF pathways, the upper 2 GHz (chunks s25-s48) can drift in phase relative to the lower 2 GHz (chunks s01-s24). Because of that potential phase drift, many of the calibration steps are performed on the two IF halves separately. The script takes about 6.5 hours to run on the computer cfa0.

In preparation for running the script below, sma2casa was run with the -l switch, to select the lower sideband only, and the -t switch, to flag the chunk edges bad.
The first few lines of the script are shown and described below:

def waitForCASA():             # This function is called after each CASA task,
    os.system('sleep 2')       # to ensure files are closed etc before the
    clearstat()                # next task begins

bpSource = '3c273'             # Define the bandpass calibration source
gcSource = '0851+202'          # Define the complex gain calibrator
scienceSource='cwleosma'       # Define the science target source
refAnt = 'SMA6'                # Define the reference antenna

os.system('date')              # print the system time - not needed
execfile('')    # Read in the FITS-IDI files from
os.system('rm -r -f Upper*')   # Get rid of some scratch files
os.system('rm -r -f Lower*')
vis ='MyDataLower'
ms.split(outputms=vis+'.tile',tileshape=[1,256,54]) # "tile" the MS
vis ='MyDataLower.tile'

The last few steps above create a "tiled" measurement set. Some of the CASA tasks will execute more quickly if presented with a tiled MS. The parameters 1,256,54 were derived by NRAO gurus, with a deep understanding of CASA. Those same values are purported to be appropriate for all SMA data sets.

# Process the two IF halves separately
for first in (1, 25): # Loop over 2 values. "first" is first spw
    last = first+23   # last will be the last spw in the set
    print 'Processing chunks %02d through %02d' % (first, last)
    ext = '.s%02d-s%02d' % (first, last) # Put this string in file names
    spwList = '%d~%d' % (first, last) # The spws to process as a set
    spwMap = [first]

Many of the calibration tasks will produce a single calibration table from the data in all the chunks (spws) which are processed as a group. That single calibration table must be applied to every one of the selected spws, and the spwMap is used to produce that behavior.
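
To make the loop variables concrete, the sketch below recomputes them in a small function (the function name is hypothetical; the string formats are copied from the loop above) so that the values for each IF half can be inspected outside of CASA.

```python
# Sketch: the per-IF-half strings built by the loop above, as a function.
def if_half_settings(first, n_chunks=24):
    """Return the file-name extension, spw selection and spw map for one IF half."""
    last = first + n_chunks - 1
    return {
        "ext": ".s%02d-s%02d" % (first, last),   # used in calibration table names
        "spwList": "%d~%d" % (first, last),      # the spws processed as a set
        "spwMap": [first],
    }
```

For first = 1 this gives ext '.s01-s24', spwList '1~24' and spwMap [1]; for first = 25 it gives '.s25-s48', '25~48' and [25].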

We want to vector average all the data taken on the bandpass calibrator, but there was some phase drifting, and a couple of phase jumps, during the time we stared at the bandpass calibrator. So we need to line up all the phases before summing the data, by essentially self-calibrating the bandpass calibrator data. This is done by the task below, which does a phase-only gain calibration of 3C273. The solint='int' parameter tells the task to find a solution for each integration (typically 30 seconds, for the SMA). The combine='spw' parameter tells gaincal to average all the spectral windows, to get the best signal/noise.
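
As a rough guide, the parameters described above can be collected into a keyword dictionary, as in the sketch below. This is an illustration under assumptions, not the script's own line: the function name is hypothetical, calmode='p' is this sketch's reading of "phase-only", and the table name vis+ext+'.bpPhaseGC' is taken from the gaintable parameter mentioned in the bandpass discussion further down. Inside CASA, such a dictionary could be passed on as gaincal(**params).

```python
# Sketch: keyword arguments for a phase-only selfcal of the bandpass calibrator.
def phase_selfcal_params(vis, ext, spw_list, bp_source, ref_ant):
    """Return gaincal keyword arguments for the step described above."""
    return {
        "vis": vis,
        "caltable": vis + ext + ".bpPhaseGC",  # name reused later as a gaintable
        "field": bp_source,                    # bandpass calibrator, e.g. '3c273'
        "spw": spw_list,                       # e.g. '1~24'
        "refant": ref_ant,                     # e.g. 'SMA6'
        "calmode": "p",                        # assumption: 'p' = phase only
        "solint": "int",                       # one solution per integration
        "combine": "spw",                      # average all spws together
    }
```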


You could plot the results of the gaincal with the command


which would produce a plot like the one shown on the right. You can see that the selfcal has reproduced the phase changes seen in the earlier corrPlotter display.
Next the script produces the bandpass calibration table with the command


The bandtype='B' tells the routine to do a channel-by-channel (rather than polynomial) fit. This is probably preferable to a polynomial fit unless the signal/noise is poor. There's really no reason to expect a chunk's bandpass to be well modeled by a low order polynomial. Although 'B' specifies a channel fit, there will not be a solution for each individual channel because of the solint parameter. The solint=['inf','3.25MHz'] parameter tells the routine to sum all the integrations together ('inf'), and produce a calibration point every 3.25 MHz (32 points over the full 104 MHz width of each chunk). Strictly speaking, the combine='scan' is not needed, because the bandpass calibrator was only observed once, so there is only one scan using the CASA definition of scan (there were many individual integrations, however). The gaintable=vis+ext+'.bpPhaseGC' and spwmap=spwMap parameters tell bandpass to apply our earlier phase selfcal table before deriving the bandpass table.
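
The parameters discussed above, and the arithmetic behind the "32 points" figure, can be summarized in a short sketch. The function names are hypothetical, and so is the output table name vis+ext+'.bandpass'; the remaining values are the ones quoted in the paragraph above. Inside CASA the dictionary could be passed on as bandpass(**params).

```python
# Sketch: keyword arguments for the bandpass step, plus the solution count.
def bandpass_params(vis, ext, spw_list, spw_map, bp_source, ref_ant):
    """Return bandpass keyword arguments matching the description above."""
    return {
        "vis": vis,
        "caltable": vis + ext + ".bandpass",    # hypothetical table name
        "field": bp_source,
        "spw": spw_list,
        "refant": ref_ant,
        "bandtype": "B",                        # channel-by-channel fit
        "solint": ["inf", "3.25MHz"],           # all time, one point per 3.25 MHz
        "combine": "scan",
        "gaintable": vis + ext + ".bpPhaseGC",  # the earlier phase selfcal table
        "spwmap": spw_map,
    }


def solutions_per_chunk(chunk_width_mhz=104.0, interval_mhz=3.25):
    """Number of bandpass solution points across one chunk."""
    return int(round(chunk_width_mhz / interval_mhz))
```

With the default values, solutions_per_chunk() evaluates 104 / 3.25 and returns 32.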
CO in the Gomez Hamburger (single receiver, 2 GHz mode)

The Gomez Hamburger (IRAS 18059-3211) is a pre main-sequence star. The SMA observed the CO(2-1) line in this object on June 9, 2006. The dataset is 060609_07:34:16. The track was taken in the extended configuration, and all 8 antennas were functional. Most of the chunks were set to a coarse resolution of 32 channels/chunk, but several were set to 512 channels/chunk to target spectral lines. Only one spectral line was detected, CO(2-1) in chunk s12 USB. The results of this observation were published in Bujarrabal et al., 2008, A&A 482, pp 839-845.
The image to the right shows the corrPlotter display for this track, which is a useful graphical summary of the track.
For this example, only the upper sideband will be processed. sma2casa was run with the -u option to select the upper sideband, and -t to flag the chunk edge channels as bad. Processing within CASA was done by executing a Python script using the execfile function. The steps taken in the script are described below.

def waitForCASA():
    os.system('sleep 2')

The function above is called in between certain CASA tasks. It prevents CASA from starting a task before the preceding task has released all of its locks on files.

ones = [1,1,1,1,1,1,1,1,1,1,

The ones array (technically a Python list) is used later in the script to tell CASA to apply a single calibration to multiple SMA chunks (spws). The observation being reduced here only has 24 spws, and there are 49 elements in the ones array, but the extra elements don't matter.
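
For illustration, a list of this kind can also be built more compactly, and its effect as a spwmap can be spelled out. This is a sketch, not the script's own code: the helper name solution_spw is hypothetical, and the length of 49 is the one quoted above.

```python
# Sketch: a 49-element list of ones, and what it means when used as a spwmap.
N_ELEMENTS = 49
ones = [1] * N_ELEMENTS


def solution_spw(data_spw, spwmap):
    """Return the calibration-table spw that a spwmap assigns to a data spw."""
    return spwmap[data_spw]
```

Every index of the list holds 1, so each data spw, whatever its number, is calibrated with the solution stored for spw1.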


The commands above import the data into CASA, and create the MyDataUpper MS. If you stopped the script at this point, you could take a look at the CO(2-1) line by executing the command


Which would produce the plot shown on the right.
Continuing on with the script:

vis ='MyDataUpper'

This creates a new "tiled" MS, which can be processed more quickly by CASA than the original MyDataUpper can be. Determining the magic values of 1,256,54 requires deep understanding of CASA, but those values are purported to work for any SMA data set.

vis ='MyDataUpper.tile'
setjy(vis=vis,field='3c279',standard='manual',fluxdensity=[13.2,0,0,0],

The data set contains observations of Jupiter, Ganymede and Uranus. Normally we would want to do a primary flux calibration using Ganymede, and maybe Uranus, but this data set is so old that the CASA ephemerides do not cover the observation date. So the CASA task one would normally use to bootstrap the calibration, setjy, won't work. So the commands above simply slam in the flux values for the bandpass and gain calibrators, using data from the online SMA calibrator list. Quasars 3c279, 1911-201 and 1924-292 are given fluxes of 13.2, 2.3 and 4.4 Jy respectively.
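
A sketch of how the three flux values might be organized is given below. Only the field names, the flux densities, and the standard='manual' and fluxdensity=[I,0,0,0] parameters come from the text and the partial command above; the function name and the table of values as a Python list are illustrative assumptions. Inside CASA each dictionary could be passed on as setjy(**params).

```python
# Sketch: manual flux densities (Jy) for the three quasars quoted above.
CALIBRATOR_FLUX_JY = [("3c279", 13.2), ("1911-201", 2.3), ("1924-292", 4.4)]


def manual_setjy_params(vis, field, flux_jy):
    """Return setjy keyword arguments that set Stokes I by hand."""
    return {
        "vis": vis,
        "field": field,
        "standard": "manual",
        "fluxdensity": [flux_jy, 0, 0, 0],   # [I, Q, U, V]
    }

# Example (inside CASA):
#     for field, flux in CALIBRATOR_FLUX_JY:
#         setjy(**manual_setjy_params(vis, field, flux))
```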


The next step in the script is an initial phase-only calibration on 3c279. The purpose of this is to line up the phases on the bandpass calibrator, to improve the SNR of the bandpass calibration. It is effectively a self-calibration of the bandpass calibrator. The solint='int' tells gaincal() to derive a new solution every integration (every scan, in SMA terminology). The spw='1~24' tells gaincal() to use the data from all chunks.
If you were to stop the script at this point, you could plot the gain calibration results with the following command:


(there's no reason to plot SMA8 since that was the reference antenna). The above command will produce the plot shown on the right.
Continuing with the script:


The next step in the script produces a bandpass calibration table with bandpass(). The original phase-only selfcal only produced a calibration for spw1, even though all spws were used. The spwmap=[ones] option uses the "ones" array defined at the top of the script to tell bandpass() to apply the gain solution for spw1 to all the spws. The solint=['inf','3.25MHz'] tells bandpass() to solve over an infinite time interval (combine all data into one calibration) and to produce solutions every 3.25 MHz in frequency. 3.25 MHz is the channel width of the coarse resolution chunks, so here we are effectively smoothing the bandpass solution for the 512 channel chunks. This is done because the available bandpass data is not sufficient to give a high SNR solution for every channel in the high resolution chunks.
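
The amount of smoothing implied by the 3.25 MHz interval can be worked out as in the sketch below. It assumes a chunk width of 104 MHz, which is 32 channels times the 3.25 MHz coarse channel width; the function name is hypothetical.

```python
# Sketch: how many channels fall into one 3.25 MHz bandpass solution interval.
def channels_per_solution(n_channels, chunk_width_mhz=104.0, interval_mhz=3.25):
    """Return the number of channels covered by one solution interval."""
    channel_width_mhz = chunk_width_mhz / n_channels
    return interval_mhz / channel_width_mhz
```

For a 32 channel chunk this gives one channel per solution, and for a 512 channel chunk it gives 16 channels per solution, which is the sense in which the high resolution bandpass is smoothed.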
If you interrupted the script at this point, you could plot the bandpass solutions with the command


Which would produce a series of plots like the one shown on the right.
Now we'll apply the bandpass calibration to the data, with applycal():


If we interrupt the script here, we can check that the bandpass calibration did something reasonable. We can use plotms() to show the amplitude of Uranus on the 1-* baseline, in the raw data, as shown on the right.
If we then select the "corrected" data column in plotms(), we see that the bandpass calibration has done a nice job of flattening the amplitudes, as shown on the right.
Next, let's do the gain calibration with our two quasar calibrators:

gaintype='G',calmode='ap',caltable=vis+'.gaincal',field='1911-201,1924-292')
waitForCASA()

The command above does both an amplitude and phase gain calibration. If we stopped the script at this point, we could plot the gain calibration solution with the command


which will produce the plot shown on the right.
The gain calibration plots show that the phase calibration is rather noisy, particularly near the beginning of the track. The script smooths the calibration table with the following command


The command below produces a plot of the smoothed gain calibration, as shown to the right.


You can see that while the smoothed calibration table is indeed smoother, the solution still jumps around significantly early in the track. That's because two gain calibrators were used to derive the calibration table, and smoothcal can only smooth the solutions from the two sources separately.
The next command applies the bandpass and gain calibration tables to the data, producing a "corrected" data column which can be imaged:

applycal(vis=vis,gaintable=[vis+'.bandpass.bcal', vis+'.gaincal.smoothed'],

Since our data is calibrated, we can now produce a smaller MS which contains only the calibrated science source visibilities:

split(vis=vis,outputvis='gomezUSB',datacolumn= 'corrected',field='gomez')

It's clear from the corrPlotter display that continuum emission from the science source was detected. We can image the entire upper sideband emission with the command


The command


produces the plot shown to the right. Clearly the source is offset a bit from the phase center.
Using plotms, we can see that the CO(2-1) line falls in the 22 channels of spw12 starting with channel 260. We can image the full CO line with the command


The command


produces the plot shown on the right. Clearly the CO emission is resolved.
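
For reference, the channel range quoted above (22 channels of spw12 starting with channel 260) can be turned into a CASA-style 'spw:first~last' selection string. The sketch below uses a hypothetical function name and only does the arithmetic for the last channel.

```python
# Sketch: build a 'spw:first~last' channel selection string.
def channel_selection(spw, first_channel, n_channels):
    """Return a selection covering n_channels channels starting at first_channel."""
    last_channel = first_channel + n_channels - 1
    return "%d:%d~%d" % (spw, first_channel, last_channel)
```

For the CO(2-1) line this gives '12:260~281', since 22 channels starting at channel 260 end at channel 281.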
The red-shifted CO can be imaged with the command


producing the plot on the right via "viewer"
Finally, the blue-shifted CO can be imaged with the command


producing the plot on the right via "viewer"

eMail support

For help processing SMA data in CASA please contact .

Contributions from Users

This website is intended to provide information to users in order to process SMA data using CASA.

General information regarding CASA is available at the National Radio Astronomy Observatory (NRAO) website

The SMA CASA staff/scientist/users welcome your comments and encourage you to join our mailing list

Topics include:
Notification of SMA CASA script updates
Submitting error reports
Tips, shortcuts, etc.
Data testing and analysis