Date: Tue, 24 Jul 2012 16:50:12 -0400 (EDT)
From: Ken Young 
To: eketo@cfa.harvard.edu, ckatz@cfa.harvard.edu, cqi@cfa.harvard.edu,
     mgurwell@cfa.harvard.edu, npatel@cfa.harvard.edu,
     jweintroub@cfa.harvard.edu, dwilner@cfa.harvard.edu,
     aargon@cfa.harvard.edu, jzhao@cfa.harvard.edu, rrao@cfa.harvard.edu,
     qzhang@cfa.harvard.edu, tksridha@cfa.harvard.edu, gpetitpa@sma.hawaii.edu,
     rblundell@cfa.harvard.edu, tksridha@cfa.harvard.edu
Subject: Meeting on SMA data file format

Dear SMA Data Involvee,

   Near the end of this year, we should be able to use the new Interim
Correlator.   The size of our datasets will increase by a factor of 10, on
average, so we will need to make some modifications to how our data is
processed.   As a first step towards supporting these larger data sets,
we are now recording our data on an x86 Linux virtual machine rather than
on one of the old PowerPCs.   Because the file format is still big-endian,
we are byte-swapping all the data both when it is written and when it is
read, which wastes a lot of CPU cycles.   It seems to make sense for us to
make several changes to our
data file format when the Interim Correlator comes on line.   Among the
possible changes would be:

1) Stop needlessly byte-swapping
2) Store antenna-based Tsys values rather than baseline-based ones (we
actually store *both* now).
3) Drop all of the unused variables from our data structures.   There are
lots of them.   We may want to rename some of the elements that have
been repurposed for polarization.   Also, some inconsistencies exist in
the names - for example, weights are sometimes .wt and other times .wts.
4) The format we use encodes the data type into each scan's visibilities
(typically 2688 copies per scan), but the format never changes, so
storing format information is pointless.   A back-of-the-envelope
calculation shows that we have stored about 50 GBytes of completely
unneeded format data.
5) We store detailed flagging information, but never use it.   That
information could be used to do helpful things like unflagging data that
was flagged bad during data reduction, without losing the online system's
flagging information.   The system we have now, which uses the same
variable for Tsys and flagging, is poor, I think.
6) We write an engineering data file for each track, which I'm not certain
we use at all.   It definitely contains redundant information.   We should
clean it up if we're using it, or drop it if we're not.
7) How are we going to tell MIR and Miriad to process the new Interim
Correlator data, which will be stored as two enormous chunks with high
spectral resolution even for continuum tracks?
8) Other changes you'd like to see made.
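
To make item 1 concrete, here is a minimal Python sketch of what the
byte-swapping amounts to.   It is only an illustration, not our actual
recording code: the function names are made up, and the use of 16-bit
signed integers for the samples is an assumption for the example.

```python
import struct
import sys

def pack_samples(values, byte_order):
    """Pack integers as 16-bit signed values (an assumed sample type).

    byte_order is a struct prefix: '>' big-endian (PowerPC order),
    '<' little-endian (x86 order), '=' the host's native order.
    """
    return struct.pack(f"{byte_order}{len(values)}h", *values)

def unpack_samples(raw, byte_order):
    """Inverse of pack_samples for the same byte order."""
    count = len(raw) // 2  # two bytes per 16-bit sample
    return list(struct.unpack(f"{byte_order}{count}h", raw))

samples = [1, -2, 300, -400]

# Big-endian file layout: an x86 host must swap on write and again on read.
big = pack_samples(samples, ">")

# Little-endian layout: matches x86 memory, so no swap is needed there.
little = pack_samples(samples, "<")

# Native order: whatever the recording host uses.
native = pack_samples(samples, "=")

print("host byte order:", sys.byteorder)
print("big-endian bytes:   ", big.hex())
print("little-endian bytes:", little.hex())
```

Both layouts hold the same values; only the order of the bytes within
each sample differs.   Writing in the recording host's native order
removes the swap on both the write and the read side, as long as the
readers run on hosts with the same byte order.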

I would like to invite you to a meeting to discuss this next Monday, the
30th of July, at 3:00 PM (so the Hawaii folks can call in) in the SMA
Control Room.    Please let me know if you would like to attend, but
cannot do so at that time.

I hope to see you there!

Taco