------------------------------------------------------------------------------------------------------------------
Running Parallel CASA, Second Progress Report
Alice Argon, 9/2/2014
------------------------------------------------------------------------------------------------------------------

CASA versions:
--------------
1/30/2014: CASA 4.2.0 release (casapy-42.0.28322-021-1-64b)
6/07/2013: CASA 4.1.0 release (casapy-41.0.24668-001-64b-2, libraries included)

Patches to the latest release:
------------------------------
5/12/2014: CASA 4.2.1 release (r29047)
   "is a patch to CASA 4.2.0 to fix a bug in the viewer image handling when it is started from the command line."
9/05/2014: CASA 4.2.2 release (r30986)
   "is a patch release of CASA 4.2.0/4.2.1 to introduce SIGMA and WEIGHT columns defined according to channel width and integration time. The patch also adds the plotms capability to export iterated plots in multiple files."

Release Notes (CASA 4.0.1) claim:
---------------------------------
"experimental parallelization of most visibility-related tasks (but excluding those that produce new output MSs)."
There have been no major updates.
Note: MS = measurement set, i.e., data set

Preparation (same as for first report):
---------------------------------------
1) Remove all references to plots, X-windows, etc., from the Python script. Hydra batch jobs will not accept them.
2) Use the "-nogui" option when running CASA. It is required even if plot routines are not explicitly called.
3) Force the number of "engines" to be 7 (a number all test machines can accommodate) by means of a configuration file:
   http://www.cfa.harvard.edu/rtdc/ALMA/HYDRA/
   If the number is not set, CASA makes its own selection based on machine resources, but not necessarily the same number from run to run. For reasons unclear to me, it is almost always an odd number (7,9,15,21,..).
Note: CASA splits the MS into sub-MSs. Each engine takes one of the sub-MSs and performs the calibration at hand, allegedly in parallel.

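The one-sub-MS-per-engine arrangement described in the note can be pictured with a small plain-Python sketch. This is only an illustration of the pattern, not CASA code: calibrate_sub_ms, run_on_engines and the sub-MS names are hypothetical, and a thread pool stands in for CASA's engines.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_ENGINES = 7  # the fixed engine count used in these tests


def calibrate_sub_ms(sub_ms_name):
    """Hypothetical stand-in for the work one engine does on one sub-MS."""
    return sub_ms_name + ": calibrated"


def run_on_engines(sub_ms_names, num_engines=NUM_ENGINES):
    """Process every sub-MS, with at most num_engines workers active at once."""
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        # map() preserves input order, so results line up with sub_ms_names.
        return list(pool.map(calibrate_sub_ms, sub_ms_names))


# Hypothetical names; in a real run CASA chooses them when it partitions the MS.
sub_ms_names = ["subms_%02d" % i for i in range(NUM_ENGINES)]
results = run_on_engines(sub_ms_names)
for line in results:
    print(line)
```

With as many sub-MSs as engines, each worker handles exactly one sub-MS, which is the situation requested in these tests.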
Test Overview:
--------------
1) Calibrate the M100 Band 3 Science Verification dataset:
   https://almascience.nrao.edu/alma-data/science-verification
   using steps 0-18 of "alma-m100-analysis-hpc-regression.py", included in the CASA 4.2.0 tarball.
2) Use RTDC's best performing desktop machine, rtdc7 (16 CPUs, 48 GB RAM).

Test Results:
-------------
1) The CASA 4.2.0 calibration fails in step 8 (SPLIT).
   Note: The same script (i.e., the one included in the CASA 4.2.0 tarball) completes successfully in CASA 4.1.0. The two runs were as identical as possible, except for the CASA version employed (i.e., same machine, same day, same number of engines and sub-MSs requested).
2) The following CASA experts were consulted:
   Jeff Kern (NRAO)
   Mike Rodriguez (NRAO)
   Sandra Castro (ESO)
3) Sandra Castro says (3/24/2014):
   "I could reproduce your errors...I don't know why split fails. I have found a JIRA ticket describing the same problem, but no solution was offered."
4) Since the test could not be completed on a single machine, it was not attempted on the Hydra cluster.
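
When the same numbered steps are run under two CASA versions, it helps to record the first step that fails in each run. The following is a minimal sketch in plain Python, not taken from the regression script: run_steps, the toy step functions and the "placeholder" titles are hypothetical, and only the SPLIT label for step 8 comes from the results above.

```python
def run_steps(steps, step_titles):
    """Run steps in numeric order; return (failed_step, error) or (None, None)."""
    for num in sorted(steps):
        print("Step %d: %s" % (num, step_titles[num]))
        try:
            steps[num]()
        except Exception as err:
            # Stop at the first failure so later steps do not run on bad data.
            return num, err
    return None, None


def ok_step():
    """Hypothetical step that succeeds."""
    pass


def failing_split():
    """Hypothetical step that mimics a failure in step 8."""
    raise RuntimeError("split failed")


steps = {7: ok_step, 8: failing_split, 9: ok_step}
step_titles = {7: "placeholder", 8: "SPLIT", 9: "placeholder"}

failed_step, error = run_steps(steps, step_titles)
print("First failing step:", failed_step)
```

Running the same driver under each CASA version and comparing the reported step number gives a quick way to confirm where the two runs diverge.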