Testing Parallelized CASA on the Hydra Cluster
Progress Report
The Tests
We calibrate the M100 Band 3 Science Verification dataset using an edited version of the Python script alma-m100-analysis-hpc-regression.py, which is included in the (parallelized) CASA tarballs.
For purposes of comparison, we do initial tests on rtdc7.
General Notes:
- CASA has two parallelization modes depending on the version:
    - Old Parallel (OP) Framework, which makes use of an external cluster configuration file (deprecated in CASA 4.5.0).
    - Message Passing Interface (MPI) Framework (implemented in CASA 4.3.0).
- The following python scripts were used:
        m100-casa44-tarball-script.py (CASA 4.4.0).
        m100-pre-casa44-tarball-script.py (pre-CASA 4.4.0).
- The scripts were executed with the commands:
        m100-casa44-tarball-script.sh (CASA 4.4.0, uses MPI framework).
        m100-pre-casa44-tarball-script.sh (pre-CASA 4.4.0, uses OP framework).
- Steps 0-18 constitute the calibration.
- All interactive lines have been removed to enable comparison with hydra batch jobs.
- References to ALMA Science Data Model (ASDM) format files have also been removed, since they are not readily available. Before running the script, we rename:
        mv X54.ms X54-monolith.ms
        mv X220.ms X220-monolith.ms
- We remove the directories (CASA 4.4.0 and later only):
        \rm -r X54.ms.flagversions
        \rm -r X220.ms.flagversions
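The preparation steps above can also be scripted. The following is a minimal sketch, assuming a standalone Python 3 interpreter run outside CASA in the directory that holds the measurement sets; the function name prepare_workdir and its remove_flagversions argument are hypothetical and are not part of the scripts named in this report.

```python
import shutil
from pathlib import Path

# Dataset base names taken from the mv and \rm commands above.
DATASETS = ("X54", "X220")


def prepare_workdir(workdir, remove_flagversions=True):
    """Rename each <name>.ms to <name>-monolith.ms and, optionally,
    delete the matching <name>.ms.flagversions directory."""
    workdir = Path(workdir)
    for name in DATASETS:
        ms = workdir / f"{name}.ms"
        monolith = workdir / f"{name}-monolith.ms"
        # Equivalent of: mv X54.ms X54-monolith.ms
        if ms.exists() and not monolith.exists():
            ms.rename(monolith)
        # Equivalent of: \rm -r X54.ms.flagversions (CASA 4.4.0 and later only)
        flagdir = workdir / f"{name}.ms.flagversions"
        if remove_flagversions and flagdir.is_dir():
            shutil.rmtree(flagdir)
```

Calling prepare_workdir(".") before a CASA 4.4.0 run, or prepare_workdir(".", remove_flagversions=False) for earlier versions, would mirror the manual commands. The existence checks make the helper safe to run twice.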
New Results: July 2017
CASA 4.7.2 (and previous versions) appears to have no real parallelism capabilities: changing the number of engines (CPUs) makes a negligible difference to the execution time. The dominant factor is the I/O speed of the disk. Running CASA from an NFS-mounted disk takes roughly four to five times longer than running from a local disk on the same machine.
CASA 4.7.2 (release date: 3/23/2017)
| MACHINE | Details   | # Engines | Disk type | real      | user      | system   |
| RTDC    | rtdc7     | 3         | local     | 40m 48.6s | 26m 21s   | 4m 59s   |
| RTDC    | rtdc7     | 7         | local     | 43m 11s   | 30m 28s   | 5m 38s   |
| RTDC    | rtdc7     | 13        | local     | 41m 40s   | 30m 35s   | 8m 40s   |
| RTDC    | rtdc9     | 13        | nfs       | 49m 7s    | 15m 2s    | 8m 27s   |
| RTDC    | rtdc9     | 3         | local     | 11m 20s   | 12m 57s   | 1m 55s   |
| RTDC    | rtdc9     | 13        | local     | 8m 51s    | 14m 6s    | 4m 20s   |
| RTDC    | rtdc9     | 15        | local     | 9m 26s    | 15m 27s   | 8m 20s   |
| RTDC    | rtdc8     | 3         | local     | 14m 56s   | 15m 28s   | 4m 1s    |
| RTDC    | rtdc13    | 13        | local     | 20m 7s    | 30m 48s   | 14m 42s  |
| .       | .         | .         | .         | .         | .         | .        |
| hydra   | head node | 7         | nfs       | 52m 1.2s  | 32m 18.9s | 8m 56.8s |
| hydra   | node 2-9  | 7         | nfs       | 42m 54s   | 15m 38s   | 4m 11s   |
| hydra   | node 2-9  | 7         | local     | 11m 16s   | 13m 34s   | 2m 58s   |
| hydra   | node 2-9  | 13        | nfs       | 41m 51s   | 19m 2s    | 4m 39s   |
| hydra   | node 2-9  | 13        | local     | 11m 1s    | 13m 55s   | 3m 27s   |
| hydra   | node 2-9  | 3         | local     | 14m 22s   | 13m 23s   | 2m 25s   |
| hydra   | node 0-4  | 7         | nfs       | 52m 4s    | 25m 35s   | 7m 39s   |
| hydra   | node 0-4  | 7         | ssd       | 20m 59s   | 31m 40s   | 12m 12s  |
Notes:
- Tests on node 0-4 were run while the machine was already busy with another job.
- ssd means the test ran off a local SSD disk.
- The rtdc7 (7 engines, local) and rtdc8 (3 engines) tests were done while other jobs were running.
- By far the most important factor is the I/O speed of the disk (e.g. local vs. NFS-mounted).
- The difference between rtdc7 and rtdc9 is RAM (rtdc7=48GB/rtdc9=132GB). Both have 3.5GHz CPUs.
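The disk effect can be quantified directly from the real (wall-clock) times in the CASA 4.7.2 table. The sketch below is illustrative only: the helper names to_seconds and nfs_slowdown are hypothetical, and it uses only the three machine and engine combinations that were measured on both an NFS and a local disk.

```python
import re


def to_seconds(text):
    """Convert a duration such as '1h 2m 56.820s' or '42m 54s' to seconds."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    parts = re.findall(r"([\d.]+)\s*([hms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * units[unit] for value, unit in parts)


# (machine, engines) -> real time per disk type, copied from the 4.7.2 table.
REAL_TIMES = {
    ("rtdc9", 13): {"nfs": "49m 7s", "local": "8m 51s"},
    ("hydra node 2-9", 7): {"nfs": "42m 54s", "local": "11m 16s"},
    ("hydra node 2-9", 13): {"nfs": "41m 51s", "local": "11m 1s"},
}


def nfs_slowdown(times):
    """Ratio of NFS wall-clock time to local-disk wall-clock time."""
    return to_seconds(times["nfs"]) / to_seconds(times["local"])


for (machine, engines), times in REAL_TIMES.items():
    print(f"{machine}, {engines} engines: NFS is {nfs_slowdown(times):.1f}x slower")
```

For these three pairs the ratio comes out between about 3.8 and 5.5, while going from 7 to 13 engines on hydra node 2-9 changes the ratio very little.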
Historical Results: Sept. 2015
All tests run with seven engines unless otherwise stated.
CASA 4.4.0 (release date: 6/22/2015)
| MACHINE | Details                  | CPU (GHz) | RAM (GB) | user        | system      | real          |
| rtdc7   | RTDC                     | 16x3.5    | 48.0     | 37m 30.223s | 5m 56.912s  | 31m 4.924s    |
| .       | .                        | .         | .        | .           | .           | .             |
| hydra   | head node                | 24        | ---      | 44m 13.792s | 11m 49.211s | 1h 2m 56.820s |
| "       | node 2-9                 | 64        | ---      | 43m 33.455s | 7m 43.456s  | 58m 1.409s    |
| "       | node 2-9 (local disk)    | "         | ---      | 37m 50.057s | 8m 10.176s  | 26m 6.772s    |
| "       | node 0-9 (with SSD disk) | 72        | ---      | 18m 58.938s | 5m 4.302s   | 12m 43.140s   |
Notes:
- We run these tests using the MPI framework (see general notes above).
- The hydra tests were run post-upgrade (September, 2015).
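One rough way to read the user, system and real columns together is to divide total CPU time by wall-clock time, which estimates how many cores were busy on average. The sketch below applies this to the CASA 4.4.0 rows. It assumes that the reported user and system times include all engine processes, which this report does not state; the names secs and busy_cores are hypothetical.

```python
def secs(h=0, m=0, s=0.0):
    """Convert hours, minutes and seconds to seconds."""
    return 3600 * h + 60 * m + s


# (label, user, system, real) copied from the CASA 4.4.0 table; seven engines each.
RUNS = [
    ("rtdc7", secs(m=37, s=30.223), secs(m=5, s=56.912), secs(m=31, s=4.924)),
    ("hydra head node", secs(m=44, s=13.792), secs(m=11, s=49.211), secs(h=1, m=2, s=56.820)),
    ("hydra node 2-9", secs(m=43, s=33.455), secs(m=7, s=43.456), secs(m=58, s=1.409)),
    ("hydra node 2-9 (local disk)", secs(m=37, s=50.057), secs(m=8, s=10.176), secs(m=26, s=6.772)),
    ("hydra node 0-9 (with SSD disk)", secs(m=18, s=58.938), secs(m=5, s=4.302), secs(m=12, s=43.140)),
]


def busy_cores(user, system, real):
    """Average number of busy cores: CPU time divided by wall-clock time."""
    return (user + system) / real


for label, user, system, real in RUNS:
    print(f"{label}: {busy_cores(user, system, real):.2f} cores busy on average")
```

Under that assumption, every run in the table averages fewer than two busy cores despite using seven engines, and the runs on the slower disks average less than one.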
CASA 4.3.0 (release date: 1/12/2015)
| MACHINE | Details                        | CPU (GHz) | RAM (GB) | user           | system      | real           |
| rtdc7   | RTDC                           | 16x3.5    | 48.0     | 20m 30.728s    | 11m 9.580s  | 1h 40m 54.629s |
| .       | .                              | .         | .        | .              | .           | .              |
| hydra   | /pool/cluster7 (NetApp disk)   | 64        | 256      | 1h 10m 23.062s | 32m 27.517s | 3h 23m 5.59s   |
| "       | /state/partition1 (local disk) | "         | "        | 1h 14m 15.682s | 55m 58.055s | 6h 17m 18.93s  |
| "       | /pool/cluster7 (NetApp disk)   | "         | "        | 1h 10m 3.464s  | 32m 0.967s  | 3h 16m 48.75s  |
| "       | "                              | "         | "        | 1h 13m 18.721s | 53m 1.329s  | 3h 25m 30.87s  |
Notes:
- The CASA 4.3.0 release notes claim:
    "CASA has a new MPI parallelization framework and is currently testing use cases."
We use the OP Framework here (see general notes above), since the MPI Framework is still in the early stages of testing.
- The first three hydra tests employed 7 engines, the last one 14 engines.
- The first two hydra tests ran without any memory specification, the last two ran with mem=8192.
- The hydra tests were run pre-upgrade (May, 2015).
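The framework choice described in the notes can be summarized as a small lookup. This is only a sketch of the rule followed in this report (OP for versions before 4.4.0, MPI from 4.4.0 onward); the function names are hypothetical and nothing here is part of CASA itself.

```python
def parse_version(text):
    """Turn a version string such as '4.3.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def framework_used(version):
    """Return the parallelization framework these tests ran with.

    Pre-4.4.0 runs used the Old Parallel (OP) framework; 4.4.0 and later
    used the MPI framework.
    """
    return "MPI" if parse_version(version) >= (4, 4, 0) else "OP"


for version in ("4.1.0", "4.3.0", "4.4.0", "4.7.2"):
    print(version, framework_used(version))
```

Comparing tuples rather than raw strings keeps the ordering correct for multi-digit components, so that, for example, a hypothetical 4.10.0 would sort after 4.4.0.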
CASA 4.2.0 (release date: 2/11/2014)
| MACHINE | Details                  | CPU (GHz) | RAM (GB) | user | system | real |
| rtdc7   | test failed (see report) | ---       | ---      | ---  | ---    | ---  |
| .       | .                        | .         | .        | .    | .      | .    |
| hydra   | did not test             | ---       | ---      | ---  | ---    | ---  |
Notes:
- The test fails due to a bug in the software. For details, consult:
        The CASA 4.2.0 report
- Tests were not attempted on hydra.
CASA 4.1.0 (release date: 7/2/2013)
| MACHINE   | Details                    | CPU (GHz) | RAM (GB) | user           | system         | real           |
| rtdc7     | RTDC                       | 16x3.5    | 48.0     | 18m 30.924s    | 9m 22.946s     | 1h 22m 41.580s |
| rglinux12 | "                          | 8x2.9     | "        | 23m 48.106s    | 9m 35.289s     | 1h 11m 56.657s |
| .         | .                          | .         | .        | .              | .              | .              |
| hydra     | compute-0-32 (NetApp disk) | 64x2.2    | 256      | 1h 29m 18.459s | 1h 18m 32.504s | 4h 46m 53.30s  |
| "         | compute-0-32 (local disk)  | "         | "        | 1h 30m 2.212s  | 1h 37m 44.280s | 3h 43m 13.11s  |
Notes:
- The RTDC machine, rglinux12, was run for comparison.
- The hydra tests were run on a single dedicated machine (compute-0-32.local) as sole user. Every other way of running the tests gave far poorer results.
- For details about the test and results, consult:
        The CASA 4.1.0 report.