Installing am
=============
2007 November 20

Contents:
    1. Windows
    2. GNU/Linux, Unix, Mac
    3. Environment Variables
    4. Optimizing Performance


1. Windows
==========

A precompiled version of am is available as a Windows installer file.
The executable has been compiled with OpenMP support using Microsoft
Visual C++ version 8.0.  The installer will install the required
OpenMP library if it is not already on the system.

The installer will create a directory for the am cache files, and
create a new user environment variable AM_CACHE_PATH pointing to the
cache directory.  On a typical system, this directory will be

    c:\Documents and Settings\\Application Data\am

Support files, including source code and example configuration files,
will be installed in the Program Files directory, typically

    c:\Program Files\am

The executable file am.exe will be copied to the Windows directory,
typically

    c:\Windows

so that it will be found in the system path when am is invoked from
the command line.

Uninstalling will undo the above, with two exceptions.  First, if the
cache directory is not empty, it will not be removed.  Second, if
multiple users install the program on the same system, and it is
subsequently uninstalled, the cache directory (if empty) and
environment variable AM_CACHE_PATH will only be removed for the user
doing the uninstall--
other users need to remove these manually, if desired.

To install:

    a. Obtain the installer file, am-5.2-x86.msi.  A zipped copy can
       be found at:

           http://www.cfa.harvard.edu/~spaine/am

    b. Run the installer file, either by double-clicking it, or
       giving the command:

           msiexec /i

To remove:

    a. Go to Start->Settings->Control Panel->Add or Remove Programs.

    b. Select "Change or Remove Programs."  Highlight "am-5.2" in the
       list of currently installed programs and click "Remove."


2. GNU/Linux, Unix, MacOS
=========================

On these systems, am is installed by compiling from source and
copying the resulting executable to the desired directory.
The cache directory and user environment are also set up manually.

The compiler assumed by the default Makefile target, and by the
generic OpenMP target "omp", is gcc.  Many Linux and Unix systems will
have gcc already installed; on the Mac, one typically has to install
the "developer tools" from the CD that comes with the machine.  Other
compilers can be used-- the Makefile has targets for various systems,
using various compilers including Sun Studio, Intel, and sgi MIPSpro.
These may be usable on your system as-is, or easily modified to suit.

The following installation steps are for a typical GNU/Linux system;
the procedure on Unix and MacOS will be similar:

    a. Check the version of gcc installed on the system with the
       command:

           gcc --version

       If the version is 4.2 or higher (4.1.1 or higher on Red Hat
       systems), then OpenMP is supported.

    b. Obtain a copy of the archive file am-5.2.tgz.  A copy can be
       found at

           http://www.cfa.harvard.edu/~spaine/am

    c. In a suitable directory, unpack the archive with the command:

           tar xzf am-5.2.tgz

       This creates a directory am-5.2 containing the source files.

    d. Change to the am-5.2 directory, and run

           make omp

       to build the OpenMP version, or just

           make

       to build a single-threaded version.

    e. Copy the resulting executable file "am" to a suitable directory
       in the user's path.  This could be, for example, a private bin
       directory ~/bin for a single user, or /usr/local/bin for all
       users.  Giving the command

           make install

       as root, or

           sudo make install

       will copy am to /usr/local/bin, and set the appropriate file
       attributes.

    f. Create a directory for the am cache files, and set an
       environment variable so that am can find it.  The cache
       directory can be private, or shared by multiple
       users--concurrent instances of am can safely share the cache.
       For good performance, it is important that the cache directory
       reside on a locally-connected disk, rather than on a network
       mount.
       A typical user-private setup would be to create a directory
       ".am" in the user's home directory, and add the line

           export AM_CACHE_PATH=/home//.am

       to the user's .bashrc (for bash users) or .profile (for Bourne
       shell users).  C shell or tcsh users would add the line

           setenv AM_CACHE_PATH /home//.am

       to their .cshrc file.


3. Environment Variables
========================

The following environment variables affect the operation of am:

AM_CACHE_PATH
    As discussed above, this is the path to the am cache directory.
    If this variable is not defined, or set to an empty string, the
    disk cache is disabled.

AM_CACHE_HASH_MODULUS
    This can be used to modify the number of hash buckets in the
    cache.  By default, there are 509 hash buckets containing up to 4
    files per bucket, meaning that the maximum number of files in the
    cache directory is limited to 2036.  Normally, this value will
    not need to be changed.  If it is, prime numbers are recommended.
    When this parameter is changed, any existing cache files will
    become unusable, and will eventually be evicted from the cache.

AM_FIT_INPUT_PATH
AM_FIT_OUTPUT_PATH
    By default, am reads fit data from the current directory, and
    writes output files to the current directory.  These variables
    may be used to set alternative input and output directories.

OMP_NUM_THREADS
    This may be used to set the number of threads which will be used
    for parallel regions of the program.  Depending on the OpenMP
    implementation, the default value may be the number of processors
    or processor cores, or some other value.  Invoking am with no
    arguments will provide information on the number of available
    processors and the current setting of OMP_NUM_THREADS.  On
    workstations, OMP_NUM_THREADS should normally be set equal to the
    number of processors; on a large multi-user compute server, a
    smaller number might be appropriate.

OMP_NESTED
    Controls whether OpenMP nested parallelism is enabled.  Setting
    OMP_NESTED=true enables nesting, and OMP_NESTED=false disables it.
    In the current version of am, this only affects excess delay and
    instrumental line shape convolution computations.  The
    performance gain from setting OMP_NESTED=true is marginal.
    Setting OMP_NESTED=false is recommended, and is the default for
    most OpenMP implementations.


4. Optimizing Performance
=========================

The most important factors for obtaining good performance are to set
up the disk cache as described above, and to compile with OpenMP on
multicore or multi-cpu machines.

There are a few compile-time tuning options which override built-in
defaults, but the defaults will normally give close to optimum
performance.  The options are:

L1_CACHE_BYTES - size, in bytes, of the L1 data cache per core.  The
    default is 0x8000 (32 kB).  This controls cache-blocking of
    absorption coefficient computations, and sets the point where FFT
    and FHT computations switch over from recursive to iterative.
    Setting this parameter larger than the actual L1 cache size will
    hurt performance; setting it somewhat smaller won't matter much.

L2_CACHE_BYTES - size, in bytes, of the L2 data cache per core.  This
    has no effect on performance in the current version of am.  It is
    only used to set the size of benchmarks run with am -b.

LINESUM_MIN_THD_BLOCKSIZE - is the smallest number of frequency grid
    points which will be given to a thread doing a block of a
    line-by-line computation.  The default value is 8, which
    corresponds to the number of 8-byte double-precision numbers
    which will fit into the 64-byte cache lines found on many
    machines.  To avoid false sharing, it is not recommended to make
    this any smaller.  (False sharing occurs when two or more
    threads, running on different CPUs, access unshared data that
    reside in the same cache line.  Each time one thread writes data
    in this cache line, it invalidates the copies of the same cache
    line, including the unshared data, held by all the other CPUs.)

FFT_UNIT_STRIDE
FHT_UNIT_STRIDE - selects whether iterative FFTs and FHTs are done
    with unit-stride memory access (at the expense of extra trig
    computations), or with non-unit-stride memory access.  These
    computations are done entirely in L1 cache, minimizing the cost
    of non-unit-stride access, so the default setting for both these
    parameters is 0.  Some machines (e.g. Sun Ultra 3) will be faster
    if these parameters are set to 1; see the Makefile for examples.

The following timings for the am regression test on an AMD Opteron
180 dual-core CPU are examples of the results for different compiler
options and compilers.  The OS was Fedora Core 5 GNU/Linux
(2.6.20-1.2320.fc5 x86_64).

GNU gcc
gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
-------------------------------------------
-O3 -ffast-math
    (default Makefile target)               511 s

-fopenmp -O3 -ffast-math
    (omp Makefile target)                   287 s

-O3 -ffast-math -march=opteron
    -DL1_CACHE_BYTES=0x10000                473 s

-fopenmp -O3 -ffast-math -march=opteron
    -DL1_CACHE_BYTES=0x10000                275 s

Intel icc
icc (ICC) 10.0 20070613
-----------------------
-O3 -DL1_CACHE_BYTES=0x10000                482 s

-openmp -O3 -DL1_CACHE_BYTES=0x10000        286 s
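
The OpenMP speedup implied by the timings above can be read off by
dividing each single-threaded time by the matching OpenMP time.  The
following is only a sketch of that arithmetic in shell: the row labels
and the function name "speedups" are made up for illustration and are
not part of am or its Makefile; the numbers are copied from the table.

```shell
# Hypothetical helper, not part of am: prints serial/OpenMP time ratios.
# Columns: illustrative label, single-threaded seconds, OpenMP seconds
# (all timings taken from the Opteron 180 table above).
speedups() {
    printf '%s\n' \
        'gcc-default 511 287' \
        'gcc-opteron 473 275' \
        'icc         482 286' |
    awk '{ printf "%s %.2f\n", $1, $2 / $3 }'
}

speedups
# prints:
#   gcc-default 1.78
#   gcc-opteron 1.72
#   icc 1.69
```

All three ratios fall somewhat short of 2, the ideal for the two cores
of this CPU.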