
How to submit an MPI job, using PGI, over IB

  • This primer describes how to submit parallel MPI jobs, using programs compiled and linked with the PGI compilers, over IB (MVAPICH).
  • The executable(s) MUST be compiled and linked using PGI and MVAPICH (not MPICH).
  • The primer on compilers describes how to compile and link using PGI and MVAPICH.
  • An executable compiled with another compiler or MPI implementation (GNU, OpenMPI, Intel, etc.) should not be submitted this way. It may run, but is likely to give you grief.

  • There is a different primer that explains how to submit PGI/MPI (MPICH) jobs that do not need to access the IB fabric.

To Submit a Job

  • The basics on how to submit a job are described in the primer's introduction on job submission, so read that one first.
  • The job file includes the command to launch your MPI program, using the mpirun command.
  • The number of processors and the machinefile (the list of hosts to use) are not explicitly specified (hardwired) with the mpirun command.
  • You must invoke the corresponding mpirun (the PGI/MVAPICH implementation); see the example below.
  • The qsub file (or command)
    • will request a number of processors (CPUs, cores)
    • and specify the corresponding PE (parallel environment), via the qsub command
  • The job scheduler will grant the request and determine the hosts list for that specific job (i.e., the machinefile)
  • The job file can specify the PE and the number of processors via an embedded directive (e.g., #$ -pe mpich_ib 8)
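As a hypothetical illustration (the file name, job name, and slot count are examples, not values from this primer), the embedded-directive form of a job file could start like this, after which submitting reduces to qsub mycode.job:

```shell
#$ -pe mpich_ib 8
#$ -cwd -j y
#$ -N mycode
#$ -o mycode.log
# ... followed by the mpirun command, as shown in "A Minimal Job File" below
```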

Required Configuration

You must edit your ~/.cshrc and/or ~/.bashrc to define the two variables PGI and LD_LIBRARY_PATH.
This is usually (but not always) done when you set up your environment for the PGI compilers.
To check, type echo $PGI; it should return /share/apps/pgi.
Then type echo $LD_LIBRARY_PATH; it may return a long string, but make sure that one of its components is $PGI/linux86-64/13.5p/libso.
  • Failure to define these variables will cause your jobs to fail, and will produce errors and aggravation.
  • If the variable PGI is not set:
    • either edit your startup file:
      [t]csh users, edit your ~/.cshrc:
        setenv PGI /share/apps/pgi
      [ba]sh users, edit your ~/.bashrc:
        export PGI=/share/apps/pgi
    • or do one of the following:
      • specify -v PGI=/share/apps/pgi with qsub
      • add #$ -v PGI=/share/apps/pgi in the job file
      • add -v PGI=/share/apps/pgi in your ~/.sge_request file
    Read the man pages on qsub and/or sge_request to learn more.
  • If the variable LD_LIBRARY_PATH
    • is not set:
      [t]csh users, edit your ~/.cshrc:
        setenv LD_LIBRARY_PATH $PGI/linux86-64/13.5p/libso
      [ba]sh users, edit your ~/.bashrc:
        export LD_LIBRARY_PATH=$PGI/linux86-64/13.5p/libso
    • does not contain the required component:
      [t]csh users, edit your ~/.cshrc:
        setenv LD_LIBRARY_PATH $PGI/linux86-64/13.5p/libso:$LD_LIBRARY_PATH
      [ba]sh users, edit your ~/.bashrc:
        export LD_LIBRARY_PATH=$PGI/linux86-64/13.5p/libso:$LD_LIBRARY_PATH

  • It is convenient to define a second env var that points to the MVAPICH location, specifying explicitly the version to use:
      [t]csh users (in ~/.cshrc):
        setenv MVAPICH $PGI/linux86-64/13.5p/mpi/mvapich
      [ba]sh users (in ~/.bashrc):
        export MVAPICH=$PGI/linux86-64/13.5p/mpi/mvapich
    The alternative is to do so in the job files.
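As a quick sanity check, the conditions above can be bundled into a small sh function; the function name is hypothetical, and the paths assume the /share/apps/pgi install and 13.5p version used in this primer:

```shell
# Hypothetical helper (sh syntax): verify that PGI is set and that
# LD_LIBRARY_PATH contains the PGI runtime directory used in this primer.
check_pgi_env() {
  if [ "$PGI" != "/share/apps/pgi" ]; then
    echo "PGI is not set to /share/apps/pgi (got: '$PGI')"
    return 1
  fi
  # Wrap the path list in ':' so the pattern matches whole components only.
  case ":$LD_LIBRARY_PATH:" in
    *":$PGI/linux86-64/13.5p/libso:"*)
      echo "environment looks OK" ;;
    *)
      echo "LD_LIBRARY_PATH is missing $PGI/linux86-64/13.5p/libso"
      return 1 ;;
  esac
}
```

Run it in a login shell (check_pgi_env) before submitting, to catch a misconfigured environment early rather than from a failed job.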


A Minimal Job File

  • Let's assume that you want to run the PGI/MPI executable mycode:
    [t]csh syntax:
      hydra% cat mycode-csh.job
      set MVAPICH = $PGI/linux86-64/13.5p/mpi/mvapich
      $MVAPICH/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./mycode
    [ba]sh syntax:
      hydra% cat mycode-sh.job
      . ~/.bashrc
      export PGI
      MVAPICH=$PGI/linux86-64/13.5p/mpi/mvapich
      $MVAPICH/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./mycode
  • Note that if you define the env var MVAPICH via your ~/.cshrc or ~/.bashrc, you don't need to do it in the job file.
  • Note that the env vars NSLOTS and TMPDIR are not defined in the job file,
  • These variables will be set by the Grid Engine at execution time and will hold the number of granted slots and the list of hosts (machines) to use.
  • Note the syntax ./mycode to avoid an error message like this:
    [proxy:0:0@compute-N-M.local] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file mycode (No such file or directory)

A Minimal qsub File

  • The corresponding qsub file is:
    [t]csh syntax:
      hydra% cat mycode-csh.qsub
      qsub -pe mpich_ib 8 \
           -cwd -j y \
           -N mycode \
           -o mycode.log \
           mycode-csh.job
    [ba]sh syntax:
      hydra% cat mycode-sh.qsub
      qsub -pe mpich_ib 8 \
           -cwd -j y \
           -N mycode \
           -o mycode.log \
           -S /bin/sh \
           mycode-sh.job


  • The above example requests 8 processors (CPUs, cores); adjust that number to your needs.
  • The flag -pe mpich_ib 8 tells SGE to use the parallel environment (PE) mpich_ib and requests 8 processors.
    • mpich_ib is the PE to use with and only with executables linked with the mvapich library.
    • The -pe mpich_ib 8 flag can be embedded (using #$) in the job file, like any other one,
    • options passed to qsub override embedded directives in the job file (including -pe)
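To make that precedence concrete, here is a hypothetical sketch (the file name and slot counts are illustrative): a job file that embeds a request for 8 slots, overridden at submission time.

```shell
# mycode.job (hypothetical) contains the embedded directive:
#$ -pe mpich_ib 8

# Submitting it as-is uses the embedded value (8 slots):
qsub mycode.job

# Passing -pe on the command line overrides the embedded directive (16 slots):
qsub -pe mpich_ib 16 mycode.job
```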

Details for Experienced Users

  • At run time, the scheduler defines the following variables, specific to the MPICH PEs:
      NSLOTS: the granted number of slots, i.e., the number of processors for this MPI run
      TMPDIR: the temporary directory where the machinefile is written,
        the file that holds the distribution of processors over the compute nodes

  • Hence you can use, in the job file, the commands
      echo number of slots is $NSLOTS
      echo machine file is $TMPDIR/machines
      cat $TMPDIR/machines
    to print out the granted value of NSLOTS and the name and content of the machinefile.
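Since the machinefile lists one hostname per granted slot, a small helper can summarize how the slots are spread over the compute nodes; the function name is hypothetical:

```shell
# Hypothetical helper: print "count hostname" pairs for a machinefile
# that lists one hostname per granted slot (the format SGE writes to
# $TMPDIR/machines).
summarize_machinefile() {
  sort "$1" | uniq -c
}
```

In a job file you would call it as summarize_machinefile $TMPDIR/machines, e.g. to check how the scheduler distributed an 8-slot request.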

More Examples

  • Look, on hydra, in ~hpc/tests/mpi/pgi for some examples.
  • To try them, create a test directory and extract the compressed tar ball:
hydra% mkdir -p ~/tests/mpi/pgi/ib
hydra% cd ~/tests/mpi/pgi/ib
hydra% tar xvzf ~hpc/tests/mpi/pgi/ib/tests.tgz

  • Build the executable (assuming that your environment is set up correctly for the PGI compilers.)
hydra% make

  • Two job files allow you to test whether you properly defined both env vars PGI and MVAPICH:
      [t]csh users: hydra% qsub test-csh-env.job
      [ba]sh users: hydra% qsub test-sh-env.job
    Look at the corresponding .log files and compare them to the same files in ~hpc/tests/mpi/pgi/ib
  • Run some (or all) of the tests, one at a time:
hydra% source hello-csh.qsub
hydra% qsub -pe mpich_ib 4 hello-csh-opts.job

hydra% source hello-sh.qsub
hydra% qsub -pe mpich_ib 4 hello-sh-opts.job
  • Use qstat to monitor the job, then look at the corresponding .log files and compare them to the same files in ~hpc/tests/mpi/pgi/ib
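For example (the log file name below is illustrative; use whichever .log file your test produced):

```shell
# List your own pending and running jobs:
qstat -u $USER

# Once a job has finished, compare its log to the reference copy:
diff hello.log ~hpc/tests/mpi/pgi/ib/hello.log
```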

-- SylvainKorzennikHPCAnalyst - 12 Jul 2012

Topic revision: r4 - 2013-10-02 - SylvainKorzennikHPCAnalyst