
How to submit an MPI job, using PGI, over TCP/IP

  • This primer describes how to submit parallel MPI jobs, using programs compiled and linked with the PGI compilers, over TCP/IP (MPICH).
  • The executable(s) MUST be compiled and linked using PGI and MPICH (not MVAPICH).
  • The primer on compilers describes how to compile and link using PGI and MPICH.
  • An executable compiled with another compiler or MPI stack (GNU, Open MPI, Intel, etc.) should not be submitted this way. It may run, but is likely to give you grief.

  • There is a different primer that explains how to submit PGI/MPI (MVAPICH) jobs to access the IB fabric.

To Submit a Job

  • The basics on how to submit a job are described in the primer's introduction on job submission, so read that one first.
  • The job file includes the command to launch your MPICH program, using the mpirun command.
  • The number of processors and the machinefile (the list of hosts to use) are not explicitly specified (hardwired) with the mpirun command.
  • You must invoke the corresponding mpirun (PGI/MPICH implementation), see example below.
  • The qsub file (or command)
    • will request a number of processors (CPUs, cores)
    • and specify the corresponding PE (parallel environment), via the qsub command
  • The job scheduler will grant the request and determine the hosts list for that specific job (i.e., the machinefile)
  • The job file can specify the PE and the number of processors via an embedded directive (i.e., #$ -pe mpich 8)
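The embedded-directive style mentioned above can be sketched as a complete, self-submitting job file. This is only a sketch, combining the paths, flags, and the executable name mycode used elsewhere in this primer; adjust all of them to your setup:

```shell
#!/bin/sh
# Hypothetical job file with all qsub options embedded as #$ directives,
# so it can be submitted with a bare "qsub mycode-sh.job".
#$ -pe mpich 8
#$ -cwd -j y
#$ -N mycode
#$ -o mycode.log
#$ -S /bin/sh
#$ -v PGI=/share/apps/pgi
# NSLOTS and TMPDIR are set by the Grid Engine at execution time.
MPICH=$PGI/linux86-64/12.5/mpi/mpich
$MPICH/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mycode
```

Remember that any option passed on the qsub command line overrides the corresponding embedded directive.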

Required Configuration

You must edit your ~/.cshrc and/or ~/.bashrc to define the variable PGI.
This is usually (but not always) done when setting up your environment for the PGI compilers.
To check, type echo $PGI; it should return:
/share/apps/pgi
  • Failure to define this variable will cause your jobs to fail, and will produce errors and aggravation.
  • If this variable is not set
    • either
      [t]csh users: edit your ~/.cshrc and add
        setenv PGI /share/apps/pgi
      [ba]sh users: edit your ~/.bashrc and add
        export PGI=/share/apps/pgi
    • or do one of the following:
      specify -v PGI=/share/apps/pgi with qsub
      add #$ -v PGI=/share/apps/pgi in the job file
      add -v PGI=/share/apps/pgi in your ~/.sge_request file
Read the man pages on qsub and/or sge_request to learn more.
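Since an unset PGI only surfaces later as obscure run-time failures, a job file can check for it up front and fail with a clear message instead. A minimal sketch, where the function name require_pgi and the error text are ours (only the path /share/apps/pgi is from this primer):

```shell
#!/bin/sh
# Sketch: a guard to put at the top of a job file so it fails fast
# when PGI is undefined, instead of producing confusing mpirun errors.
require_pgi() {
    if [ -z "${PGI}" ]; then
        echo "error: PGI is not set; define it in ~/.cshrc or ~/.bashrc," \
             "or pass -v PGI=/share/apps/pgi to qsub" >&2
        return 1
    fi
    echo "PGI is $PGI"
}
```

In a job file you would call it right before invoking mpirun, e.g. `require_pgi || exit 1`.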

  • It is convenient to define a second env var that points to the MPICH location, specifying explicitly the version to use:
      [t]csh users: setenv MPICH $PGI/linux86-64/12.5/mpi/mpich
      [ba]sh users: export MPICH=$PGI/linux86-64/12.5/mpi/mpich
    The alternative is to do so in the job files.

Example

A Minimal Job File

  • Let's assume that you want to run the PGI/MPI executable mycode.
    [t]csh syntax:
      hydra% cat mycode-csh.job
      set MPICH = $PGI/linux86-64/12.5/mpi/mpich
      $MPICH/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mycode
    [ba]sh syntax:
      hydra% cat mycode-sh.job
      . ~/.bashrc
      export PGI
      MPICH=$PGI/linux86-64/12.5/mpi/mpich
      $MPICH/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mycode
  • Note that if you define the env var MPICH via your ~/.cshrc or ~/.bashrc, you don't need to do it in the job file.
  • Note that the env vars NSLOTS and TMPDIR are not defined in the job file; they are set by the Grid Engine at execution time and hold, respectively, the number of granted slots and the location of the list of hosts (machines) to use.

A Minimal qsub File

  • The corresponding qsub files are:
    [t]csh syntax:
      hydra% cat mycode-csh.qsub
      qsub -pe mpich 8 \
           -cwd -j y \
           -N mycode \
           -o mycode.log \
           mycode-csh.job
    [ba]sh syntax:
      hydra% cat mycode-sh.qsub
      qsub -pe mpich 8 \
           -cwd -j y \
           -N mycode \
           -o mycode.log \
           -S /bin/sh \
           mycode-sh.job

NOTE

  • The above example requests 8 processors (CPUs, cores); adjust that number to your needs.
  • The flag -pe mpich 8 tells SGE to use the parallel environment (PE) mpich and requests 8 processors.
    • mpich is the PE to use with, and only with, executables linked against the MPICH library.
    • The -pe mpich 8 flag can be embedded (using #$) in the job file, like any other option.
    • Options passed to qsub override embedded directives in the job file (including -pe).

Details for Experienced Users

  • At run-time, the scheduler defines the following MPICH specific variables:
    NSLOTS  the granted number of slots, i.e., the number of processors for this MPI run
    TMPDIR  the temporary directory where the machinefile is written,
            the file that holds the distribution of processors over the compute nodes

  • Hence you can use, in the job file, the commands
      echo number of slots is $NSLOTS
      echo machine file is $TMPDIR/machines
      cat $TMPDIR/machines
    to print out the granted value of NSLOTS and the name and content of the machinefile.
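Beyond printing them, a job file can sanity-check that NSLOTS and the machinefile agree. A sketch, where the function name check_slots and the messages are ours; it assumes the PE writes one machinefile line per granted slot, which depends on the PE's allocation rule:

```shell
#!/bin/sh
# Compare the number of lines in the machinefile with the granted
# slot count; a mismatch usually indicates a misconfigured PE.
check_slots() {
    machinefile=$1   # e.g. $TMPDIR/machines
    nslots=$2        # e.g. $NSLOTS
    n=$(wc -l < "$machinefile" | tr -d ' ')
    if [ "$n" -eq "$nslots" ]; then
        echo "OK: $n slots match"
    else
        echo "MISMATCH: machinefile has $n lines, NSLOTS is $nslots"
    fi
}
```

In a job file you would call `check_slots $TMPDIR/machines $NSLOTS` before the mpirun line.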

More Examples

  • Look, on hydra, in ~hpc/tests/mpi/pgi for some examples.
  • To try them, create a test directory and extract the compressed tar ball:
hydra% mkdir -p ~/tests/mpi/pgi/tcp
hydra% cd ~/tests/mpi/pgi/tcp
hydra% tar xvzf ~hpc/tests/mpi/pgi/tcp/tests.tgz

  • Build the executable (assuming that your environment is set up correctly for the PGI compilers):
hydra% make

  • Two job files allow you to test whether you properly defined both env vars PGI and MPICH:
    [t]csh users [ba]sh users
    hydra% qsub test-csh-env.job hydra% qsub test-sh-env.job
    look at the corresponding .log files and compare them to the same files in ~hpc/tests/mpi/pgi
  • Run some of the tests, one at a time:
hydra% source hello-csh.qsub
hydra% qsub -pe mpich 4 hello-csh-opts.job

hydra% source hello-sh.qsub
hydra% qsub -pe mpich 4 hello-sh-opts.job
  • Use qstat to monitor the job, then look at the corresponding .log files and compare them to the same files in ~hpc/tests/mpi/pgi

-- SylvainKorzennikHPCAnalyst - 20 Jan 2012

Topic revision: r8 - 2012-07-13 - SylvainKorzennikHPCAnalyst
 