
How to submit an MPI job, using Intel, over TCP/IP or IB

  • This primer describes how to submit parallel MPI jobs, using programs compiled and linked with the Intel compilers, over TCP/IP or over IB.
  • The executable(s) MUST be compiled and linked using the Intel compilers.
  • The primer on compilers describes how to compile and link using the Intel compilers (a minimal compile sketch follows this list).
  • An executable compiled with another compiler or MPI implementation (PGI, OpenMPI, GNU, etc.) should not be submitted this way. It may run, but it is likely to give you grief.
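
For reference, a minimal compile-and-link sketch, assuming the Intel MPI compiler wrappers under the install path used in the job examples below (the wrapper names and source files are only illustrative; follow the compilers primer for the actual instructions):

    Fortran source:  hydra% /software/intel/impi/4.0.3.008/bin64/mpiifort -o mycode mycode.f90
    C source:        hydra% /software/intel/impi/4.0.3.008/bin64/mpiicc -o mycode mycode.c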

To Submit a Job

  • The basics on how to submit a job are described in the primer's introduction on job submission, so read that one first.
  • The job file includes the command that launches your MPI program, using the mpirun command.
  • The number of processors and the machinefile (the list of hosts to use) are not explicitly specified (hardwired) on the mpirun command line.
  • You must invoke the corresponding mpirun (the Intel MPI implementation, not ORTE); see the example below.
  • The qsub file (or command)
    • will request a number of processors (CPUs, cores)
    • and specify the corresponding PE (parallel environment), via the qsub command
  • The job scheduler will grant the request and determine the host list for that specific job (i.e., the machinefile)

  • In contrast to executables built with the other compilers, executables built with the Intel compilers can use any type of message passing fabric;
  • the fabric to use is specified at run time, via the environment variable I_MPI_FABRICS (see the reference snippet below).
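
For reference, the two I_MPI_FABRICS settings used in this primer ([ba]sh syntax shown; [t]csh users use setenv instead, as in the examples below):

    export I_MPI_FABRICS="shm:tcp"   # shared memory + TCP/IP (Ethernet)
    export I_MPI_FABRICS="shm:ofa"   # shared memory + OFA verbs (InfiniBand)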

Example, using TCP/IP

A Minimal Job File

  • Let's assume that you want to run the MPI (Intel) executable mycode
    [t]csh syntax:
    hydra% cat mycode-csh.job
    setenv OMPI_MCA_plm_rsh_disable_qrsh 1
    setenv I_MPI_FABRICS "shm:tcp"
    set MPICH = /software/intel/impi/4.0.3.008
    $MPICH/bin64/mpirun -np $NSLOTS ./mycode

    [ba]sh syntax:
    hydra% cat mycode-sh.job
    export OMPI_MCA_plm_rsh_disable_qrsh=1
    export I_MPI_FABRICS="shm:tcp"
    MPICH=/software/intel/impi/4.0.3.008
    $MPICH/bin64/mpirun -np $NSLOTS ./mycode
  • Note that the environment variable NSLOTS is not defined in the job file; it is set by the Grid Engine at execution time and holds the number of granted slots.

A Minimal qsub File

  • The corresponding qsub file is
    [t]csh syntax:
    hydra% cat mycode-csh.qsub
    qsub -pe orte 8 \
         -cwd -j y \
         -N mycode \
         -o mycode.log \
         mycode-csh.job

    [ba]sh syntax:
    hydra% cat mycode-sh.qsub
    qsub -pe orte 8 \
         -cwd -j y \
         -N mycode \
         -o mycode.log \
         -S /bin/sh \
         mycode-sh.job
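
To submit the job, source the corresponding qsub file (as done in the More Examples section below; the file names are the ones assumed above):

    hydra% source mycode-csh.qsub
 or
    hydra% source mycode-sh.qsub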

Changes needed to run over IB

  • You will need to change the value of I_MPI_FABRICS in your job file, as follows:
    [t]csh syntax:  setenv I_MPI_FABRICS "shm:ofa"
    [ba]sh syntax:  export I_MPI_FABRICS="shm:ofa"

  • You will need to specify the orte_ib PE instead of the orte PE.

  • If you specify a queue, use the corresponding IB queues (?TNi.q instead of ?TN.q). A combined sketch is shown below.
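
Putting these changes together, a minimal IB version of the [ba]sh example above might look like this (a sketch; the file name mycode-sh_ib.job is only illustrative):

    hydra% cat mycode-sh_ib.job
    export OMPI_MCA_plm_rsh_disable_qrsh=1
    export I_MPI_FABRICS="shm:ofa"
    MPICH=/software/intel/impi/4.0.3.008
    $MPICH/bin64/mpirun -np $NSLOTS ./mycode

    hydra% qsub -pe orte_ib 8 -cwd -j y -N mycode -o mycode_ib.log -S /bin/sh mycode-sh_ib.job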

NOTE

  • The above example requests 8 processors (CPUs, cores); adjust that number to your needs.
  • The -pe orte 8 flag tells SGE to use the parallel environment (PE) orte and requests 8 processors.
    • Intel executables must use that PE (not mpich),
    • by specifying orte_ib instead of orte, your job will run on compute nodes connected to the IB fabric,
    • but the fabric used for message passing is still set by the value of I_MPI_FABRICS in your job file,
    • the -pe orte 8 flag can be embedded (using #$) in the job file, like any other flag (see the sketch after this note).

  • Another page describes the available queues in more detail; in summary:

    to run                          MPI/Intel over TCP/IP     MPI/Intel over IB
    you must use                    -pe orte N                -pe orte_ib N
    you must set I_MPI_FABRICS to   "shm:tcp"                 "shm:ofa"
    you can use the queues          sTN.q, mTN.q or lTN.q     sTNi.q, mTNi.q or lTNi.q

    The queues correspond to short, medium and long execution times, respectively (the queue is specified with the -q flag).

  • Options passed to qsub override embedded directives in the job file (including -pe or -q)

  • You can set the environment variable I_MPI_FABRICS in several ways:
    1. set it in the job file:
       [t]csh users:  setenv I_MPI_FABRICS "shm:ofa"
       [ba]sh users:  export I_MPI_FABRICS="shm:ofa"
    2. set it in your default environment:
       [t]csh users:  edit your ~/.cshrc and add   setenv I_MPI_FABRICS "shm:ofa"
       [ba]sh users:  edit your ~/.bashrc and add  export I_MPI_FABRICS="shm:ofa"
    3. or do one of the following:
       specify -v I_MPI_FABRICS=shm:ofa with qsub,
       add #$ -v I_MPI_FABRICS=shm:ofa in the job file,
       or add -v I_MPI_FABRICS=shm:ofa in your ~/.sge_request file.
    Read the man pages on qsub and/or sge_request to learn more.
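
For instance, a job file with embedded directives (a sketch combining the flags used in this primer; the file name mycode-sh-opts.job is only illustrative, and the queue, slot count and fabric should be adjusted to your needs) can be submitted with a plain qsub command:

    hydra% cat mycode-sh-opts.job
    #$ -pe orte 8
    #$ -q sTN.q
    #$ -cwd
    #$ -j y
    #$ -N mycode
    #$ -o mycode.log
    #$ -S /bin/sh
    #$ -v I_MPI_FABRICS=shm:tcp
    export OMPI_MCA_plm_rsh_disable_qrsh=1
    MPICH=/software/intel/impi/4.0.3.008
    $MPICH/bin64/mpirun -np $NSLOTS ./mycode

    hydra% qsub mycode-sh-opts.job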

Details for Experienced Users

  • At run time, the scheduler (Grid Engine) defines the following job-specific environment variables:
    NSLOTS       the granted number of slots, i.e., the number of processors for this MPI run
    PE_HOSTFILE  the name of the file that lists the distribution of processors over the compute nodes

  • Hence you can use, in the job file, the commands
    echo number of slots is $NSLOTS        (prints the granted value of NSLOTS)
    echo pe host file is $PE_HOSTFILE
    cat $PE_HOSTFILE                       (print the name and content of the PE_HOSTFILE)

  • You can check what message passing fabric your job is using with
    echo MPI fabric is set to $I_MPI_FABRICS
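
Put together, a [ba]sh job-file fragment that reports these values before launching the run could look like this (a sketch; mycode and the MPICH path are the ones assumed in the examples above):

    echo "number of slots is $NSLOTS"
    echo "pe host file is $PE_HOSTFILE"
    cat $PE_HOSTFILE
    echo "MPI fabric is set to $I_MPI_FABRICS"
    MPICH=/software/intel/impi/4.0.3.008
    $MPICH/bin64/mpirun -np $NSLOTS ./mycode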

More Examples

Look on hydra, in ~hpc/tests/mpi/intel, for some examples.

  • To execute them, create a test directory and extract the compressed tar-ball:
hydra% mkdir -p ~/tests/mpi/intel
hydra% cd ~/tests/mpi/intel
hydra% tar xvzf ~hpc/tests/mpi/intel/tests.tgz

  • Build the executable
hydra% make

  • Run some (or all) of the tests
hydra% source hello-csh.qsub
hydra% source hello-sh.qsub

hydra% source hello-csh_ib.qsub
hydra% source hello-sh_ib.qsub

hydra% qsub -pe orte 4 hello-csh-opts.job
hydra% qsub -pe orte 4 hello-csh-opts_tcp.job

hydra% qsub -pe orte_ib 4 hello-csh-opts_ib.job

Run only one job at a time; use qstat to monitor the job, then look at the hello.log, hello_tcp.log and hello_ib.log files.
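
For example, to monitor your job(s) and then inspect the output once a job has finished (the log file name is one of those produced by the tests above):

    hydra% qstat -u $USER
    hydra% cat hello.log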

-- SylvainKorzennikHPCAnalyst - 12 Jul 2012
