From: Blas Pedro Uberuaga <buber@u.washington.edu>
Newsgroups: comp.parallel.mpi
Subject: MPI and LoadLeveler
Date: Tue, 16 Feb 1999 17:46:15 -0800
Organization: University of Washington
Message-Id: <36CA1F67.CB01C6DE@u.washington.edu>


Hi,

I'm trying to run an MPI program on a cluster of IBM 43P-260s under
LoadLeveler, but I can't get MPI and LoadLeveler to agree on the same
set of machines.  My LoadLeveler script is:

#@ error=ll_err
#@ output=ll_out
#@ notification=always
#@ class=Parallel
#@ job_type=parallel
#@ min_processors   =    4
# max_processors   =    4
# max_node = 2
#@ queue

echo $LOADL_PROCESSOR_LIST >! hosts
time /usr/local/mpich/bin/mpirun -t -np 4 -machinefile hosts /work1/buber/MPICH/Vasp/vasp_mpi

And I get this output:

Procgroup file:
burdina02 0 /work1/buber/MPICH/Vasp/vasp_mpi
burdina02.chem.washington.edu 1 /work1/buber/MPICH/Vasp/vasp_mpi
burdina04.chem.washington.edu 1 /work1/buber/MPICH/Vasp/vasp_mpi
burdina05.chem.washington.edu 1 /work1/buber/MPICH/Vasp/vasp_mpi
/work1/buber/MPICH/Vasp/vasp_mpi -p4pg /work1/buber/MPICH/Vasp/PI7788 -p4wd /work1/buber/MPICH/Vasp
0.130u 0.430s 0:01.15 48.6%     71+68k 0+0io 3pf+0w

The problem is that MPI starts two processes on the first machine in
the hosts file (burdina02).  I tried adding the -nolocal flag, but then
MPI complained that there were not enough processors.  The hosts file
in that case was:

burdina02.chem.washington.edu burdina04.chem.washington.edu burdina05.chem.washington.edu burdina01.chem.washington.edu
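The procgroup listing above is consistent with a simple rule: process 0 goes to the host where mpirun was started (burdina02), and the remaining three processes are taken, in order, from the first three entries of the hosts file, with no check for the local host already being in that list. The Python sketch below reproduces that inferred rule. It is a reconstruction from this output only, not MPICH's actual code; the function names are hypothetical, and it assumes the hosts file for the first run had the same order as the one quoted above.

```python
from collections import Counter


def procgroup(local_host, machines, nprocs):
    """Hypothetical reconstruction of the procgroup layout seen in the output."""
    # Process 0 runs on the host where mpirun was started; it is listed
    # with 0 because it adds no extra processes there.
    entries = [(local_host, 0)]
    # The other nprocs - 1 processes are taken from the machinefile in order,
    # one per entry; the local host is NOT skipped if it appears in the list.
    # Assumption: the list has at least nprocs - 1 entries (no wrap-around modelled).
    for host in machines[: nprocs - 1]:
        entries.append((host, 1))
    return entries


def processes_per_machine(entries):
    # Each entry stands for one process; compare hosts by short name so that
    # "burdina02" and "burdina02.chem.washington.edu" count as one machine.
    return Counter(host.split(".")[0] for host, _ in entries)


# Assumed host order, taken from the hosts file quoted above.
hosts = ("burdina02.chem.washington.edu burdina04.chem.washington.edu "
         "burdina05.chem.washington.edu burdina01.chem.washington.edu").split()

entries = procgroup("burdina02", hosts, nprocs=4)
for host, count in entries:
    print(host, count)
print(processes_per_machine(entries))
```

Run as is, it prints the same four host/count pairs as the procgroup file above, and the per-machine tally shows burdina02 with two processes and burdina01 with none.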

Thank you for any suggestions.

