From: michael mcnally <mmcnally@lsc.nd.edu>
Newsgroups: comp.parallel.mpi
Subject: Re: LLAMAS: possible bug (fwd)
Date: Sat, 29 May 1999 01:53:31 -0500
Organization: University of Notre Dame
Message-Id: <Pine.SOL.4.10.9905290152010.7733-100000@lsc.nd.edu>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Xref: ukc comp.parallel.mpi:5147


---------- Forwarded message ----------
Date: Tue, 25 May 1999 10:14:16 -0500 (EST)
From: Jeff Squyres <jsquyres@lsc.nd.edu>
Reply-To: llamas@mpi.nd.edu
To: LAM Team <llamas@mpi.nd.edu>, Jaume Catarineu <jaume.catarineu@econ.upf.es>
Subject: Re: LLAMAS: possible bug?

On Tue, 25 May 1999, Jaume Catarineu wrote:

> [initial troubleshooting snipped]
> After reading the manual twice I see that MPILAM tries to execute the
> foreign-node copies of the program in the login directory of those
> nodes. So I thought I could solve the problem by setting the ~/bin
> directory as the user's login directory on the foreign node. When
> executing, nothing changed:
> 
>   $ mpirun -np 4 cpi
> mpirun: cannot start cpi on n0 (o): No such file or directory

It looks like LAM is telling you that your program was not found on n0 --
which is the first machine in your hostfile that you lambooted.  This is
typically the machine that you are launching jobs from.

If you changed your home directory in /etc/passwd, your current shell will
usually not know this until you start up a new shell (i.e., re-login, open
a new window in your window manager, etc.).  So if you changed
/etc/passwd, and then used the same shell to invoke mpirun, it probably
still thought that your $HOME was the original value, and hence, couldn't
find cpi.
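
A quick way to check for a stale $HOME is sketched below.  It assumes a
POSIX shell and that getent is available on your system (an assumption;
it is not part of LAM).  home_field is only an illustrative helper name.

```shell
# Print the home-directory field (field 6) of passwd-format lines on stdin.
# home_field is a hypothetical helper name used only for this sketch.
home_field() {
    cut -d: -f6
}

user=$(id -un)
# Assumption: getent exists here and can look up the account.
db_home=$(getent passwd "$user" | home_field)

if [ "$HOME" = "$db_home" ]; then
    echo "HOME matches the passwd entry: $HOME"
else
    echo "HOME differs: shell has '$HOME', passwd has '$db_home'"
fi
```

If the two values differ, start a fresh login shell before running
mpirun again.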

> Even with the '-D' option I get the same result.

Can you show the command line that you tried with the -D option?  

We typically use an alias such as the following (note that we also assume
a homogeneous cluster here -- eliminate the -O option if you wish):

	alias lamrun /path/to/mpirun -O -w N -D `pwd`/!*

Which does the Right Thing, and is probably what you want.  Try this, and
let us know if you still have problems.

> * How can I have my own directory structure and execute my 
>   parallel programs wherever I want without problems?
>   How can I deal with the foreign working directory?

If you have a situation where you have a different directory structure on
remote hosts (or want to execute a different executable), you have to use
an application schema.  This is simply a file that specifies which
executable, and what options, to use on each host (specifically
delineated).  Here's an example application schema:

	n0 /home/myid/mpi/bin/master
	n1-n8 /home/myid/mpi/bin/slave

You should probably not need this, though, since you indicated that you
have NFS copies of your directory on the remote nodes.
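
For completeness, here is a rough sketch of putting the two schema lines
above into a file.  The file name my_appschema and its location are
hypothetical, and the paths are just the ones from the example.

```shell
# Write the example schema to a file (file name is an assumption).
schema="${TMPDIR:-/tmp}/my_appschema"

cat > "$schema" <<'EOF'
n0 /home/myid/mpi/bin/master
n1-n8 /home/myid/mpi/bin/slave
EOF

# Show what was written, for a quick visual check.
cat "$schema"

# With LAM booted, the schema file is then handed to mpirun in place of
# a program name, along the lines of:
#   mpirun "$schema"
```

Check the mpirun man page on your installation for the exact schema
syntax it accepts.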

{+} Jeff Squyres
{+} squyres@cse.nd.edu
{+} Perpetual Obsessive Notre Dame Student Craving Utter Madness
{+} "I came to ND for 4 years and ended up staying for a decade"

-------------------------------------------------------------------------

M. D. McNally

mmcnally@lsc.nd.edu

Home			Office - 1		Office - 2
P. O. Box 836		Fencing Office		LSC - 325 Cushing
Notre Dame, IN 46556	Joyce Center		Computer Science & Engr.
			Notre Dame, In 46556 	Notre Dame, IN 46556

