Newsgroups: comp.parallel.mpi
From: "Andrew Mc.Ghee" <mcghee@sun.mech.uq.edu.au>
Subject: Installing MPI - multiple networks?
Organization: Mechanical Engineering, UQ
Date: Wed, 05 Mar 1997 15:20:08 +1000
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <331D0288.4110@sun.mech.uq.edu.au>

We are trying to install MPI on a DEC machine that is connected to two
networks: a standard 10 Mb/s Ethernet open to the outside world, and a
private 100 Mb/s network connecting the parallel machines.

The DEC machine has two network cards, with one host name for each
interface:

apstar2.mech.uq.edu.au (10 Mb/s Ethernet)
nova2.mech.uq.edu.au   (100 Mb/s private network for parallel machines)

We have a problem running MPI with this setup. When machines.alpha
lists the machine name apstar2.mech.uq.edu.au, test programs run fine
(mpirun -np 4 cpi produces an answer).

However, when we change machines.alpha to the following:

nova2.mech.uq.edu.au
nova2.mech.uq.edu.au
nova2.mech.uq.edu.au
nova2.mech.uq.edu.au

we get the following error:

rm_3_13418: (0.027344) process not in process table; my_unix_id = 13418 my_host=apstar2.mech.uq.edu.au
rm_3_13418: (0.027344) Probable cause:  local slave on uniprocessor without shared memory
rm_3_13418: (0.027344) Probable fix:  ensure only one process on apstar2.mech.uq.edu.au
rm_3_13418: (0.027344) (on master process this means 'local 0' in the procgroup file)
rm_3_13418: (0.027344) You can also remake p4 with SYSV_IPC set in the OPTIONS file
rm_3_13418:  p4_error: p4_get_my_id_from_proc: 0
rm_l_3_13412:  p4_error: interrupt SIGINT: 2

However, "mpirun cpi" works fine (is it only using one node?).

Could someone enlighten us as to what this problem is?
We are assuming that it is a configuration problem related to how the
machine names on our private network are set up.
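
In case it helps narrow this down, here is a minimal sketch of the check
we have in mind. It is a standalone Python script, not part of MPI or p4,
and it compares the host name the machine reports for itself with the
names listed in machines.alpha. The helper names read_machines and
mismatched are our own, and treating '#' as a comment marker in the
machines file is an assumption. Since the error output shows
my_host=apstar2.mech.uq.edu.au while machines.alpha lists nova2, we would
expect it to flag nova2.mech.uq.edu.au on our machine.

```python
import os
import socket


def read_machines(path):
    """Return the host names listed in a machines file, one per entry.

    Assumption: blank lines are ignored and '#' starts a comment.
    """
    hosts = []
    with open(path) as handle:
        for line in handle:
            entry = line.split("#", 1)[0].strip()
            if entry:
                hosts.append(entry.split()[0])
    return hosts


def mismatched(hosts, local_names):
    """Return the listed hosts that match none of the local names.

    The comparison is case-insensitive and purely textual; no DNS
    lookups are done here.
    """
    known = {name.lower() for name in local_names}
    return sorted({host for host in hosts if host.lower() not in known})


if __name__ == "__main__":
    path = "machines.alpha"  # file name taken from our setup
    # The names this machine reports for itself.
    local_names = [socket.gethostname(), socket.getfqdn()]
    print("this machine reports itself as:", local_names)
    if os.path.isfile(path):
        for host in mismatched(read_machines(path), local_names):
            print("listed, but not a name this machine reports:", host)
```

Run in the directory holding machines.alpha, this only prints which
listed names differ from the machine's own reported name. It does not
say whether that difference is the actual cause of the p4 error.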

Thanks in advance for any help that anyone can offer on this,

regards,
Andrew Mc.Ghee
Mechanical Engineering, University of Queensland

