Newsgroups: comp.parallel.mpi
From: "Thomas Fürle" <fuerle@vipios.pri.univie.ac.at>
Subject: Re: MPICH ERROR:more slaves than msg queues
Organization: Vienna University, Austria
Date: Wed, 04 Mar 1998 14:02:05 +0100
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <34FD50CD.FA0784B3@vipios.pri.univie.ac.at>

Anand Pillai ('Aa-nun-dh pill-'I) wrote:

> I was trying to run the example program pi3 on multiple Sun sparc
> workstations using a p4pg file.
>
> MPI: MPICH
> OS: Solaris
> Device: ch_p4
>
> When the p4pg file "machines" was:
>
> super1 0 /scr/anand/examples/pi3
> super2 1 /scr/anand/examples/pi3
>
> "mpirun -p4pg machines pi3" worked fine.
>
> But when I increased number of processes:
>
> super1 0 /scr/anand/examples/pi3
> super2 2 /scr/anand/examples/pi3
>
> "mpirun -p4pg machines pi3" produced:
>
> rm_19833:  p4_error: create_rm_processes: more slaves than msg queues
> : 2
> p0_2050:  p4_error: net_recv recv:  EOF on socket: 640
> bm_list_2051:  p4_error: interrupt SIGINT: 2
> P4 procgroup file is machines.
>
> How do I correct this problem?
>
> TIA
> Anand


I tried the same thing on a Linux cluster and got the same error
message.

If I understand the MPICH manual correctly, a process count of 2 or
more on a single procgroup line only works on an SMP machine (such as
an SGI) with MPICH built with the option -comm=shared.

In your case you just have to repeat the line in your p4pg file, once
for each additional process on that host.

Instead of using the following for a cluster:

super1 0 /scr/anand/examples/pi3
super2 2 /scr/anand/examples/pi3

you need:

super1 0 /scr/anand/examples/pi3
super2 1 /scr/anand/examples/pi3
super2 1 /scr/anand/examples/pi3

Then everything should work.
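
If you have many hosts, repeating the lines by hand gets tedious. Here
is a minimal sketch in Python that does the rewriting. It is my own
helper, not part of MPICH, and the name expand_procgroup is made up. It
assumes the first entry is the local one and leaves it untouched, and
that every other entry has the form "host count program [args]".

```python
def expand_procgroup(text):
    """Rewrite each remote entry with count N > 1 as N entries with count 1."""
    out = []
    seen_first_entry = False
    for line in text.splitlines():
        fields = line.split()
        # Pass blank lines, comments, and anything unparseable through as-is.
        if len(fields) < 3 or fields[0].startswith("#") or not fields[1].isdigit():
            out.append(line)
            continue
        # Assumption: the first real entry describes the local host; keep it.
        if not seen_first_entry:
            seen_first_entry = True
            out.append(line)
            continue
        host, count, rest = fields[0], int(fields[1]), fields[2:]
        if count > 1:
            # One line per process, each with a count of 1.
            for _ in range(count):
                out.append(" ".join([host, "1"] + rest))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "super1 0 /scr/anand/examples/pi3\n"
        "super2 2 /scr/anand/examples/pi3\n"
    )
    print(expand_procgroup(example), end="")
```

Run on the failing file from the original post, this should produce the
three-line version shown above.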

Best Regards, Tom
--
Thomas Fuerle
mailto:fuerle@vipios.pri.univie.ac.at
http://vipios.pri.univie.ac.at
Institute for Applied Computer Science and Information Systems
University of  Vienna, Rathausstr. 19/4, A-1010 Vienna, Austria
Tel: +43 1 4277 38423


