From: hjstein@bfr.co.il (Harvey J. Stein)
Newsgroups: comp.parallel.pvm
Subject: PVM synchronization failures?
Date: 19 Mar 1999 16:21:29 +0300
Organization: Unspecified Organization
Message-Id: <m23e31ty1y.fsf@blinky.bfr.co.il>
Cc: hjstein@bfr.co.il
Xref: ukc comp.parallel.pvm:8154


Has anyone ever seen PVM synchronization failures?  I've just observed
the following strange behavior:

1. pvm is started on machine a001, and hosts a001-a090 are added.
2. pvm on a001 (conf cmd) reports 90 hosts available.
3. pvm on a002 (conf cmd) reports 89 hosts available - it doesn't
   list a003.
4. pvm on some other machines also reports 89 hosts & skips a003 (I
   didn't check all machines).
5. pvm on a003 lists 47 machines - almost half missing.

This is really the case, because using pvm_spawn to get one slave on
each host in the cluster gets 89 slaves from a002, but only 47 from
a003.
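
For anyone who wants to repeat the comparison, here is a minimal sketch
in Python of how the per-daemon host lists could be checked against
each other.  It assumes the host names each daemon reports have already
been collected by hand (for example, copied from each console's conf
listing); it does not talk to PVM itself.  The function names
expected_hosts and missing_per_view are made up for illustration, and
the data in the example is illustrative only, not my actual listings.

```python
def expected_hosts(first=1, last=90, prefix="a"):
    """Return the set of host names a001..a090 (the naming used in this post)."""
    return {f"{prefix}{i:03d}" for i in range(first, last + 1)}


def missing_per_view(views, expected):
    """Map each reporting host to the sorted list of hosts its daemon omits.

    views: dict of reporting host name -> iterable of host names it lists.
    """
    return {
        reporter: sorted(expected - set(seen))
        for reporter, seen in views.items()
    }


if __name__ == "__main__":
    expected = expected_hosts()
    # Illustrative data only: a001 lists every host, a002 omits a003.
    views = {
        "a001": expected,
        "a002": expected - {"a003"},
    }
    for reporter, missing in sorted(missing_per_view(views, expected).items()):
        seen_count = len(expected) - len(missing)
        print(f"{reporter}: {seen_count} hosts, missing {missing}")
```

Feeding in the real list from a003 would show exactly which hosts that
daemon does not know about, which might hint at where the host table
updates stopped arriving.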

How can this happen?  How can two slave pvm daemons disagree about how
many machines are in the PVM virtual machine?

--
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il

