Newsgroups: comp.parallel.pvm
From: jens@lily.newcastle.edu.au (Jens Utech)
Subject: Q: Strange processes on PVM
Keywords: pvm, bug, ghostprocess, master/slave
Organization: The University of Newcastle
Date: 07 May 1997 05:35:55 GMT
Message-ID:

Hello everybody,

I have run into a PVM problem that is really strange to me. I am working on a master/slave system for parallel genetic algorithms. The master process has to start a slave with the pvm_spawn command:

    int tid, result;

    result = pvm_spawn("/.../ga", NULL, PvmTaskDefault, NULL, 1, &tid);

The result of this call is 1 and the variable tid is 40003. If I now start the console and run "ps -a", I see the following:

    HOST    TID    PTID    PID  FLAG 0x   COMMAND
    lily  40002       -  29395  4/c       -
    lily  40003   40002  29396  16/o,c,f  /home/grad/jens/diplom/ga/ga
    lily  40004       -  29396  4/c       -

Process 40002 is the master process itself. My first question is: what is process 40004? (Note that it has the same Unix PID as the slave itself. Furthermore, if I do either "kill 40003" or "kill 40004", both processes disappear.)

After the slave process has initialized, it calls pvm_recv(-1, -1) (I checked with gdb that the process really reaches this point). After that, the master sends a message to the slave with

    pvm_send(tid, 10);    /* 10 is my message tag */

The result of this call is PvmOk. I also checked that tid is still 40003 (= the result from pvm_spawn). The problem is that this message never arrives at the slave, which means that the slave never returns from its pvm_recv(-1, -1).

Then I tried manually setting tid to 40004 (= the TID of the "ghost" process) and called pvm_send(tid, 10) again. To my surprise, this time the message arrived at the slave, pvm_recv(-1, -1) returned, and the slave did what I expected it to do.

Does anybody have an explanation for this strange behaviour?
Please note that I tested this on a 4-CPU (SPARC) Sun running Solaris 2.5, with no other hosts in the virtual machine. I tried this using _both_ the SUNMP and the SUN4SOL2 architectures, and in both cases the same thing happened.

I would greatly appreciate any help, because I am really stuck at this point and don't know how to solve this problem.

Bye
Jens
jens@cs.newcastle.edu.au

-- 
-------------------------------------------------------------------------
Jens Utech                     jens@cs.newcastle.edu.au            .-_|\
Dept of Computer Science       SnailMail: 115 Acacia Av           /     \
The University of Newcastle               Lambton Nth, NSW 2299   \.--._*
Callaghan NSW 2308             Phone: (049) 56 20 91                   v
Australia