From: Bernhard Scholz <scholz@par.univie.ac.at>
Newsgroups: comp.parallel.mpi
Subject: Re: Log file frustration in MPICH
Date: Tue, 22 Sep 1998 11:04:14 +0200
Organization: Vienna University, Austria
Message-Id: <3607680E.5E8F@par.univie.ac.at>
References: <35F6F329.58B4@rutcor.rutgers.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Jonathan Eckstein wrote:

> 
> After much fiddling, I got "Jumpshot" to run, but it won't read the
> log files.  I think it's expecting "clog" format, while I have an
> "alog" file.  There are "clog" routines in MPE, but there is there an
> example of their use?  Even if I could write a clog, I'm not sure of
> success because Jumpshot doesn't even seem to read the clog files it
> comes with.  Even for those files, It keeps saying "unknown record
> type 384763863, perhaps the file is little endian", or something
> roughly like that.
> 

At the moment I am writing an instrumentation system (SIS) for VFC (an
HPF-Fortran compiler system). The instrumentation calls are based on
MPE. Unfortunately, MPE version 1.1.1/1.1.0 does not work properly. I
scanned the code and found the problem in mpe/clog_merge.c. There, an
#if directive in line 243 tests the endianness (big/little) of the
machine. If it is big endian, the internal data structure is converted
to little endian. The reason for this conversion is that MPI should
work in a heterogeneous system. (Everything should be little endian.)
However, mpe/clog2alog.c does not perform the conversion from little
endian back to big endian (if you have a big endian machine).
Therefore, it reads rubbish, and this is the reason for values like
384763863.
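
To illustrate why a missing conversion produces such huge numbers, here
is a minimal sketch in C. The helper swap32 is my own name and is not
taken from the MPE sources; it only shows what happens when a 32-bit
value is interpreted with the wrong byte order.

```c
#include <stdint.h>

/* Hypothetical helper (not from MPE): reverse the byte order of a
   32-bit value, i.e. what a reader sees if it skips the conversion. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```

A small record type such as 3, read with the wrong byte order, comes
out as swap32(3) = 50331648. Applying swap32 a second time gives back
the original value, which is the step the reader would need to perform.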

Anyway, if you set the #if directive to zero (mpe/clog_merge.c, line
243), you avoid the conversion, and the thing will only work in a
homogeneous system (I think that's the normal case). Additionally, you
have to move the declaration of "int rc;" to the function head.
(Otherwise you cannot compile it.)
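
The shape of that change can be sketched as follows. This is not the
code from clog_merge.c. The names process_block, reverse_bytes and
CONVERT_TO_LITTLE_ENDIAN are made up for illustration, and I am
assuming that the compile error arises because the declaration of rc
sits inside the region that the #if switches off.

```c
#include <stddef.h>

/* Hypothetical switch standing in for the #if in line 243;
   0 disables the conversion. */
#define CONVERT_TO_LITTLE_ENDIAN 0

/* Hypothetical stand-in for the conversion: reverse a byte buffer. */
int reverse_bytes(unsigned char *buf, size_t len)
{
    size_t i;
    if (buf == NULL)
        return -1;
    for (i = 0; i < len / 2; i++) {
        unsigned char tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
    return 0;
}

int process_block(unsigned char *buf, size_t len)
{
    int rc = 0;   /* declared at the function head, outside the #if */

#if CONVERT_TO_LITTLE_ENDIAN
    rc = reverse_bytes(buf, len);
    if (rc != 0)
        return rc;
#endif

    /* Code after the #if still uses rc, so it must be declared above. */
    if (buf == NULL || len == 0)
        rc = -1;
    return rc;
}
```

With the switch set to 0, the buffer is left untouched and rc is still
visible to the code that follows the disabled region.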

If you do that, you will discover that the communication protocol doesn't
work properly:
  1) It will run with one processor
  2) It will run with two processors
  3) It won't run with more than two processors

Well, don't ask me exactly why. It has to do with the way the traces of
the individual processors are merged with a merge sort. I need to
investigate further.
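
For reference, merging more than two time-sorted traces can be sketched
like this. It is a plain sequential k-way merge, not the distributed
merge that MPE performs over MPI. The type event_t and the function
merge_traces are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical trace record: a timestamp plus the originating rank. */
typedef struct {
    double time;
    int    rank;
} event_t;

/* Merge k traces, each already sorted by time, into out.
   out must have room for the sum of all lens[i].
   Returns the number of events written (0 if allocation fails). */
size_t merge_traces(const event_t *const *traces, const size_t *lens,
                    size_t k, event_t *out)
{
    size_t *pos = calloc(k, sizeof *pos);  /* read position per trace */
    size_t n = 0;

    if (pos == NULL)
        return 0;

    for (;;) {
        size_t best = k;  /* k means "no trace has events left" */
        for (size_t i = 0; i < k; i++) {
            if (pos[i] < lens[i] &&
                (best == k ||
                 traces[i][pos[i]].time < traces[best][pos[best]].time))
                best = i;
        }
        if (best == k)
            break;
        out[n++] = traces[best][pos[best]++];
    }

    free(pos);
    return n;
}
```

Each step picks the earliest pending event among all traces, so the
same loop handles one, two, or more processors without a special case.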

Bernhard.


                       Bernhard Scholz | Research Assistant
     Institute for Software Technology | scholz@par.univie.ac.at
                  and Parallel Systems | Tel: ++43 1-3105608-76
Liechtensteinstrasse 22, A-1090 Vienna | Fax: ++43 1-3105608-88

