From: Vasiliu Bogdan <bogdan@scl.ameslab.gov>
Newsgroups: comp.parallel.mpi
Subject: Pentium related problem
Date: Thu, 10 Sep 1998 16:37:12 -0500
Organization: Iowa State University, Ames, Iowa, USA
Message-Id: <35F84687.69A9365@scl.ameslab.gov>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


I work on parallelizing Finite Difference Method codes and
I have run into a problem on my cluster of Pentium computers
running Linux 2.0.35.

I send a slice of a 3D array from one processor
to another. In order to do that I have to pack the data into a
contiguous buffer, send it, and then unpack it at the destination
process. The packing process takes data from noncontiguous
memory locations and puts it into a contiguous memory
area (the buffer to be sent). The unpacking process takes the
data from a contiguous memory area (the received buffer)
and puts it into noncontiguous memory (back into the 3D array).
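
For illustration, the two steps can be written as plain loops. This is
only a minimal sketch: the function names pack_slice and unpack_slice
are hypothetical, and it assumes the array is stored with the row index
varying fastest, so that a slice at a fixed row "index" is read with a
stride of "rows" elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: gather the slice at fixed row `index` of a
   rows x columns x pages array (row index fastest) into buf.
   Reads are strided by `rows`; writes are contiguous. */
void pack_slice(const double *mat, double *buf,
                int rows, int columns, int pages, int index)
{
    int i, k, offset = 0;

    assert(mat != NULL && buf != NULL);
    for (k = 0; k < pages; k++) {
        for (i = 0; i < columns; i++)
            buf[offset + i] = mat[index + i * rows + columns * rows * k];
        offset += columns;
    }
}

/* Hypothetical helper: the inverse operation. Reads are contiguous;
   writes are strided by `rows`. */
void unpack_slice(double *mat, const double *buf,
                  int rows, int columns, int pages, int index)
{
    int i, k, offset = 0;

    assert(mat != NULL && buf != NULL);
    for (k = 0; k < pages; k++) {
        for (i = 0; i < columns; i++)
            mat[index + i * rows + columns * rows * k] = buf[offset + i];
        offset += columns;
    }
}
```

The only difference between the two routines is which side of the
assignment is strided: packing does strided loads, unpacking does
strided stores.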

The strange thing is that the unpacking is an order of magnitude
slower than the packing. I do not understand why, since the
unpacking is just the inverse of the packing.
I wrote a small C test program (without any communication calls) and
ran it on SGI, HP, and Pentium computers. Only the Pentiums showed
this problem. Packing and unpacking of this kind occurs in any
parallel FDM algorithm.
Does anyone know the answer to this problem?
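
A communication-free timing test of this kind might look like the
sketch below. It is not the actual test program: the function name
time_pack_unpack, the repetition count, and the use of clock() are
assumptions made for illustration, and an optimizing compiler may
remove some of the repeated work, so the numbers should be treated
with care.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical test routine: times `reps` repetitions of packing and
   of unpacking one slice (fixed row `index`) of a
   rows x columns x pages array, without any communication.
   Returns 0 on success, -1 if memory allocation fails. */
int time_pack_unpack(int rows, int columns, int pages, int index,
                     int reps, double *pack_sec, double *unpack_sec)
{
    size_t n = (size_t)rows * columns * pages;   /* whole array  */
    size_t m = (size_t)columns * pages;          /* one slice    */
    double *mat = (double*)malloc(n * sizeof(double));
    double *buf = (double*)malloc(m * sizeof(double));
    clock_t t0, t1, t2;
    size_t j, offset;
    int r, i, k;

    assert(index >= 0 && index < rows);
    if (mat == NULL || buf == NULL) {
        free(mat);
        free(buf);
        return -1;
    }
    for (j = 0; j < n; j++)
        mat[j] = (double)j;

    t0 = clock();
    for (r = 0; r < reps; r++) {          /* packing: strided reads */
        offset = 0;
        for (k = 0; k < pages; k++) {
            for (i = 0; i < columns; i++)
                buf[offset + i] =
                    mat[index + (size_t)i * rows + (size_t)columns * rows * k];
            offset += columns;
        }
    }
    t1 = clock();
    for (r = 0; r < reps; r++) {          /* unpacking: strided writes */
        offset = 0;
        for (k = 0; k < pages; k++) {
            for (i = 0; i < columns; i++)
                mat[index + (size_t)i * rows + (size_t)columns * rows * k] =
                    buf[offset + i];
            offset += columns;
        }
    }
    t2 = clock();

    *pack_sec = (double)(t1 - t0) / CLOCKS_PER_SEC;
    *unpack_sec = (double)(t2 - t1) / CLOCKS_PER_SEC;
    free(mat);
    free(buf);
    return 0;
}
```

Calling it with, for example, time_pack_unpack(128, 128, 128, 0, 100,
&tp, &tu) and comparing tp with tu on each machine gives the kind of
pack/unpack comparison described above.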


Below I have attached the portions of the code that do the packing
and unpacking.


PACKING:
...
    int offset = 0;
    int one = 1;
    double *temp;
    double *mat = (double*)matrix;

    /* contiguous send buffer: one value per (column, page) pair */
    temp = (double*)malloc(columns * pages * sizeof(double));

    for (k = 0; k < pages; k++)
    {
        /* Equivalent plain loop:
           for (i = 0; i < columns; i++)
               temp[offset+i] = mat[index + i*rows + columns*rows*k];
        */
        /* BLAS copy: read with stride rows, write with stride 1 */
        dcopy_(&columns, &mat[index + columns*rows*k], &rows,
               &temp[offset], &one);
        offset += columns;
    }
...

UNPACKING:
...
    int offset = 0;
    int one = 1;
    double *mat = (double*)matrix;
    double *tmp = (double*)temp;   /* contiguous received buffer */

    for (k = 0; k < pages; k++)
    {
        /* Equivalent plain loop:
           for (i = 0; i < columns; i++)
               mat[index + i*rows + columns*rows*k] = tmp[i+offset];
        */
        /* BLAS copy: read with stride 1, write with stride rows */
        dcopy_(&columns, &tmp[offset], &one,
               &mat[index + columns*rows*k], &rows);
        offset += columns;
    }
...

Bogdan


--
 Bogdan Vasiliu                 Research Assistant - Ames Lab - ISU
 304 Wilhelm Hall                            bogdan@scl.ameslab.gov
 Iowa State University                         Phone:(515)-294-7336
 Ames, IA  50011

