Newsgroups: comp.parallel.mpi
From: engrbohn@aol.com (Engr Bohn)
Subject: Re: MPICH on Red Hat Linux, peculiar behaviour
Organization: AOL http://www.aol.com
Date: 07 Jun 1998 14:02:21 GMT
Message-ID: <1998060714022100.KAA21811@ladder03.news.aol.com>

Oh, timing is everything. Just a couple days ago, I asked a similar
question on the Beowulf mail list.

"Umesh Kumar V. Rajasekaran" wrote:
>I am using MPICH with ch_p4 on a Red Hat Linux version 2.0.33 (gcc
>version 2.7.2.3).
[...]
>I wrote a simple ping application
[...]
>I observed a peculiar behaviour. For over 5 MPI_Send calls, 4 of them
>are taking a time of about 200-300 microseconds. But the fifth one is
>taking around 25000 microseconds.
>
>Also another behaviour was that, for message sizes 128, 256, 512, the
>time of MPI_Send call is increasing, but after that, for 1024, it
>suddenly dips around 1/4th of the other time. and then starts
>increasing for sizes 2048, 4096.
>
>can someone explain this behaviour.

And, from the Beowulf mail list, my alter-ego,
Capt Bohn, Christopher A. [cbohn@afit.af.mil] wrote:
>I'm running comparisons of a network of Sun Sparcs connected
>via switched ethernet hub and via myrinet vs four Pentium IIs
>networked via a fast ethernet repeater. The throughput results
>of the "ring" routine in the systest.c program distributed with
>MPICH are interesting.
[...]
>For the PoPC, the throughput is banding into two tightly
>clustered curves (I'm sometimes tempted to see a third or fourth
>curve between the two dominant curves), but most of the data
>points cluster into two deterministic bands. The lower band is
>about 10% the throughput of the upper band until 2-4KB, at
>which point the two bands start converging. At 64-128KB, the
>two bands have converged such that the lower is about 75% of
>the upper; by 1MB, the two bands are indistinguishable, with a
>throughput of 75-80Mbps.
>These results are repeatable for 2, 3, or 4 processors -- so far
>I've done three different sessions, each of twenty runs, on three
>separate days. For all runs, the system was pristine (I was the
>only user, XWindows not running).
>I'm hypothesizing this phenomenon may be due to an occasional
>residual carrier on the repeater making the network cards wait,
>thus reducing the effective throughput.

I got a couple replies, but I think this is the one that helps us both...

Troy Benjegerdes [hozer@drgw.net] wrote:
>What kernel are you running? 2.0.33 has some kind of problem with memory
>management that kills off 100base-T performance in a range that I think is
>close to the area you are having problems in.
>Also, try the 'NetPIPE' network bandwidth benchmark....
>http://www.scl.ameslab.gov/netpipe/
>I have no idea how the test you are running works, but I suspect that if
>you can get ~80Mbps, the repeater is not your problem.

Take care,
cb

Christopher A. Bohn
EngrBohn@aol.com
"Oooh! What does THIS button do!?"