This directory contains the source files for the simulator that was
used to generate the results in the paper "Limits of Control Flow on
Parallelism" from ISCA'92.  There have been a few extensions and small
changes since that paper was written.  Most notably, I've added an option
to include some of the potential data dependences that were mentioned
briefly in the paper.  (In the source code, these are referred to as
indirect control dependences.)  As explained in the paper, we cannot
include all of the potential dependences, but even the ones that we can
analyze have a significant impact in some cases.  Another significant
change is the removal of load/store bypassing so that store instructions
have the same one-cycle latency as all other operations.  This seems to
reduce the speedups by a small constant factor.

The simulator has only been run on DECstations, but it could possibly be
ported to other machines using the MIPS architecture.  It can be compiled
with g++ version 2.4.5.  I think all of the include files are standard with
Ultrix.  The simulator code is reasonably well-documented.  There are two
separate programs:  the main simulator (dsim) and a preprocessor (presim).
The preprocessor performs static analysis to handle loop unrolling and
potential data dependences.  The results are saved in a file ending
with ".dat".  The main simulator reads this file when it first starts up.
The reason for this split is that the preprocessing takes quite a long
time for big programs, so we don't want to repeat it for every simulation.
The simulator has evolved from Mike Smith's xsim program, which is
available (along with an excellent document describing pixie) by anonymous
ftp from velox.stanford.edu in pub/pixie_doc.  You might want to look at
that if you're confused about the basic process of using pixie traces.

To use the simulator, you must first compile the benchmark and get the
necessary benchmark inputs.  Create a file called "parameters" containing
the command-line parameters to run the benchmark (e.g. "in_file > out").
Then run the simsetup script with the command-line arguments needed to run
the benchmark (for profiling).  After creating the pixie files and
preprocessor output needed by the simulator, you can use the go_all and go
scripts to run the simulator.  Have fun!

--Bob Wilson (bwilson@shasta.stanford.edu)


Version 1.1
	- fixed several bugs

Version 1.2  8/21/92
	- added looplev and procht options to help identify where the
	  speedups are coming from.  The looplevN option causes only the
	  N innermost loops to be ``parallelized''.  Similarly, the
	  prochtN option finds parallelism only in procedures with a
	  height of N or less in the call graph (recursive cycles are
	  just ignored).
	- added code to compensate for the MIPS assembler's code scheduling.
	  There is a serious problem when the assembler moves an instruction
	  from a branch target into the branch delay slot and then changes
	  the branch to jump to the instruction following the original target.
	  This creates irreducible loops in the flow graph, basic blocks with
	  no predecessors, and non-essential control dependences.
	- added the nodisambig option to disable perfect memory disambiguation.
	  Instead, only references from the stack and global pointers can be
	  disambiguated.
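
To illustrate the procedure height used by the procht option, here is a
minimal C++ sketch of one way to compute a procedure's height in the call
graph while ignoring recursive cycles.  It is not taken from presim or dsim:
the CallGraph type, the function names, and the choice of height 0 for a
procedure that makes no calls are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical call-graph representation (not from the simulator):
// callees[p] lists the procedures called by procedure p.
typedef std::vector<std::vector<int> > CallGraph;

enum VisitState { UNVISITED, ON_STACK, DONE };

// Depth-first walk that computes and returns the height of procedure p.
static int visit(const CallGraph &callees, int p,
                 std::vector<VisitState> &state, std::vector<int> &height)
{
    if (state[p] == DONE)
        return height[p];
    state[p] = ON_STACK;
    int h = 0;  // assumption: a procedure that makes no calls has height 0
    for (std::size_t i = 0; i < callees[p].size(); ++i) {
        int q = callees[p][i];
        assert(q >= 0 && static_cast<std::size_t>(q) < callees.size());
        if (state[q] == ON_STACK)
            continue;  // call into an active procedure: recursive cycle, ignored
        h = std::max(h, 1 + visit(callees, q, state, height));
    }
    state[p] = DONE;
    height[p] = h;
    return h;
}

// Returns the height of every procedure in the call graph.
std::vector<int> procedure_heights(const CallGraph &callees)
{
    std::vector<VisitState> state(callees.size(), UNVISITED);
    std::vector<int> height(callees.size(), 0);
    for (std::size_t p = 0; p < callees.size(); ++p)
        visit(callees, static_cast<int>(p), state, height);
    return height;
}
```

Under this sketch, a prochtN-style filter would keep only the procedures
whose computed height is N or less.  Because calls into an active procedure
are simply skipped, the height assigned to a procedure inside a recursive
cycle depends on where the traversal enters the cycle; the simulator may
resolve this differently.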

Version 1.3  8/11/93
	- changed presim to add flow graph edges to make all nodes reachable.
	  This is needed to handle procedures with apparently infinite loops.
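
The note above does not say which edges presim adds.  One plausible reading,
assumed here, is that every node must be able to reach the procedure's exit
(as postdominator-based control dependence analysis requires), so nodes
trapped in a loop with no way out get an extra edge to the exit.  The C++
sketch below illustrates that idea only; the FlowGraph type and function
names are hypothetical and do not come from the simulator source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flow-graph representation (not from the simulator).
struct FlowGraph {
    std::vector<std::vector<int> > succ, pred;
    explicit FlowGraph(std::size_t n) : succ(n), pred(n) {}
    void add_edge(int from, int to) {
        succ[from].push_back(to);
        pred[to].push_back(from);
    }
};

// Mark every node that can reach `start` by walking predecessor edges.
static void mark_reaching(const FlowGraph &g, int start,
                          std::vector<bool> &reaches)
{
    std::vector<int> work(1, start);
    reaches[start] = true;
    while (!work.empty()) {
        int n = work.back();
        work.pop_back();
        for (std::size_t i = 0; i < g.pred[n].size(); ++i) {
            int p = g.pred[n][i];
            if (!reaches[p]) {
                reaches[p] = true;
                work.push_back(p);
            }
        }
    }
}

// Add an edge to the exit from each region that cannot reach it.
// Returns the number of edges added.
int connect_to_exit(FlowGraph &g, int exit_node)
{
    assert(exit_node >= 0 &&
           static_cast<std::size_t>(exit_node) < g.succ.size());
    std::vector<bool> reaches(g.succ.size(), false);
    mark_reaching(g, exit_node, reaches);
    int added = 0;
    for (std::size_t n = 0; n < g.succ.size(); ++n) {
        if (reaches[n])
            continue;
        // Node n is stuck (e.g. inside an apparently infinite loop).
        g.add_edge(static_cast<int>(n), exit_node);
        ++added;
        // Everything that reaches n can now reach the exit as well.
        mark_reaching(g, static_cast<int>(n), reaches);
    }
    return added;
}
```

For a loop whose nodes all reach one another, this adds a single edge from
the first such node encountered rather than one edge per node.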