This directory contains the source files for the simulator that was used
to generate the results in the paper "Limits of Control Flow on
Parallelism" from ISCA'92. There have been a few extensions and small
changes since that paper was written. Most notably, I've added an option
to include some of the potential data dependences that were mentioned
briefly in the paper. (In the source code, these are referred to as
indirect control dependences.) As explained in the paper, we cannot
include all of the potential dependences, but even the ones that we can
analyze have a significant impact in some cases. Another significant
change is the removal of load/store bypassing so that store instructions
have the same one cycle latency as all other operations. This seems to
reduce the speedups by a small constant factor.

The simulator has only been run on DECstations but it could possibly be
ported to other machines using the MIPS architecture. It can be compiled
with g++ version 2.4.5. I think all of the include files are standard
with Ultrix.

The simulator code is reasonably well-documented. There are two separate
programs: the main simulator (dsim) and a preprocessor (presim). The
preprocessor performs static analysis to handle loop unrolling and
potential data dependences. The results are saved in a file ending with
".dat". The main simulator reads this file when it first starts up. The
reason for this split is that the preprocessing takes quite a long time
for big programs, so we don't want to repeat it for every simulation.

The simulator has evolved from Mike Smith's xsim program, which is
available (along with an excellent document describing pixie) by
anonymous ftp from velox.stanford.edu in pub/pixie_doc. You might want
to look at that if you're confused about the basic process of using
pixie traces.

To use the simulator, you must first compile the benchmark and get the
necessary benchmark inputs.
Create a file called "parameters" containing the command-line parameters
needed to run the benchmark (e.g. "in_file > out"). Then run the
simsetup script with the command-line arguments needed to run the
benchmark (for profiling). After creating the pixie files and
preprocessor output needed by the simulator, you can use the go_all and
go scripts to run the simulator.

Have fun!

--Bob Wilson (bwilson@shasta.stanford.edu)


Version 1.1
- fixed several bugs

Version 1.2  8/21/92
- added looplev and procht options to help identify where the speedups
  are coming from. The looplevN option causes only the N innermost
  loops to be ``parallelized''. Similarly, the prochtN option only
  finds parallelism in procedures with a height of N or less in the
  call graph (recursive cycles are just ignored).
- added code to compensate for the MIPS assembler's code scheduling.
  There is a serious problem when the assembler moves an instruction
  from a branch target into the branch delay slot and then changes the
  branch to jump to the instruction following the original target.
  This creates irreducible loops in the flow graph, basic blocks with
  no predecessors, and non-essential control dependences.
- added the nodisambig option to disable perfect memory
  disambiguation. Instead, only references from the stack and global
  pointers can be disambiguated.

Version 1.3  8/11/93
- changed presim to add flow graph edges to make all nodes reachable.
  This is needed to handle procedures with apparently infinite loops.
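
The prochtN option from version 1.2 depends on the height of each
procedure in the call graph. The following is only a rough sketch of how
such a height might be computed; it is not taken from the presim or dsim
sources. The CallGraph type and the procedureHeights and
withinHeightLimit functions are hypothetical names, the choice of height
0 for a leaf procedure is an assumption, and the code is written in
modern C++ rather than for g++ 2.4.5. Edges that close a recursive cycle
during the depth-first walk are skipped, so heights inside a cycle depend
on the visit order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical call-graph representation: callees[p] lists the procedures
// that procedure p calls. This is not the simulator's own data structure.
using CallGraph = std::vector<std::vector<int>>;

namespace {

enum class State { Unvisited, OnStack, Done };

// Depth-first walk that returns the height of procedure p.
int visit(const CallGraph& callees, int p,
          std::vector<State>& state, std::vector<int>& height) {
    state[p] = State::OnStack;
    int h = 0;  // assumption: a leaf procedure has height 0
    for (int q : callees[p]) {
        // An edge to a procedure still on the DFS stack closes a
        // recursive cycle; such edges are simply ignored.
        if (state[q] == State::OnStack) continue;
        int hq = (state[q] == State::Done)
                     ? height[q]
                     : visit(callees, q, state, height);
        h = std::max(h, hq + 1);
    }
    state[p] = State::Done;
    height[p] = h;
    return h;
}

}  // namespace

// Computes a height for every procedure in the call graph.
std::vector<int> procedureHeights(const CallGraph& callees) {
    std::vector<State> state(callees.size(), State::Unvisited);
    std::vector<int> height(callees.size(), 0);
    for (std::size_t p = 0; p < callees.size(); ++p) {
        if (state[p] == State::Unvisited) {
            visit(callees, static_cast<int>(p), state, height);
        }
    }
    return height;
}

// True if procedure p would be eligible under a height limit of N,
// i.e. its height in the call graph is N or less.
bool withinHeightLimit(const std::vector<int>& height, int p, int limit) {
    return height[p] <= limit;
}
```

With a limit of 1, for example, a procedure that only calls leaf
procedures would be eligible, while its callers would not.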