Techniques such as out-of-order issue and speculative execution aggressively exploit instruction-level parallelism in modern superscalar processor architectures. The front end of such pipelined machines must supply a stream of schedulable instructions at a bandwidth that meets or exceeds the rate at which instructions are issued and executed. As superscalar machines become increasingly wide, it is inevitable that the large set of instructions to be fetched every cycle will span multiple noncontiguous basic blocks. The mechanism that fetches, aligns, and passes this set of instructions down the pipeline must do so as efficiently as possible, occupying a minimal number of pipeline cycles. The trace cache has emerged as the most promising technique to meet this high-bandwidth, low-latency fetch requirement. This thesis presents the design, simulation, and analysis of a microarchitecture simulator extension that incorporates a trace cache. A new fill unit scheme, the Sliding Window Fill Mechanism, is proposed. This method exploits trace continuity and identifies probable start regions to improve trace cache hit rate. A 7% hit rate increase was observed over the Rotenberg fill mechanism. Combined with branch promotion, trace cache hit rates experienced a 19% average increase along with a 17% average rise in fetch bandwidth.
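The abstract does not describe how either fill mechanism works internally, so the following Python sketch is only a rough illustration of why extra trace start points can raise the hit rate. Everything in it is an assumption made for illustration: the capacity limits (`MAX_INSTRS`, `MAX_BRANCHES`), the names `build_trace`, `fill_back_to_back`, and `fill_sliding`, and the rule that the "sliding" variant also starts a trace after every branch. It is not the thesis's actual Sliding Window Fill Mechanism or the exact Rotenberg fill unit.

```python
from dataclasses import dataclass

MAX_INSTRS = 16    # assumed trace line capacity (hypothetical value)
MAX_BRANCHES = 3   # assumed branch limit per trace (hypothetical value)


@dataclass(frozen=True)
class Instr:
    pc: int
    is_branch: bool = False


def build_trace(stream, start):
    """Collect instructions from stream[start:] until a capacity limit is hit."""
    trace = []
    branches = 0
    for ins in stream[start:]:
        trace.append(ins)
        if ins.is_branch:
            branches += 1
            if branches == MAX_BRANCHES:
                break
        if len(trace) == MAX_INSTRS:
            break
    return trace


def fill_back_to_back(stream):
    """Simplified baseline: each new trace starts where the previous one ended."""
    cache = {}
    pos = 0
    while pos < len(stream):
        trace = build_trace(stream, pos)
        cache[trace[0].pc] = trace   # index the trace by its start PC
        pos += len(trace)
    return cache


def fill_sliding(stream):
    """Illustrative variant: also start a trace after every branch (assumption)."""
    if not stream:
        return {}
    starts = [0] + [i + 1 for i, ins in enumerate(stream[:-1]) if ins.is_branch]
    cache = {}
    for pos in starts:
        trace = build_trace(stream, pos)
        cache[trace[0].pc] = trace
    return cache


# Retired instruction stream: 24 instructions, every fourth one is a branch.
stream = [Instr(pc=i, is_branch=(i % 4 == 3)) for i in range(24)]

baseline = fill_back_to_back(stream)   # traces start at PCs 0 and 12
sliding = fill_sliding(stream)         # traces start at PCs 0, 4, 8, 12, 16, 20
```

In this toy stream, a later fetch that enters at PC 4 (for example, as a branch target) misses in the baseline cache, because PC 4 sits in the middle of the trace starting at 0, but hits in the sliding variant. The extra start points also store overlapping copies of the same instructions, so any such scheme trades hit rate against redundancy in the cache.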
Library of Congress Subject Headings
Cache memory; Microprocessors--Design and construction; Computer architecture
Department, Program, or Center
Computer Engineering (KGCOE)
Mulrane, Edward Jr, "An Investigation of trace cache organizations for superscalar processors" (2002). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus