Seminar: Graduate Seminar
uOp-Based Value Prediction: A New Approach Enabling Efficient Implementation
Instruction-level parallelism (ILP) plays a significant role in the design of modern processors. For decades, ILP was considered inherently limited by true data dependencies. Value prediction (VP) was introduced to speculatively break true data dependencies, allowing out-of-order (OoO) processors to achieve higher ILP and better performance. To the best of our knowledge, VP has not been implemented in commercial chips because its performance gains do not justify its implementation costs. In this work, we propose a new, performance-efficient implementation of VP. It predicts the destination values of instructions in two phases: a training phase, in which predictions are trained by a value predictor, and a deployment phase, in which only the high-confidence predictions are deployed. Importantly, once a prediction reaches high confidence, training of the corresponding instruction stops. Only when an instruction is found to have been mispredicted does the system return to the training phase. In our design, the training phase is not part of the OoO pipeline, so the constraints on it can be relaxed. This avoids stressing the CPU front-end, a common problem for VP designs. In the deployment phase, only high-confidence predictions are used, which avoids costly accesses for all other instructions, namely those that are not eligible for prediction or have low confidence; each deployed prediction is subsequently validated. Because this phase is integrated into the pipeline, it is designed to be simple. In addition, we enable VP at micro-operation (uOp) granularity. Because instruction addresses in a CISC CPU are not preserved when instructions are decoded into RISC-like uOps, typical value predictors cannot be applied directly to uOp prediction. Our architecture leverages the micro-operation cache to overcome this problem and enable VP for uOps. This paves the way for practical adoption of VP in high-performance processors.
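The two-phase flow described above can be illustrated with a minimal Python sketch. It is not the proposed hardware design. It assumes a simple stride predictor, a hypothetical confidence threshold of 3, and a hypothetical per-uOp key (for example, a uOp-cache location plus a uOp index); the class and method names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    last_value: int = 0
    stride: int = 0
    confidence: int = 0
    seen: bool = False      # has at least one result been observed?
    deployed: bool = False  # False = training phase, True = deployment phase


class TwoPhaseStridePredictor:
    """Illustrative model only: train until confident, then deploy."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical confidence threshold
        self.table = {}             # keyed by a hypothetical per-uOp identifier

    def predict(self, uop_key):
        """Return a value only for deployed (high-confidence) entries."""
        e = self.table.get(uop_key)
        if e is not None and e.deployed:
            return e.last_value + e.stride
        return None  # low confidence or unknown uOp: no prediction is used

    def retire(self, uop_key, actual):
        """Observe the real result. Returns True on a deployed misprediction."""
        e = self.table.setdefault(uop_key, Entry())
        if e.deployed:
            if e.last_value + e.stride == actual:
                # Validation passed. No retraining; the entry just advances.
                e.last_value = actual
                return False
            # Misprediction: fall back to the training phase.
            e.deployed = False
            e.confidence = 0
            e.stride = actual - e.last_value
            e.last_value = actual
            return True
        # Training phase: learn the stride and build confidence.
        if e.seen:
            new_stride = actual - e.last_value
            if new_stride == e.stride:
                e.confidence += 1
            else:
                e.stride = new_stride
                e.confidence = 0
        e.last_value = actual
        e.seen = True
        if e.confidence >= self.threshold:
            e.deployed = True
        return False
```

In this sketch, a uOp producing 10, 14, 18, 22, 26 yields no prediction until the stride of 4 has repeated three times; after that, `predict` returns 30. A later result that breaks the stride makes `retire` report a misprediction and moves the entry back to training.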
In our experiments, we ran the SPEC CPU 2017 benchmarks and the EEMBC CoreMark benchmark on the Sniper x86 simulator. We present results for 4-issue and 8-issue superscalar processors augmented with our VP scheme using a stride predictor. Averaged over all benchmarks, the 8-issue processor with VP achieves 2.56% higher instructions per cycle (IPC) than its baseline without VP, and the 4-issue processor achieves 2.23% higher IPC than its baseline. To explain the low speedups observed for specific benchmarks despite high prediction accuracy, we conducted dependency-distance analyses.
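As a rough illustration of a dependency-distance analysis, the sketch below builds a histogram of producer-to-consumer distances from a toy trace. It assumes that distance means the number of dynamic instructions between a register write and each later read of that value, and it uses a hypothetical trace format of (destination register, list of source registers); the analysis used in this work may be defined differently.

```python
from collections import Counter


def dependency_distances(trace):
    """Histogram of distances between each producer and its consumers.

    trace: list of (dest_reg, src_regs) tuples in dynamic program order.
    This trace format is a hypothetical stand-in for simulator output.
    """
    last_writer = {}  # register -> index of the most recent producer
    hist = Counter()
    for i, (dest, srcs) in enumerate(trace):
        for src in srcs:
            if src in last_writer:
                hist[i - last_writer[src]] += 1
        if dest is not None:
            last_writer[dest] = i
    return hist


toy_trace = [
    ("r1", []),            # 0: produces r1
    ("r2", ["r1"]),        # 1: consumes r1 at distance 1
    ("r3", []),            # 2: independent
    ("r4", ["r1", "r2"]),  # 3: consumes r1 (distance 3) and r2 (distance 2)
    ("r1", ["r4"]),        # 4: consumes r4 at distance 1
]
hist = dependency_distances(toy_trace)
```

One plausible reading is that when most consumers sit far from their producers, the produced value is often already available by the time it is needed, so even an accurate prediction hides little latency.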
M.Sc. student under the supervision of Prof. Avi Mendelson and Prof. Freddy Gabbay.

