- 2019 CUSR [Prefetch] Evaluation of Hardware Data Prefetchers on Server Processors
- 2016 CUSR [Prefetch] A Survey of Recent Prefetching Techniques for Processor Caches
- [Prefetch] Register File Prefetching
- täkō: A Polymorphic Cache Hierarchy for General-Purpose Optimization of Data Movement
- [Prefetch] Page Size Aware Cache Prefetching
- [Prefetch] Berti: An Accurate Local-Delta Data Prefetcher
- [Prefetch] Merging Similar Patterns for Hardware Prefetching
- [Prefetch] CRISP: Critical Slice Prefetching
- Reducing Load Latency with Cache Level Prediction
- [Replacement] TCOR: A Tile Cache with Optimal Replacement
- Zero Inclusion Victim: Isolating Core Caches from Inclusive Last-Level Cache Evictions
- [Prefetch] Exploiting Page Table Locality for Agile TLB Prefetching
- [Prefetch] A Cost-Effective Entangling Prefetcher for Instructions
- [Profiling, Replacement] Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications
- [Prefetch] Morrigan: A Composite Instruction TLB Prefetcher
- [Prefetch] Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning
- ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading
- Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses
- Criticality Driven Fetch
- [Profiling] TIP: Time-Proportional Instruction Profiling
- [Compression] BCD Deduplication: Effective Memory Compression Using Partial Cache-Line Deduplication
- [Prefetch] A Hierarchical Neural Model of Data Prefetching
- [Replacement] Designing a Cost-Effective Cache Replacement Policy using Machine Learning
- [Replacement] P-OPT: Practical Optimal Cache Replacement for Graph Analytics
- [Prefetch] Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design
- Stream Floating: Enabling Proactive and Decentralized Cache Optimizations
- Dead Page and Dead Block Predictors: Cleaning TLBs and Caches Together
- [Prefetch] Divide and Conquer Frontend Bottleneck
- [Prefetch] Bouquet of Instruction Pointers: Instruction Pointer Classifier-Based Spatial Hardware Prefetching
- [Prefetch] I-SPY: Context-Driven Conditional Instruction Prefetching with Coalescing
- [Prefetch] RnR: A Software-Assisted Record-and-Replay Hardware Prefetcher
- Boosting Store Buffer Efficiency with Store-Prefetch Bursts
- Improving the Utilization of Micro-Operation Caches in x86 Processors
- [Prefetch] Classifying Memory Access Patterns for Prefetching
- [Compression] Thesaurus: Efficient Cache Compression via Dynamic Clustering
- [Prefetch] Perceptron-Based Prefetch Filtering
- [Prefetch] Efficient Meta-Data Management for Irregular Data Prefetching
- Duality Caches for Data Parallel Acceleration
- Filter Caching for Free: The Untapped Potential of the Store Buffer
- [Replacement] Applying Deep Learning to the Cache Replacement Problem
- [Compression] Touche: Towards Ideal and Efficient Cache Compression by Mitigating Tag Area Overheads
- [Prefetch] DSPatch: Dual Spatial Pattern Prefetcher
- [Prefetch] Temporal Prefetching Without the Off-Chip Metadata
- [Prefetch] Prefetched Address Translation
- [Prefetch] Bingo Spatial Data Prefetcher
- [Prefetch] Division of Labor: A More Effective Approach to Prefetching
- [Prefetch] Criticality Aware Tiered Cache Hierarchy: A Fundamental Relook at Multi-Level Cache Hierarchies
- [Prefetch] Rethinking Belady's Algorithm to Accommodate Prefetching
- [Replacement] Exploring Predictive Replacement Policies for Instruction Cache and Branch Target Buffer
- SEESAW: Using Superpages to Improve VIPT Caches