
Sunday, October 12

9:00–10:20
Tutorial: Observability into Application-level Metrics with eBPFs
Session 1 of 2
10:20–10:40
Coffee Break
10:40–12:00
Tutorial: Observability into Application-level Metrics with eBPFs
Session 2 of 2
12:00–1:20
Lunch
1:20–3:00
Workshop: Workshop on Security–Performance Trade-offs
Session 1 of 2
3:00–3:20
Coffee Break
3:20–5:00
Workshop: Workshop on Security–Performance Trade-offs
Session 2 of 2

Monday, October 13

9:00–9:15
Welcome
9:15–10:15
Keynote Talk: Stop Counting the Crowd, Start Finding Waldo! The answer isn’t ‘more’ — it’s ‘where’!
Amir Yazdanbakhsh (Google DeepMind)
Session chair: Meena Arunachalam (AMD)
Abstract: Modern workloads tempt us toward a single answer: throw more resources at the problem. But resource walls often mask where the real inefficiencies lie. This talk argues for a shift in perspective: the answer to our scaling challenges isn’t just “more” but “where”: the rare hot paths, dataflows, algorithmic choices, and software patterns that dominate cost. The talk demonstrates how AI-driven approaches can identify and mitigate these “Waldo patterns,” and presents recent case studies from our work across the computing stack that identify such patterns, along with the resulting optimizations. It closes with a call to build the next generation of characterization tools, benchmarks, and AI-driven workflows that make finding Waldo repeatable and measurable.
Bio: Amir Yazdanbakhsh is a Research Scientist at Google DeepMind, working at the intersection of machine learning and computer architecture. His primary focus is on applying machine learning to design efficient and sustainable computing systems, from leading the development of large-scale distributed training systems on TPUs to shaping the next generation of Google’s ML accelerators. His work has been recognized by the ISCA Hall of Fame. Notably, his research on using AI to solve performance challenges in hyperscale systems received an IEEE Micro Top Picks award, and his work on a new system for AI won the IEEE Computer Society Best Paper Award. Amir received his Ph.D. from the Georgia Institute of Technology, where he was a recipient of the Microsoft and Qualcomm fellowships.
10:15–10:35
Coffee Break
10:35–11:50
Session 1: Cross-Domain Methods for Workload Analysis
Session chair: Patrick P. C. Lee (The Chinese University of Hong Kong)
  Belenos: Bottleneck Evaluation to Link Biomechanics to Novel Computing Optimizations
Hana Chitsaz, Johnson Umeike, Amirmahdi Namjoo (University of Maryland, College Park); Babak N. Safa (University of South Florida); Bahar Asgari (University of Maryland, College Park)
  Workload Characterization Using Cross-Layer Features and Multilevel PCA
Lina Sawalha, Grant Deljevic (Western Michigan University)
  Athena: A Plug-and-Play Advisor for Retrieval-Augmented Generation using VectorDB
Ning Liang (Duke University); Fabian Wenz, Jana Giceva (TU Munich); Lisa Wu Wills (Duke University)
  The Fake-Busy and True-Idle Problems of Running Graph Applications on Chiplet-Based Multi-cores
Rashid Aligholipour, Yuan Yao (Uppsala University)
  WANify: Gauging and Balancing Runtime WAN Bandwidth for Geo-distributed Data Analytics
Anshuman Das Mohapatra, Kwangsung Oh (University of Nebraska at Omaha)
12:00–1:20
Lunch
1:20–2:35
Session 2: Large Language Models
Session chair: Todd Austin (University of Michigan / Agita Labs)
  Understanding Distributed Training of Large Language Models with Unified Virtual Memory
Jane Rhee, Eunbi Jeong (Ewha Womans University); Jiwon Lee (Samsung Electronics); Myung Kuk Yoon (Ewha Womans University)
  Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs
Marcin Chrapek, Marcin Copik, Etienne Mettaz, Torsten Hoefler (ETH Zurich)
  EdgeReasoning: Optimizing Reasoning LLM Deployment on Edge GPUs
Benjamin Kubwimana, Qijing Jenny Huang (NVIDIA)
  Keeping up with Large Language Models: A Holistic Methodology of Compute, Memory, Communication, and Cost Modeling
Wenzhe Guo, Joyjit Kundu, Uras Tos, Weijiang Kong, Giuliano Sisto, Timon Evenblij, Manu Perumkunnil (imec)
  DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs
Ziyu Hu, Zhiqing Zhong (Stevens Institute of Technology); Weijian Zheng (Argonne National Laboratory); Zhijing Ye (Stevens Institute of Technology); Xuwei Tan, Xueru Zhang (The Ohio State University); Zhen Xie (Binghamton University); Rajkumar Kettimuthu (Argonne National Laboratory); Xiaodong Yu (Stevens Institute of Technology)
2:35–2:55
Coffee Break
2:55–4:10
Session 3: Security, Confidentiality, and Reliability
Session chair: Elaheh Sadredini (University of California, Riverside)
  ZKProphet: Understanding Performance of Zero-Knowledge Proofs on GPUs
Tarunesh Verma, Yichao Yuan, Nishil Talati (University of Michigan); Todd Austin (University of Michigan / Agita Labs)
  The Curious Case of Global Stable Loads
Shagnik Pal (University of Texas at Austin); Jeeho Ryoo (Fairleigh Dickinson University); Lizy K. John (UT Austin)
  vACE: Exploring the Design Space of Vector Processing Units for Soft Error Vulnerability
George-Marios Fragkoulis, Dimitris Gizopoulos (University of Athens)
  CASM: A Generalizable and Accessible Security Metric to Evaluate Security of Cache Architectures
Phaedra Curlin, Tamara Silbergleit Lehman (University of Colorado Boulder)
  Best Paper Candidate: Learning Architectural Cache Simulator Behaviour
Pranjali Jain (UC Santa Barbara); Meiru Han (University of Pennsylvania); Zhizhou Zhang (Uber Technologies Inc); Brandon Lee, Jonathan Balkind (UC Santa Barbara)
4:10–5:30
Poster Session
Session chair: Wantong Li (University of California, Riverside)
5:30–8:30
Conference Banquet

Tuesday, October 14

8:45–10:00
Session 4: AI Accelerators, PIM, and Post-Moore Architectures
Session chair: Sudhanshu Gupta (University of Rochester)
  HALO: Hybrid Systolic Arrays via Logical Partitioning for Acceleration of Complex-Valued Neural Networks
Ji Yeong Yi, Eunbi Jeong, SungHee Yum, Jane Rhee (Ewha Womans University); Sangun Choi, Gunjae Koo, Yunho Oh (Korea University); Myung Kuk Yoon (Ewha Womans University)
  Exploring Lossy Compression of Activation Data for Emerging AI Accelerators: A Case Study on the Graphcore IPU
Milan Shah (North Carolina State University); Xiaodong Yu (Stevens Institute of Technology); Sheng Di (Argonne National Laboratory); Michela Becchi (North Carolina State University); Franck Cappello (Argonne National Laboratory)
  Best Paper Candidate: BetterTogether: An Interference-Aware Framework for Fine-grained Software Pipelining on Heterogeneous SoCs
Yanwen Xu, Rithik Sharma, Zheyuan Chen, Shaan Mistry (University of California, Santa Cruz); Tyler Sorensen (Microsoft Research, University of California Santa Cruz)
  PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration
Daniel Pacheco, Leonel Sousa (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa); Aleksandar Ilic (INESC-ID & Instituto Superior Técnico)
  ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System
Marzieh Barkhordar, Alireza Tabatabaeian (Simon Fraser University); Mohammad Sadrosadati (ETH Zürich); Christina Giannoula (University of Toronto); Juan Gomez Luna (NVIDIA); Izzat El Hajj (American University of Beirut); Onur Mutlu (ETH Zurich); Alaa Alameldeen (Simon Fraser University)
10:00–10:20
Coffee Break
10:20–11:50
Session 5: Emerging Workloads
Session chair: Tamara Silbergleit Lehman (University of Colorado Boulder)
  PangenomicsBench: A Benchmark Suite and Characterization of Pangenomics
Noah Kaplan (University of Michigan); Jan-Niklas Schmelzle (Cornell University); Yufeng Gu (University of Michigan); Erik Garrison (University of Tennessee Health Science Center); Christopher Batten (Cornell University); Reetuparna Das (University of Michigan)
  decoder-bench: Benchmarking Decoders for Quantum Error Correction
Satvik Maurya (University of Wisconsin-Madison); Joshua Viszlai (University of Chicago); Nithin Raveendran (University of Arizona); Poulami Das (UT Austin); Swamit Tannu (University of Wisconsin-Madison)
  A Comprehensive Analysis of Graph Neural Networks Training at Different Scales
Mostafa Eghbali Zarch, Michela Becchi (North Carolina State University, USA)
  EntoBench: A Benchmark Suite and Evaluation Framework for Insect-Scale Robotics
Derin Ozturk, Nick Cebry, Angela Cui, Hang Gao, Julie Villamil, Farrell Helbling, Christopher Batten (Cornell University)
  Improving the Performance of Out-of-Core LLM Inference Using Heterogeneous Host Memory
Sudhanshu Gupta (University of Rochester); Sandhya Dwarkadas (University of Virginia)
  miniGiraffe: A Pangenomic Mapping Proxy App
Jessica Imlau Dagostini (University of California Santa Cruz); Scott Beamer (University of California, Santa Cruz); Tyler Sorensen (Microsoft Research and UC Santa Cruz); Joseph Manzano (Pacific Northwest National Lab)
12:00–1:20
Lunch
1:20–2:50
Session 6: Memory, Storage, and Beyond
Session chair: Lina Sawalha (Western Michigan University)
  Best Paper Candidate: Does Linux Provide Performance Isolation for NVMe SSDs? Configuring cgroups for I/O Control in the NVMe Era
Krijn Doekemeijer, Zebin Ren, Tiziano De Matteis, Balakrishnan Chandrasekaran (Vrije Universiteit Amsterdam); Animesh Trivedi (IBM Research Europe, Zurich)
  Best Paper Candidate: Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs
Jacob Wahlgren, Gabin Schieffer, Ruimin Shi (KTH Royal Institute of Technology); Edgar A. León, Roger Pearce, Maya Gokhale (Lawrence Livermore National Laboratory); Ivy Peng (KTH Royal Institute of Technology)
  Design and Accuracy Trade-offs in Computational Statistics
Tiancheng Xu, Alan L. Cox, Scott Rixner (Rice University)
  An Analysis of Ethereum Workloads from a Key-Value Storage Perspective
Yanjing Ren, Jia Zhao (The Chinese University of Hong Kong); Jingwei Li (University of Electronic Science and Technology of China); Patrick P. C. Lee (The Chinese University of Hong Kong)
  Storage-Based Approximate Nearest Neighbor Search: What are the Performance Cost and I/O Characteristics?
Zebin Ren (Vrije Universiteit Amsterdam); Krijn Doekemeijer (Vrije Universiteit Amsterdam, The Netherlands); Padma Apparao (Intel Corporation); Animesh Trivedi (IBM Research Europe, Zurich)
  Sweet or Sour CHERI: Performance Characterization of the Arm Morello Platform
Xiaoyang Sun (University of Leeds); Jeremy Singer (University of Glasgow); Zheng Wang (University of Leeds)
2:50–3:10
Coffee Break
3:10–4:40
Session 7: Heterogeneous and Domain-Specific Systems
Session chair: Benjamin Kubwimana (NVIDIA)
  Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE
Akash Poptani, Alireza Khadem, Scott Mahlke (University of Michigan); Jonah Miller, Joshua Dolence, Galen Shipman (Los Alamos National Laboratory); Reetuparna Das (University of Michigan)
  Best Paper Candidate: XRSight: An End-to-End Hardware–Software Co-Design Platform for XR SoC Evaluation
Prashanth Ganesh, Zekai Lin, Yakun Sophia Shao (UC Berkeley)
  Icicle: Open-source Hardware Support for Top-Down Microarchitectural Analysis on RISC-V
Matthew Edwin Weingarten, Michael Grieco, Stephen A Edwards, Tanvir Ahmed Khan (Columbia University)
  AlphaFold3 Workload Characterization: A Comprehensive Analysis of Bottlenecks and Performance Scaling
Jinpyo Kim, Mingi Kwon, Jishen Zhao (UCSD)
  Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Shengjun Kris Dong, Dima Nikiforov, Widyadewi Soedarmadji, Minh Nguyen, Vikram Jain, Christopher W. Fletcher, Yakun Sophia Shao (University of California, Berkeley)
  ClusterSim: Modeling Thread Block Clusters in Hopper GPUs
Tim Lühnen (Technische Universität Hamburg); Jyotirman Behera, Devashree Tripathy (IIT Bhubaneswar); Sohan Lal (Technische Universität Hamburg)
4:40–5:00
Closing and Best Paper Award

Poster Session

Session chair: Wantong Li (University of California, Riverside)

The poster session will be held on Monday, October 13, from 4:10 PM to 5:30 PM.

    Adaptive Graphical Settings Optimization for Energy-Efficient 3D Rendering
Nolinz Szafranski (Western Michigan University); Hong Bing Tang (Central Michigan University); Lina Sawalha (Western Michigan University)
    Benchmarking Support for RISC-V CPUs in Serverless Computing
Georgios Pournaras, Vasileios Karakostas (University of Athens); George Papadimitriou (University of Patras); Dimitris Gizopoulos (University of Athens)
    PArtnerM2LB: Personal Assistant Multi-device Machine Learning Benchmark
Yu-Ching Hu (University of California, Riverside); Owen Lam (University of California, San Diego); Yuliang Li (Meta); Hung-Wei Tseng (University of California, Riverside)
    CoBloom: An FPGA Accelerator System for Bloom Filter Insertion in Genomics Applications
Patrick Hardison, Chris Kjellqvist, Ning Liang, Lisa Wu Wills (Duke University)
    MEMPRINT: Constructing Program Memory Footprint Estimations using statistical methods from Sparsely Sampled Pin-based Memory Traces
Nafis Mustakin (UCR); Joseph Manzano (Pacific Northwest National Lab); Daniel Wong (UCR)
    TrustFL: Byzantine-Resilient Federated Learning via Adaptive Trust Aggregation for Resource-Constrained IoT
Ravi Gupta (AMD); Guneet Bhatia (Siemens-Energy); Shabista Haider (Oracle)
    Power-Aware Fine-Grained Composition of Open-Source Hardware
Michael Grieco, Matthew Edwin Weingarten, Tanvir Ahmed Khan (Columbia University in the City of New York)
    Situla: Studying the Interplay of Sparse Formats and CPU/GPU Libraries
Amirmahdi Namjoo, Sanjali Yadav, Helya Hosseini, Bahar Asgari (University of Maryland, College Park)
    No One-Size-Fits-All: A Workload-Driven Characterization of Bit-Parallel vs. Bit-Serial Data Layouts for Processing-using-Memory
Jingyao Zhang, Elaheh Sadredini (University of California, Riverside)