Program
Sunday, October 12 (Tutorials and Workshops) | Monday, October 13 | Tuesday, October 14 | Poster Session (4:10 PM Monday)
Sunday, October 12
9:00 | 10:20 |
Tutorial: Observability into Application-level Metrics with eBPFs Session 1 of 2 |
10:20 | 10:40 |
Coffee Break |
10:40 | 12:00 |
Tutorial: Observability into Application-level Metrics with eBPFs Session 2 of 2 |
12:00 | 1:20 |
Lunch |
1:20 | 3:00 |
Workshop: Workshop on Security–Performance Trade-offs Session 1 of 2 |
3:00 | 3:20 |
Coffee Break |
3:20 | 5:00 |
Workshop: Workshop on Security–Performance Trade-offs Session 2 of 2 |
Monday, October 13
9:00 | 9:15 |
Welcome |
9:15 | 10:15 |
Keynote Talk: Stop Counting the Crowd, Start Finding Waldo! The answer isn’t ‘more’ — it’s ‘where’!
Amir Yazdanbakhsh (Google DeepMind)
Session chair: Meena Arunachalam (AMD)
Abstract: Modern workloads tempt a single answer: throw more resources at the problem. But resource walls often mask where the real inefficiencies lie. This talk argues for a shift in perspective: the answer to our scaling challenges isn’t just more—it’s where: the rare hot-paths, dataflows, algorithmic choices, and software patterns that dominate cost. This talk demonstrates how AI-driven approaches can identify and mitigate these “Waldo patterns,” and presents recent case studies from our work across the computing stack to identify such patterns and the resulting optimizations. The talk closes with a call to build the next generation of characterization tools, benchmarks, and AI-driven workflows that make finding Waldo repeatable and measurable.
Bio: Amir Yazdanbakhsh is a Research Scientist at Google DeepMind, working at the intersection of machine learning and computer architecture. His primary focus is on applying machine learning to design efficient and sustainable computing systems, from leading the development of large-scale distributed training systems on TPUs to shaping the next generation of Google’s ML accelerators. His work has been recognized by the ISCA Hall of Fame. Notably, his research on using AI to solve performance challenges in hyperscale systems received an IEEE Micro Top Picks award, and his work on a new system for AI won the IEEE Computer Society Best Paper Award. Amir received his Ph.D. from the Georgia Institute of Technology, where he was a recipient of the Microsoft and Qualcomm fellowships. |
10:15 | 10:35 |
Coffee Break |
10:35 | 11:50 |
Session 1: Cross-Domain Methods for Workload Analysis Session chair: Patrick P. C. Lee (The Chinese University of Hong Kong) |
Belenos: Bottleneck Evaluation to Link Biomechanics to Novel Computing Optimizations Hana Chitsaz, Johnson Umeike, Amirmahdi Namjoo (University of Maryland, College Park); Babak N. Safa (University of South Florida); Bahar Asgari (University of Maryland, College Park) |
|
Workload Characterization Using Cross-Layer Features and Multilevel PCA Lina Sawalha, Grant Deljevic (Western Michigan University) |
|
Athena: A Plug-and-Play Advisor for Retrieval-Augmented Generation using VectorDB Ning Liang (Duke University); Fabian Wenz, Jana Giceva (TU Munich); Lisa Wu Wills (Duke University) |
|
The Fake-Busy and True-Idle Problems of Running Graph Applications on Chiplet-Based Multi-cores Rashid Aligholipour, Yuan Yao (Uppsala University) |
|
WANify: Gauging and Balancing Runtime WAN Bandwidth for Geo-distributed Data Analytics Anshuman Das Mohapatra, Kwangsung Oh (University of Nebraska at Omaha) |
|
12:00 | 1:20 |
Lunch |
1:20 | 2:35 |
Session 2: Large Language Models Session chair: Todd Austin (University of Michigan / Agita Labs) |
Understanding Distributed Training of Large Language Models with Unified Virtual Memory Jane Rhee, Eunbi Jeong (Ewha Womans University); Jiwon Lee (Samsung Electronics); Myung Kuk Yoon (Ewha Womans University) |
|
Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs Marcin Chrapek, Marcin Copik, Etienne Mettaz, Torsten Hoefler (ETH Zurich) |
|
EdgeReasoning: Optimizing Reasoning LLM Deployment on Edge GPUs Benjamin Kubwimana, Qijing Jenny Huang (NVIDIA) |
|
Keeping up with Large Language Models: A Holistic Methodology of Compute, Memory, Communication, and Cost Modeling Wenzhe Guo, Joyjit Kundu, Uras Tos, Weijiang Kong, Giuliano Sisto, Timon Evenblij, Manu Perumkunnil (imec) |
|
DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs Ziyu Hu, Zhiqing Zhong (Stevens Institute of Technology); Weijian Zheng (Argonne National Laboratory); Zhijing Ye (Stevens Institute of Technology); Xuwei Tan, Xueru Zhang (The Ohio State University); Zhen Xie (Binghamton University); Rajkumar Kettimuthu (Argonne National Laboratory); Xiaodong Yu (Stevens Institute of Technology) |
|
2:35 | 2:55 |
Coffee Break |
2:55 | 4:10 |
Session 3: Security, Confidentiality, and Reliability Session chair: Elaheh Sadredini (University of California, Riverside) |
ZKProphet: Understanding Performance of Zero-Knowledge Proofs on GPUs Tarunesh Verma, Yichao Yuan, Nishil Talati (University of Michigan); Todd Austin (University of Michigan / Agita Labs) |
|
The Curious Case of Global Stable Loads Shagnik Pal (University of Texas at Austin); Jeeho Ryoo (Fairleigh Dickinson University); Lizy K. John (University of Texas at Austin) |
|
vACE: Exploring the Design Space of Vector Processing Units for Soft Error Vulnerability George-Marios Fragkoulis, Dimitris Gizopoulos (University of Athens) |
|
CASM: A Generalizable and Accessible Security Metric to Evaluate Security of Cache Architectures Phaedra Curlin, Tamara Silbergleit Lehman (University of Colorado Boulder) |
|
Best Paper Candidate: Learning Architectural Cache Simulator Behaviour Pranjali Jain (UC Santa Barbara); Meiru Han (University of Pennsylvania); Zhizhou Zhang (Uber Technologies Inc); Brandon Lee, Jonathan Balkind (UC Santa Barbara) |
|
4:10 | 5:30 |
Poster Session Session chair: Wantong Li (University of California, Riverside) |
5:30 | 8:30 |
Conference Banquet |
Tuesday, October 14
8:45 | 10:00 |
Session 4: AI Accelerators, PIM, and Post-Moore Architectures Session chair: Sudhanshu Gupta (University of Rochester) |
HALO: Hybrid Systolic Arrays via Logical Partitioning for Acceleration of Complex-Valued Neural Networks Ji Yeong Yi, Eunbi Jeong, SungHee Yum, Jane Rhee (Ewha Womans University); Sangun Choi, Gunjae Koo, Yunho Oh (Korea University); Myung Kuk Yoon (Ewha Womans University) |
|
Exploring Lossy Compression of Activation Data for Emerging AI Accelerators: A Case Study on the Graphcore IPU Milan Shah (North Carolina State University); Xiaodong Yu (Stevens Institute of Technology); Sheng Di (Argonne National Laboratory); Michela Becchi (North Carolina State University); Franck Cappello (Argonne National Laboratory) |
|
Best Paper Candidate: BetterTogether: An Interference-Aware Framework for Fine-grained Software Pipelining on Heterogeneous SoCs Yanwen Xu, Rithik Sharma, Zheyuan Chen, Shaan Mistry (University of California, Santa Cruz); Tyler Sorensen (Microsoft Research and University of California, Santa Cruz) |
|
PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration Daniel Pacheco, Leonel Sousa (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa); Aleksandar Ilic (INESC-ID & Instituto Superior Técnico) |
|
ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System Marzieh Barkhordar, Alireza Tabatabaeian (Simon Fraser University); Mohammad Sadrosadati (ETH Zurich); Christina Giannoula (University of Toronto); Juan Gomez Luna (NVIDIA); Izzat El Hajj (American University of Beirut); Onur Mutlu (ETH Zurich); Alaa Alameldeen (Simon Fraser University) |
|
10:00 | 10:20 |
Coffee Break |
10:20 | 11:50 |
Session 5: Emerging Workloads Session chair: Tamara Silbergleit Lehman (University of Colorado Boulder) |
PangenomicsBench: A Benchmark Suite and Characterization of Pangenomics Noah Kaplan (University of Michigan); Jan-Niklas Schmelzle (Cornell University); Yufeng Gu (University of Michigan); Erik Garrison (University of Tennessee Health Science Center); Christopher Batten (Cornell University); Reetuparna Das (University of Michigan) |
|
decoder-bench: Benchmarking Decoders for Quantum Error Correction Satvik Maurya (University of Wisconsin-Madison); Joshua Viszlai (University of Chicago); Nithin Raveendran (University of Arizona); Poulami Das (UT Austin); Swamit Tannu (University of Wisconsin-Madison) |
|
A Comprehensive Analysis of Graph Neural Networks Training at Different Scales Mostafa Eghbali Zarch, Michela Becchi (North Carolina State University, USA) |
|
EntoBench: A Benchmark Suite and Evaluation Framework for Insect-Scale Robotics Derin Ozturk, Nick Cebry, Angela Cui, Hang Gao, Julie Villamil, Farrell Helbling, Christopher Batten (Cornell University) |
|
Improving the Performance of Out-of-Core LLM Inference Using Heterogeneous Host Memory Sudhanshu Gupta (University of Rochester); Sandhya Dwarkadas (University of Virginia) |
|
miniGiraffe: A Pangenomic Mapping Proxy App Jessica Imlau Dagostini, Scott Beamer (University of California, Santa Cruz); Tyler Sorensen (Microsoft Research and University of California, Santa Cruz); Joseph Manzano (Pacific Northwest National Lab) |
|
12:00 | 1:20 |
Lunch |
1:20 | 2:50 |
Session 6: Memory, Storage, and Beyond Session chair: Lina Sawalha (Western Michigan University) |
Best Paper Candidate: Does Linux Provide Performance Isolation for NVMe SSDs? Configuring cgroups for I/O Control in the NVMe Era Krijn Doekemeijer, Zebin Ren, Tiziano De Matteis, Balakrishnan Chandrasekaran (Vrije Universiteit Amsterdam); Animesh Trivedi (IBM Research Europe, Zurich) |
|
Best Paper Candidate: Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs Jacob Wahlgren, Gabin Schieffer, Ruimin Shi (KTH Royal Institute of Technology); Edgar A. León, Roger Pearce, Maya Gokhale (Lawrence Livermore National Laboratory); Ivy Peng (KTH Royal Institute of Technology) |
|
Design and Accuracy Trade-offs in Computational Statistics Tiancheng Xu, Alan L. Cox, Scott Rixner (Rice University) |
|
An Analysis of Ethereum Workloads from a Key-Value Storage Perspective Yanjing Ren, Jia Zhao (The Chinese University of Hong Kong); Jingwei Li (University of Electronic Science and Technology of China); Patrick P. C. Lee (The Chinese University of Hong Kong) |
|
Storage-Based Approximate Nearest Neighbor Search: What are the Performance Cost and I/O Characteristics? Zebin Ren (Vrije Universiteit Amsterdam); Krijn Doekemeijer (Vrije Universiteit Amsterdam, The Netherlands); Padma Apparao (Intel Corporation); Animesh Trivedi (IBM Research Europe, Zurich) |
|
Sweet or Sour CHERI: Performance Characterization of the Arm Morello Platform Xiaoyang Sun (University of Leeds); Jeremy Singer (University of Glasgow); Zheng Wang (University of Leeds) |
|
2:50 | 3:10 |
Coffee Break |
3:10 | 4:40 |
Session 7: Heterogeneous and Domain-Specific Systems Session chair: Benjamin Kubwimana (NVIDIA) |
Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE Akash Poptani, Alireza Khadem, Scott Mahlke (University of Michigan); Jonah Miller, Joshua Dolence, Galen Shipman (Los Alamos National Laboratory); Reetuparna Das (University of Michigan) |
|
Best Paper Candidate: XRSight: An End-to-End Hardware–Software Co-Design Platform for XR SoC Evaluation Prashanth Ganesh, Zekai Lin, Yakun Sophia Shao (UC Berkeley) |
|
Icicle: Open-source Hardware Support for Top-Down Microarchitectural Analysis on RISC-V Matthew Edwin Weingarten, Michael Grieco, Stephen A Edwards, Tanvir Ahmed Khan (Columbia University) |
|
AlphaFold3 Workload Characterization: A Comprehensive Analysis of Bottlenecks and Performance Scaling Jinpyo Kim, Mingi Kwon, Jishen Zhao (UCSD) |
|
Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs Shengjun Kris Dong, Dima Nikiforov, Widyadewi Soedarmadji, Minh Nguyen, Vikram Jain, Christopher W. Fletcher, Yakun Sophia Shao (University of California, Berkeley) |
|
ClusterSim: Modeling Thread Block Clusters in Hopper GPUs Tim Lühnen (Technische Universität Hamburg); Jyotirman Behera, Devashree Tripathy (IIT Bhubaneswar); Sohan Lal (Technische Universität Hamburg) |
|
4:40 | 5:00 |
Closing and Best Paper Award |
Poster Session
Session chair: Wantong Li (University of California, Riverside)
The poster session will be held on Monday, October 13, from 4:10 PM to 5:30 PM.
Adaptive Graphical Settings Optimization for Energy-Efficient 3D Rendering Nolinz Szafranski (Western Michigan University); Hong Bing Tang (Central Michigan University); Lina Sawalha (Western Michigan University) |
||
Benchmarking Support for RISC-V CPUs in Serverless Computing Georgios Pournaras, Vasileios Karakostas (University of Athens); George Papadimitriou (University of Patras); Dimitris Gizopoulos (University of Athens) |
||
PArtnerM2LB: Personal Assistant Multi-device Machine Learning Benchmark Yu-Ching Hu (University of California, Riverside); Owen Lam (University of California, San Diego); Yuliang Li (Meta); Hung-Wei Tseng (University of California, Riverside) |
||
CoBloom: An FPGA Accelerator System for Bloom Filter Insertion in Genomics Applications Patrick Hardison, Chris Kjellqvist, Ning Liang, Lisa Wu Wills (Duke University) |
||
MEMPRINT: Constructing Program Memory Footprint Estimations Using Statistical Methods from Sparsely Sampled Pin-based Memory Traces Nafis Mustakin (University of California, Riverside); Joseph Manzano (Pacific Northwest National Lab); Daniel Wong (University of California, Riverside) |
||
TrustFL: Byzantine-Resilient Federated Learning via Adaptive Trust Aggregation for Resource-Constrained IoT Ravi Gupta (AMD); Guneet Bhatia (Siemens-Energy); Shabista Haider (Oracle) |
||
Power-Aware Fine-Grained Composition of Open-Source Hardware Michael Grieco, Matthew Edwin Weingarten, Tanvir Ahmed Khan (Columbia University in the City of New York) |
||
Situla: Studying the Interplay of Sparse Formats and CPU/GPU Libraries Amirmahdi Namjoo, Sanjali Yadav, Helya Hosseini, Bahar Asgari (University of Maryland, College Park) |
||
No One-Size-Fits-All: A Workload-Driven Characterization of Bit-Parallel vs. Bit-Serial Data Layouts for Processing-using-Memory Jingyao Zhang, Elaheh Sadredini (University of California, Riverside) |