
The current program is tentative and subject to change.

Sunday, September 15

8:00 Tutorial: Training Big Sparse Recommendation Models on Commodity Servers
Prashant Nair, Muhammad Adnan
10:00 Coffee Break
10:20–12:10 Tutorial: Training Big Sparse Recommendation Models on Commodity Servers
Prashant Nair, Muhammad Adnan
13:40 Coffee Break
14:00–17:00 Workshop: Multimodal LLM and Generative AI Workloads - Workload Characterization and Implications to Software Stack, Compilers, Computer Architectures and Communications
Huawei Sponsored Workshop

Monday, September 16

8:45 Opening and Welcome
9:00 Keynote Speech: Charting the Rise and Future of GPU Computing
Speaker: Prof. Tor Aamodt (University of British Columbia)
Abstract: The rapid rise in the use of Graphics Processing Units (GPUs) for machine learning (ML) arguably surprised many. This talk will explore the evolution of GPU architecture: from a way to accelerate video game graphics, through its unassuming beginnings as a way to expand the market for GPUs beyond graphics, to its current position as the go-to platform for ML training. Along the way, the talk will consider what computer architects, and specifically those interested in computer workload characterization, might learn from the current success of GPUs that can be applied to their future.
Bio: Tor M. Aamodt is a Professor in the Department of Electrical and Computer Engineering at the University of British Columbia, where he has been a faculty member since 2006. His current research focuses on the architecture of general-purpose GPUs and energy-efficient computing, most recently including accelerators for machine learning. Along with students in his research group, he developed the widely used GPGPU-Sim simulator. Three of his papers have been selected as Top Picks by IEEE Micro Magazine, and a fourth received a Top Picks honorable mention. One of his papers was also selected as a Research Highlight in Communications of the ACM. He is in the MICRO Hall of Fame.
10:00 Coffee Break
10:20 Session 1: Best Paper Nominees
Session chairs: Lisa Wu Wills (Duke University) and Omesh Tickoo (Intel Labs)
  CRISP: Concurrent Rendering and Compute Simulation Platform for GPUs
Junrui Pan, Timothy G. Rogers (Purdue University)
  LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park (KAIST)
  Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling
Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong, Ada Gavrilovska (Georgia Institute of Technology)
  Mediator: Characterizing and Optimizing Multi-DNN Inference for Energy Efficient Edge Intelligence
Seung Hun Choi, Myung Jae Chung, Young Geun Kim, Sung Woo Chung (Korea University)
  Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference
Joyjit Kundu, Wenzhe Guo, Ali BanaGozar, Udari De Alwis, Sourav Sengupta (imec); Puneet Gupta (UCLA); Arindam Mallik (imec)
12:00 Lunch
13:20 Session 2: Performance Measurement Tools and Techniques
Session chair: Lillian Pentecost (Amherst College)
  CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis
José Morgado, Leonel Sousa, Aleksandar Ilic (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa)
  SHARP: A Distribution-Based Framework for Reproducible Performance Evaluation
Viyom Mittal (University of California, Riverside); Pedro Bruel (Hewlett Packard Labs, USA); Michalis Faloutsos (University of California, Riverside); Dejan Milojicic, Eitan Frachtenberg (Hewlett Packard Labs, USA)
  Taming Performance Variability Caused by Client-Side Hardware Configuration
Georgia Antoniou, Haris Volos, Yiannakis Sazeides (University of Cyprus)
  HEX-SIM: Evaluating Multi-modal Large Language Models on Multi-chiplet NPUs
Xinquan Lin (Fuzhou University); Haobo Xu, Yinhe Han, Yiming Gan (ICT, Chinese Academy of Sciences)
14:40 Coffee Break
15:00 Session 3: Emerging Applications and Technologies
Session chair: Lillian Pentecost (Amherst College)
  Evergreen: Comprehensive Carbon Model for Performance-Emission Tradeoffs
Tersiteab Adem, Andrew McCrabb, Vidushi Goyal, Valeria Bertacco (University of Michigan)
  Performance Analysis of Zero-Knowledge Proofs
Saichand Samudrala, Jiawen Wu, Chen Chen (Texas A&M University); Jonathan Ku, Haoxuan Shan, Yiran Chen (Duke University); JV Rajendran (Texas A&M University)
  VelociTI: An Architecture-level Performance Modeling Framework for Trapped Ion Quantum Computers
Alex Hankin (Harvard University); Abdulrahman Mahmoud (Harvard University/MBZUAI); Mark Hempstead (Tufts University); David Brooks, Gu-Yeon Wei (Harvard University)
  QRIO: Quantum Resource Infrastructure Orchestrator
Shmeelok Chakraborty, Yuewen Hou, Ang Chen, Gokul Subramanian Ravi (University of Michigan)
16:20 Poster Lightning Talks
Session chair: Lisa Wu Wills (Duke University)
16:30 Poster Session
Session chair: Lisa Wu Wills (Duke University)
17:30 Conference Banquet @ Cecil Green Park House

Tuesday, September 17

9:00 Keynote Speech: Architecture 2.0: Workload Characterization in the Generative AI Era
Speaker: Prof. Vijay Janapa Reddi (Harvard University)
Abstract: Generative AI has the potential to revolutionize computer system design, transforming AI from a mere workload that we study and optimize into a full-fledged autonomous design tool. This enables us to build increasingly complex computing systems in less time and with better efficiency. We term this shift toward AI-assisted hardware design “Architecture 2.0.” This talk introduces the foundational building blocks needed to enable Architecture 2.0. It also argues that we must radically rethink our approaches to workload characterization, benchmarking methodologies, and system analysis to leverage these agents in computer system design. The talk dissects this interplay and sets the stage for new avenues of research and development that promise more efficient, adaptable, and intelligent computing systems in the future, paving the way for a new paradigm in computer architecture.
Bio: Dr. Vijay Janapa Reddi is the John L. Loeb Associate Professor of Engineering and Applied Sciences at Harvard University and Vice President and co-founder of MLCommons (mlcommons.org), a nonprofit organization accelerating machine learning innovation. He oversees MLCommons Research and serves on its board of directors. Dr. Janapa Reddi co-led the development of the MLPerf benchmarks, which evaluate a wide range of ML systems from megawatt to microwatt scales. He also serves on the tinyML Foundation’s board of directors (https://tinyml.org), where he is shaping the future of edge AI technologies and academic-industry partnerships. His research leverages his expertise in computer architecture and applied machine learning to push the boundaries of mobile and edge computing and IoT advancements. Passionate about promoting equitable access to technology and education, he is the author, editor, and maintainer of the open-source Machine Learning Systems textbook (https://mlsysbook.ai), and he developed the Tiny Machine Learning educational series on edX, which has trained over 100,000 students globally in recent years. He holds Ph.D., M.S., and B.S. degrees from Harvard University, the University of Colorado at Boulder, and Santa Clara University, respectively.
10:00 Coffee Break
10:20 Session 4: LLMs and Systems for Machine Learning
Session chair: Guru Venkataramani (George Washington University)
  Understanding Performance Implications of LLM Inference on CPUs
Seonjin Na, Geonhwa Jeong (Georgia Institute of Technology); Byunghoon Ahn (University of California San Diego); Jeffrey Young, Tushar Krishna, Hyesoon Kim (Georgia Institute of Technology)
  Low-Bitwidth Floating Point Quantization for Efficient, High-Quality Diffusion Models
Cheng Chen, Christina Giannoula, Andreas Moshovos (University of Toronto)
  Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models
Chakshu Moar, Faraz Tahmasebi (University of California, Irvine); Michael Pellauer (NVIDIA); Hyoukjun Kwon (University of California, Irvine)
  Understanding the Performance and Estimating the Cost of LLM Fine-Tuning
Yuchen Xia (University of Michigan); Jiho Kim (Georgia Institute of Technology); Yuhan Chen, Haojie Ye (University of Michigan); Souvik Kundu (Intel Labs); Cong "Callie" Hao (Georgia Institute of Technology); Nishil Talati (University of Michigan)
  Characterizing and Optimizing the End-to-End Performance of Multi-Agent Reinforcement Learning Systems
Kailash Gogineni, Yongsheng Mei (George Washington University); Karthikeya Gogineni (Independent); Peng Wei, Tian Lan, Guru Venkataramani (George Washington University)
12:00 Lunch
13:20 Session 5: Caches and Memory
Session chair: Prashant Nair (University of British Columbia)
  Understanding Address Translation Scaling Behaviours Using Hardware Performance Counters
Nick Lindsay, Abhishek Bhattacharjee (Yale University)
  Architectural Modeling and Benchmarking for Digital DRAM PIM
Farzana Ahmed Siddique, Deyuan Guo, Zhenxing Fan, Mohammadhosein Gholamrezaei, Khyati Kiyawat, Morteza Baradaran, Alif Ahmed, Kyle Durrer, Abdullah T. Mughrabi, Hugo Abbot, Ethan Ermovick, Ashish Venkat, Kevin Skadron (University of Virginia)
  Enhanced System-Level Coherence for Heterogeneous Unified Memory Architectures
Anoop Mysore Nataraja (University of Washington); Ricardo Fernández-Pascual, Alberto Ros (University of Murcia)
  Characterizing Emerging Page Replacement Policies for Memory-Intensive Applications
Michael Wu (Yale University); Sibren Isaacman (Loyola University Maryland); Abhishek Bhattacharjee (Yale University)
  Kindle: A Comprehensive Framework for Exploring OS-Architecture Interplay in Hybrid Memory Systems
Arun KP (Indian Institute of Technology Kanpur); Debadatta Mishra (IIT Kanpur, India)
15:00 Coffee Break
15:20 Session 6: GPUs and Heterogeneous Systems
Session chair: Omesh Tickoo (Intel Labs)
  Characterizing CUDA and OpenMP Synchronization Primitives
Brandon Burtchell, Martin Burtscher (Texas State University)
  Evaluating Performance and Energy Efficiency of Parallel Programming Models in Heterogeneous Computing
Demirhan Sevim, Baturalp Bilgin, Ismail Akturk (Ozyegin University)
  Performance Impact of Removing Data Races from GPU Graph Analytics Programs
Yiqian Liu, Avery VanAusdal, Martin Burtscher (Texas State University)
16:20 Closing and Best Paper Award

Poster Session (16:20 Monday)

Session chair: Lisa Wu Wills (Duke University)

CONFER: A Toolchain for Comparing Hardware Costs Among Number Formats
Madi Gudin, Jesse Looney, Lillian Pentecost (Amherst College)
EDAMAME: fEDerAted Memory-Adaptive siMulated lEarning
Soyon Choi, Lillian Pentecost (Amherst College)
MAccel-sim: A Multi-GPU Simulator for Architectural Exploration
Christin David Bose, Cesar Avalos, Junrui Pan, Mahmoud Khairy, Timothy Rogers (Purdue University)
LLaMa Profiling for Future Architecture
Mansi Choudhary, Ceyu Xu, Lisa Wu Wills (Duke University)
TPUSim
Mansi Choudhary, Ceyu Xu, Lisa Wu Wills (Duke University)
Closing the Observability Gap: Scaling Serverless Resources More Frequently
Nima Nasiri, Nalin Munshi (University of British Columbia); Simon D Moser (IBM); Sathish Gopalakrishnan, Mohammad Shahrad (University of British Columbia)