Program
The current program is tentative and subject to change.
Sunday, September 15
8:00 | Tutorial: Training Big Sparse Recommendation Models on Commodity Servers Prashant Nair, Muhammad Adnan |
10:00 | Coffee Break |
10:20–12:10 | Tutorial: Training Big Sparse Recommendation Models on Commodity Servers Prashant Nair, Muhammad Adnan |
13:40 | Coffee Break |
14:00–17:00 | Workshop: Multimodal LLM and Generative AI Workloads - Workload Characterization and Implications to Software Stack, Compilers, Computer Architectures and Communications (Huawei Sponsored Workshop) |
Monday, September 16
8:45 | Opening and Welcome |
9:00 | Keynote Speech: Charting the Rise and Future of GPU Computing Speaker: Prof. Tor Aamodt (University of British Columbia) Abstract: The rapid rise in the use of Graphics Processing Units (GPUs) for machine learning (ML) arguably surprised many. This talk will explore the evolution of GPU architecture from its unassuming beginnings as a way to accelerate video game graphics, through efforts to expand the market for GPUs beyond graphics, to its current position as the go-to platform for ML training. Along the way, the talk will consider what computer architects, and specifically those interested in workload characterization, might learn from the current success of GPUs that can be applied to their future. Bio: Tor M. Aamodt is a Professor in the Department of Electrical and Computer Engineering at the University of British Columbia, where he has been a faculty member since 2006. His current research focuses on the architecture of general-purpose GPUs and energy-efficient computing, most recently including accelerators for machine learning. Along with students in his research group, he developed the widely used GPGPU-Sim simulator. Three of his papers have been selected as Top Picks by IEEE Micro magazine, and a fourth was selected as a Top Picks honorable mention. One of his papers was also selected as a Research Highlight in Communications of the ACM. He is in the MICRO Hall of Fame. |
10:00 | Coffee Break |
10:20 | Session 1: Best Paper Nominees Session chair: Lisa Wu Wills (Duke University) and Omesh Tickoo (Intel Labs) |
CRISP: Concurrent Rendering and Compute Simulation Platform for GPUs Junrui Pan, Timothy G. Rogers (Purdue University) |
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park (KAIST) |
Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong, Ada Gavrilovska (Georgia Institute of Technology) |
Mediator: Characterizing and Optimizing Multi-DNN Inference for Energy Efficient Edge Intelligence Seung Hun Choi, Myung Jae Chung, Young Geun Kim, Sung Woo Chung (Korea University) |
Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference Joyjit Kundu, Wenzhe Guo, Ali BanaGozar, Udari De Alwis, Sourav Sengupta (imec); Puneet Gupta (UCLA); Arindam Mallik (imec) |
12:00 | Lunch |
13:20 | Session 2: Performance Measurement Tools and Techniques Session chair: Lillian Pentecost (Amherst College) |
CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis José Morgado, Leonel Sousa, Aleksandar Ilic (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa) |
SHARP: A Distribution-Based Framework for Reproducible Performance Evaluation Viyom Mittal (University of California, Riverside); Pedro Bruel (Hewlett Packard Labs, USA); Michalis Faloutsos (University of California, Riverside); Dejan Milojicic, Eitan Frachtenberg (Hewlett Packard Labs, USA) |
Taming Performance Variability Caused by Client-Side Hardware Configuration Georgia Antoniou, Haris Volos, Yiannakis Sazeides (University of Cyprus) |
HEX-SIM: Evaluating Multi-modal Large Language Models on Multi-chiplet NPUs Xinquan Lin (FuZhou University); Haobo Xu, Yinhe Han, Yiming Gan (ICT, Chinese Academy of Sciences) |
14:40 | Coffee Break |
15:00 | Session 3: Emerging Applications and Technologies Session chair: Lillian Pentecost (Amherst College) |
Evergreen: Comprehensive Carbon Model for Performance-Emission Tradeoffs Tersiteab Adem, Andrew McCrabb, Vidushi Goyal, Valeria Bertacco (University of Michigan) |
Performance Analysis of Zero-Knowledge Proofs Saichand Samudrala, Jiawen Wu, Chen Chen (Texas A&M University); Jonathan Ku, Haoxuan Shan, Yiran Chen (Duke University); JV Rajendran (Texas A&M University) |
VelociTI: An Architecture-level Performance Modeling Framework for Trapped Ion Quantum Computers Alex Hankin (Harvard University); Abdulrahman Mahmoud (Harvard University/MBZUAI); Mark Hempstead (Tufts University); David Brooks, Gu-Yeon Wei (Harvard University) |
QRIO: Quantum Resource Infrastructure Orchestrator Shmeelok Chakraborty, Yuewen Hou, Ang Chen, Gokul Subramanian Ravi (University of Michigan) |
16:20 | Poster Lightning Talks Session chair: Lisa Wu Wills (Duke University) |
16:30 | Poster Session Session chair: Lisa Wu Wills (Duke University) |
17:30 | Conference Banquet @ Cecil Green Park House |
Tuesday, September 17
9:00 | Keynote Speech: Architecture 2.0: Workload Characterization in the Generative AI Era Speaker: Prof. Vijay Janapa Reddi (Harvard University) Abstract: Generative AI has the potential to revolutionize computer system design, transforming AI from a mere workload that we study and optimize into a full-fledged autonomous design tool. This enables us to build increasingly complex computing systems in less time and with better efficiency. We term this shift toward AI-assisted hardware design as “Architecture 2.0.” This talk introduces the foundational building blocks needed to enable Architecture 2.0 and argues that we must radically rethink our approaches to workload characterization, benchmarking methodologies, and system analysis to leverage these agents in computer system design. The talk dissects this interplay and sets the stage for new avenues of research and development that promise more efficient, adaptable, and intelligent computing systems in the future, paving the way for a new paradigm in computer architecture. Bio: Dr. Vijay Janapa Reddi is the John L. Loeb Associate Professor of Engineering and Applied Sciences at Harvard University and Vice President and co-founder of MLCommons (mlcommons.org), a nonprofit organization accelerating machine learning innovation. He oversees MLCommons Research and serves on its board of directors. Dr. Janapa Reddi co-led the development of the MLPerf benchmarks, which evaluate a wide range of ML systems from megawatt to microwatt scales. He also serves on the tinyML Foundation’s board of directors (https://tinyml.org), where he is shaping the future of edge AI technologies and academic-industry partnerships. His research leverages his expertise in computer architecture and applied machine learning to push the boundaries of mobile and edge computing and IoT advancements.
Passionate about promoting equitable access to technology and education, he is the author, editor, and maintainer of the open-source Machine Learning Systems textbook (https://mlsysbook.ai), and he developed the Tiny Machine Learning educational series on edX, which has trained over 100,000 students globally in recent years. He holds Ph.D., M.S., and B.S. degrees from Harvard University, the University of Colorado at Boulder, and Santa Clara University, respectively. |
10:00 | Coffee Break |
10:20 | Session 4: LLMs and Systems for Machine Learning Session chair: Guru Venkataramani (George Washington University) |
Understanding Performance Implications of LLM Inference on CPUs Seonjin Na, Geonhwa Jeong (Georgia Institute of Technology); Byunghoon Ahn (University of California San Diego); Jeffrey Young, Tushar Krishna (Georgia Institute of Technology); Hyesoon Kim (Georgia Tech) |
Low-Bitwidth Floating Point Quantization for Efficient, High-Quality Diffusion Models Cheng Chen, Christina Giannoula, Andreas Moshovos (University of Toronto) |
Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models Chakshu Moar, Faraz Tahmasebi (University of California, Irvine); Michael Pellauer (NVIDIA); Hyoukjun Kwon (University of California, Irvine) |
Understanding the Performance and Estimating the Cost of LLM Fine-Tuning Yuchen Xia (University of Michigan); Jiho Kim (Georgia Institute of Technology); Yuhan Chen, Haojie Ye (University of Michigan); Souvik Kundu (Intel Labs); Cong “Callie” Hao (Georgia Institute of Technology); Nishil Talati (University of Michigan) |
Characterizing and Optimizing the End-to-End Performance of Multi-Agent Reinforcement Learning Systems Kailash Gogineni, Yongsheng Mei (George Washington University); Karthikeya Gogineni (Independent); Peng Wei, Tian Lan, Guru Venkataramani (George Washington University) |
12:00 | Lunch |
13:20 | Session 5: Caches and Memory Session chair: Prashant Nair (University of British Columbia) |
Understanding Address Translation Scaling Behaviours Using Hardware Performance Counters Nick Lindsay, Abhishek Bhattacharjee (Yale University) |
Architectural Modeling and Benchmarking for Digital DRAM PIM Farzana Ahmed Siddique, Deyuan Guo, Zhenxing Fan, Mohammadhosein Gholamrezaei, Khyati Kiyawat, Morteza Baradaran, Alif Ahmed, Kyle Durrer, Abdullah T. Mughrabi, Hugo Abbot, Ethan Ermovick, Ashish Venkat, Kevin Skadron (University of Virginia) |
Enhanced System-Level Coherence for Heterogeneous Unified Memory Architectures Anoop Mysore Nataraja (University of Washington); Ricardo Fernández-Pascual, Alberto Ros (University of Murcia) |
Characterizing Emerging Page Replacement Policies for Memory-Intensive Applications Michael Wu (Yale University); Sibren Isaacman (Loyola University Maryland); Abhishek Bhattacharjee (Yale University) |
Kindle: A Comprehensive Framework for Exploring OS-Architecture Interplay in Hybrid Memory Systems Arun KP (Indian Institute of Technology Kanpur); Debadatta Mishra (IIT Kanpur, India) |
15:00 | Coffee Break |
15:20 | Session 6: GPUs and Heterogeneous Systems Session chair: Omesh Tickoo (Intel Labs) |
Characterizing CUDA and OpenMP Synchronization Primitives Brandon Burtchell, Martin Burtscher (Texas State University) |
Evaluating Performance and Energy Efficiency of Parallel Programming Models in Heterogeneous Computing Demirhan Sevim, Baturalp Bilgin, Ismail Akturk (Ozyegin University) |
Performance Impact of Removing Data Races from GPU Graph Analytics Programs Yiqian Liu, Avery VanAusdal, Martin Burtscher (Texas State University) |
16:20 | Closing and Best Paper Award |
Poster Session (16:20 Monday)
Session chair: Lisa Wu Wills (Duke University)
CONFER: A Toolchain for Comparing Hardware Costs Among Number Formats Madi Gudin, Jesse Looney, Lillian Pentecost (Amherst College) |
EDAMAME: fEDerAted Memory-Adaptive siMulated lEarning Soyon Choi, Lillian Pentecost (Amherst College) |
MAccel-sim: A Multi-GPU Simulator for Architectural Exploration Christin David Bose, Cesar Avalos, Junrui Pan, Mahmoud Khairy, Timothy Rogers (Purdue University) |
LLaMa Profiling for Future Architecture Mansi Choudhary, Ceyu Xu, Lisa Wu Wills (Duke University) |
TPUSim Mansi Choudhary, Ceyu Xu, Lisa Wu Wills (Duke University) |
Closing the Observability Gap: Scaling Serverless Resources More Frequently Nima Nasiri, Nalin Munshi (University of British Columbia); Simon D Moser (IBM); Sathish Gopalakrishnan, Mohammad Shahrad (University of British Columbia) |