IISWC 2022

November 6 – November 8, 2022

 Austin, Texas, USA


* Open Research Objects (ORO): This badge signals that author-created digital objects used in the research (including data and code) are permanently archived in a public repository that assigns a global identifier and guarantees persistence, and are made available via standard open licenses that maximize artifact availability.

Sunday, November 6, 2022 - Tutorials
10am - 12pm CST Tutorial 1: Performance Engineering in the Public Cloud, Padma Apparao, Intel Corporation.
1pm - 3pm CST Tutorial 2: Methods for characterizing workloads with Hardware accelerated memory page compression, Binuraj Ravindran, Rakib Al-Fahad, Dan Liang, Muktadir Chowdhury, Zhenlin Luo, Intel Corporation.
3:15pm - 5:15pm CST Tutorial 3: Sparse Weight Compression and Decompression for Intel AMX/TMUL to Improve Deep Learning Performance, Shamima Najnin, Rajesh Poornachandran, Sreekanth V. Yalachigere, Mona Minakshi, Anik Khan, Ofir Zafrir, Nilesh Jain, Md Faijul Amin, Guy Boudoukh, Tatyana Primak, Pallavi G, Intel Corporation.
Monday, November 7, 2022
8:00 - 8:30 am CST Breakfast
8:30 - 8:45 am CST Opening Remarks
8:45 - 9:30 am CST Keynote 1: Virtual Reality: The current state of the art and the opportunities, Amit Puntambekar, Meta
9:30 - 10:45 am CST Session 1: Microarchitecture/HW Performance Analysis
11:00 - 12:15 pm CST Session 2: HPC
1:00 - 2:30 pm CST Panel
2:45 - 4:25 pm CST Session 3: AI Systems
5:00 - 8:00 pm CST Social event: Austin River Cruise and Dinner
Tuesday, November 8, 2022
8:00 - 8:30 am CST Breakfast
8:30 - 8:45 am CST Opening Remarks
8:45 - 9:30 am CST Keynote 2: Overcoming the challenges when viewing oneAPI as a performance workload, Paul Petersen, Intel
9:30 - 10:45 am CST Session 4: Graph Neural Networks
11:00 - 12:15 pm CST Session 5: Graph Analytics and GPUs
1:30 - 3:15 pm CST Session 6: Mobile, Web, and Cloud
3:30 - 4:45 pm CST Session 7: AI Benchmarks & Characterization
4:45 - 5:00 pm CST Closing Remarks

Detailed Program

Monday, November 7, 2022
8:45 - 9:30 am CST Keynote 1: Virtual Reality: The current state of the art and the opportunities, Amit Puntambekar, Meta
Recent advances in computing technology, mobile computing, and computer graphics, together with a better understanding of human perceptual processes, are making the experiences provided by virtual reality devices incredibly realistic. Impressive new technologies such as hand tracking, voice recognition, face tracking, and mixed reality, driven by ever more sophisticated AI models, are being incorporated into completely standalone devices, making them significantly easier to use and accessible to all. These devices can transport the human mind into space, down a roller coaster, or to Alaska to watch the northern lights while the body sits on a couch in the comfort of home. I think we are once again at an inflexion point in technology, one that will revolutionize the way humans communicate with each other using these new capabilities in VR devices, just as the invention of the telephone did over 120 years ago. In this talk I will explore some of the technologies underlying these devices and the opportunities they afford the research community to advance VR's state of the art.
Amit is a Director of Engineering at Meta and currently leads platform engineering efforts in VR, which include the VR Operating System, VR Foundation, and VR Ecosystem engineering teams. Prior to VR, Amit spent 7+ years at Meta working on video at Facebook, leading efforts on video encoding, video platform, and machine learning for content understanding and recommendation systems. Prior to Meta, Amit co-founded a video processing company, which was acquired by Meta. He holds 10+ patents across Video, ML, Infra and Distributed Systems.
9:30 - 10:45 am CST
Session 1: Microarchitecture/HW Performance Analysis
Carole-Jean Wu (Meta AI / Arizona State University)
PInTE: Probabilistic Induction of Theft Evictions Slides
Cesar A Gomes, Xuesi Chen, Mark Hempstead (Tufts University)
GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation
Ondřej Sýkora, Phitchaya Mangpo Phothilimthana (Google Research), Charith Mendis (UIUC), Amir Yazdanbakhsh (Google Research)
UVM Discard: Eliminating Redundant Memory Transfers for Accelerators
Weixi Zhu (Rice University), Guilherme Cox, Jan Vesely, Mark Hairgrove (NVIDIA), Alan L. Cox, Scott Rixner (Rice University)
11:00 - 12:15 pm CST
Session 2: HPC
Mark Hempstead (Tufts University)
FPChecker: Floating-Point Exception Detection Tool and Benchmark for Parallel and Distributed HPC
Ignacio Laguna (Lawrence Livermore National Laboratory), Tanmay Tirpankar, Xinyi Li, Ganesh Gopalakrishnan (University of Utah)
Splash-4: A Modern Benchmark Suite with Lock-Free Constructs Slides
Eduardo José Gómez-Hernández, Juan M. Cebrian (University of Murcia), Stefanos Kaxiras (Uppsala University), Alberto Ros (University of Murcia)
Characterizing Molecular Dynamics Simulation on Commodity Platforms
Francesco Peverelli, Davide Conficconi (Politecnico di Milano, Italy), Davide B. Bartolini, Alberto Scolari (Huawei), Marco D. Santambrogio (Politecnico di Milano, Italy)
1:00 - 2:30 pm CST Panel: 25 Years of IISWC: Looking Back and Forward
Moderator: Lieven Eeckhout (Ghent University)
Panelists: John Carter (IBM), Lizy K. John (University of Texas at Austin), David Kaeli (Northeastern University), Vijay Janapa Reddi (Harvard University), Carole-Jean Wu (Meta), Neeraja J. Yadwadkar (University of Texas at Austin)
2:45 - 4:25 pm CST
Session 3: AI Systems
Ravi Iyer (Intel)
An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks
Kiran Seshadri, Berkin Akin (Google), James Laudon (Google Research), Ravi Narayanaswami (Google, Cruise), Amir Yazdanbakhsh (Google Research)
Accelerating Transformer Networks through Recomposing Softmax Layers Slides
Jaewan Choi, Hailong Li (Seoul National University), Byeongho Kim (Samsung Electronics), Seunghwan Hwang, Jung Ho Ahn (Seoul National University)
A Slice and Dice Approach to Accelerate Compound Sparse Attention on GPU Slides
Hailong Li, Jaewan Choi, Jung Ho Ahn (Seoul National University)
FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning
Young Geun Kim (Korea University), Carole-Jean Wu (Meta AI / Arizona State University)
Tuesday, November 8, 2022
8:45 - 9:30 am CST
Keynote 2: Overcoming the challenges when viewing oneAPI as a performance workload, Paul Petersen, Intel (Slides)
The vision for oneAPI is to be an open, cross-architecture programming model that allows developers to use a single code base across multiple accelerator architectures. Delivering this vision requires creating open specifications, creating open-source projects that provide implementations, encouraging the emergence of an open community, and delivering an instance of this as a specific product that enables developers to fully utilize a hardware platform. In this context, I want to talk about a range of challenges, and some of the methods by which we solved them, as we looked at the combination of applications and the oneAPI runtime as a workload to optimize. Often the challenge was how to observe and understand execution behaviors to see whether they were expected workload characteristics or examples of overhead we could reduce.
Paul Petersen is a Fellow in Intel/SATG (Software and Advanced Technology Group), and oneAPI Architect. He received a Ph.D. in Computer Science from the University of Illinois in 1993. Starting at Kuck and Associates, Inc. (KAI), his responsibilities included enhancing the auto-parallelizing compiler (KAP) and the early definition and implementations of OpenMP. While at KAI, he developed the Assure line of parallelization/correctness products for Fortran, C++ and Java. In 2000, Intel Corporation acquired KAI, and he joined the software tools group creating the Thread Checker products, which evolved into the Inspector and Advisor components of the Intel® Parallel Studio. Inspector uses dynamic binary instrumentation to detect memory and concurrency bugs, and Advisor uses similar techniques along with performance measurement and modeling to assist developers in transforming existing serial applications to be ready for parallel execution. His passion for software architecture grew to cover all of Parallel Studio XE and the architecture of its components. After a few years leading the software tools pathfinding with a focus on defining next-generation features for parallel runtimes and software analysis tools, Paul returned to software architecture in his current role leading the oneAPI Tools Architecture team.
9:30 - 10:45 am CST
Session 4: Graph Neural Networks
Reetu Das (University of Michigan)
Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU Slides
Hanqiu Chen, Yahya Alhinai, Yihan Jiang (GaTech), Eunjee Na (KAIST), Cong (Callie) Hao (GaTech)
gSuite: A Flexible and Framework Independent Benchmark Suite for Graph Neural Network Inference on GPUs Slides
Taha Tekdoğan, Serkan Göktaş, Ayse Yilmazer-Metin (Istanbul Technical University)
Characterizing the Efficiency of Graph Neural Network Frameworks with a Magnifying Glass
Xin Huang (Texas State University), Jongryool Kim (SK hynix America), Brad Rees (NVIDIA), Chul-Ho Lee (Texas State University)
11:00 - 12:15 pm CST
Session 5: Graph Analytics and GPUs
Cristina Beldica (Intel)
Performance Characterization of AutoNUMA Memory Tiering on Graph Analytics Slides
Diego Moura (Federal University of Bahia), Daniel Mossé (University of Pittsburgh), Vinicius Petrucci (Micron)
Understanding the Power of Evolutionary Computation for GPU Code Optimization Slides
Jhe-Yu Liou (Arizona State University), Muaaz Awan, Steven Hofmeyr (Lawrence Berkeley National Laboratory), Carole-Jean Wu, Stephanie Forrest (Arizona State University)
The Implications of Page Size Management on Graph Analytics Slides
Aninda Manocha (Princeton University), Zi Yan (NVIDIA), Esin Tureci (Princeton University), Juan Luis Aragón (University of Murcia), David Nellans (NVIDIA), Margaret Martonosi (Princeton University)
1:30 - 3:15 pm CST
Session 6: Mobile, Web, and Cloud
Chris Hughes (Intel)
Revisiting Temporal Storage I/O Behaviors of Smartphone Applications: Analysis and Synthesis Slides
Qiang Zou (Southwest University), Bo Mao (Xiamen University)
How Far We’ve Come – A Characterization Study of Standalone WebAssembly Runtimes Slides
Wenwen Wang (University of Georgia)
SpotLake: Diverse Spot Instance Dataset Archive Service Slides
Sungjae Lee, Jaeil Hwang, Kyungyong Lee (Kookmin University)
Leaps and Bounds: Analyzing WebAssembly's Performance with a Focus on Bounds Checking Slides
Raven Szewczyk, Kim Stonehouse, Antonio Barbalace (University of Edinburgh, United Kingdom), Tom Spink (University of St Andrews, United Kingdom)
3:30 - 4:45 pm CST
Session 7: AI Benchmarks & Characterization
Chris Hughes (Intel)
Demystifying Map Space Exploration for NPUs
Sheng-Chun Kao (GaTech), Angshuman Parashar, Po-An Tsai (NVIDIA), Tushar Krishna (GaTech)
LongTail-Bench: A Benchmark Suite for Domain-Specific Operators in Deep Learning
Xiuhong Li (SenseTime Research & Shanghai AI Lab), Shengen Yan, Lijuan Jiang, Ping Xu (SenseTime Research), Jinming Ma (Shanghai AI Lab), Xingcheng Zhang (SenseTime Research & Shanghai AI Lab), Dahua Lin (The Chinese University of Hong Kong & Shanghai AI Lab)
Demystifying BERT: System Design Implications
Suchita Pati (University of Wisconsin-Madison), Shaizeen Aga, Nuwan Jayasena (AMD Research), Matthew D. Sinclair (University of Wisconsin-Madison and AMD Research)
4:45 - 5:00 pm CST Closing Remarks
