IISWC 2020 Program

Tuesday, October 27, 2020 - Workshops and Tutorials
9:00 - 12:00	Tutorial 1: Benchmarking and Accelerating Big Data Systems with RDMA, PMEM, and NVMe-SSD Xiaoyi Lu, Haiyang Shi, and Shashank Gugnani (Department of Computer Science and Engineering, The Ohio State University)
9:00 - 12:00	Workshop 1: Practice and Research of Commercial Systems and Application Domains Inspur (Introduction; Link to the workshop website)
14:00 - 17:00	Tutorial 2: Proxy Benchmarks for Reproducible Research Lizy John, Steven Flolid, Zachary Susskind (The University of Texas at Austin), Emily Shriver (Intel)

Wednesday, October 28, 2020
9:00 - 9:10	Chair Welcome Lei Liu, Lixin Zhang (ICT CAS); Devesh Tiwari and David Kaeli (Northeastern University)
9:10 - 10:06	Session 1: Large-scale Systems Session Chairs: Dalit Naor (IBM); Andre Brinkmann (Johannes Gutenberg University Mainz)
10:10 - 11:00	Keynote 1: Using Workload Characterization to Drive Future Advances in Compiler and Runtime Systems - Vivek Sarkar (Georgia Tech) Session Chair: Devesh Tiwari (Northeastern University)
11:18 - 12:06	Session 2: Benchmarking, Evaluation, and Simulation Methodologies Session Chairs: José L. Abellán (Catholic University of Murcia); Zhibin Yu (Shenzhen Institutes of Advanced Technology)

Thursday, October 29, 2020
9:00 - 9:48	Session 3: Machine learning and systems Session Chairs: Jose Cano (University of Glasgow); Avani Wildani (Emory University)
10:00 - 10:50	Keynote 2: The Computing and Information Science Landscape: A look forward - Margaret Martonosi (NSF/Princeton) Session Chair: David Kaeli (Northeastern University)
11:00 - 11:56	Session 4: System Architecture and Applications Session Chairs: Chen Liu (Clarkson University); Janki Bhimani (Florida Int'l University)
11:56 - 12:00	Closing Remarks
12:00 - 12:15	Closing

Tuesday, October 27, 2020
9:00 - 12:00	Tutorial 1: Benchmarking and Accelerating Big Data Systems with RDMA, PMEM, and NVMe-SSD (morning slot) - Xiaoyi Lu, Haiyang Shi, and Shashank Gugnani (Department of Computer Science and Engineering, The Ohio State University)
The convergence of HPC, Big Data, and Deep Learning is becoming the next game-changing opportunity. Modern HPC systems and Cloud Computing platforms have been fueled by advances in multi-/many-core architectures, Remote Direct Memory Access (RDMA) enabled high-speed networks, persistent memory (PMEM), and NVMe-SSDs. However, many Big Data systems and libraries (such as Hadoop, Spark, Flink, and Memcached) have not fully embraced these technologies. Recent studies have shown that the default designs of these components cannot efficiently leverage the advanced features of modern clusters with RDMA, PMEM, and NVMe-SSD. In this tutorial, we will provide an in-depth overview of the architectures, programming models, features, and performance characteristics of RDMA networks, PMEM, and NVMe-SSD. We will examine the challenges in re-/co-designing the communication and I/O components of Big Data systems and libraries with these emerging technologies. We will present benchmark-level studies and system-level case studies (such as Hadoop, Spark, TensorFlow, and Memcached) to discuss how to use these new technologies efficiently for real applications.
9:00 - 12:00	Workshop 1: Practice and Research of Commercial Systems and Application Domains (in parallel with tutorials) - Inspur
Introduction PDF can be accessed here. Please refer to the link for more detailed information.
14:00 - 17:00	Tutorial 2: Proxy Benchmarks for Reproducible Research (afternoon slot) - Lizy John, Steven Flolid, Zachary Susskind (The University of Texas at Austin), Emily Shriver (Intel)
Computer architecture research has largely employed detailed full-system simulation with real-world workloads; however, the very long simulation times this methodology requires have started to prohibit good design space exploration. Our ongoing research has produced successful techniques to characterize benchmarks and to synthesize or clone them into miniaturized code sequences with approximately the same performance and power behavior as the original workload. This tutorial will present the proxy generation methodology, proxies for SPEC CPU 2017 benchmarks, and proxies for Cassandra, MongoDB, and MySQL. It will also present SimPoints for SPEC CPU 2017 and their pinballs. The use of miniaturized proxies for reproducible research will be examined.


Wednesday, October 28, 2020
9:10 - 10:06	Session 1: Large-scale Systems Session Chairs: Dalit Naor (IBM) <dalit_at_il.ibm.com>; Andre Brinkmann (Johannes Gutenberg University Mainz) <brinkman_at_uni-mainz.de>
9:10 - 9:18	Pocolo: Power Optimized Colocation in Power Constrained Environments Iyswarya Narayanan (Penn State/Facebook); Adithya Kumar, Anand Sivasubramaniam (Penn State) video (long) video (short) paper
9:18 - 9:26	HetSim: Evaluating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework Subhankar Pal (University of Michigan); Kuba Kaszyk (University of Edinburgh); Siying Feng (University of Michigan); Björn Franke, Murray Cole, Michael O'Boyle (University of Edinburgh); Trevor Mudge, Ronald G. Dreslinski (University of Michigan) video (long) video (short) paper
9:26 - 9:34	HPC-MixPBench: An HPC Benchmark Suite for Mixed Precision Analysis Konstantinos Parasyris, Ignacio Laguna, Harshitha Menon, Markus Schordan, Daniel Osei-Kuffuor, Giorgis Georgakoudis (Lawrence Livermore National Laboratory); Mike Lam (James Madison University); Tristan Vanderbruggen (Lawrence Livermore National Laboratory) video (long) video (short) paper
9:34 - 9:42	An In-Depth Analysis of Cloud Block Storage Workloads in Large-Scale Production Jinhong Li, Qiuping Wang, Patrick P. C. Lee (The Chinese University of Hong Kong); Chao Shi (Alibaba Group) video (long) video (short) paper
9:42 - 9:50	Characterizing the Scale-Up Performance of Microservices Using TeaStore Sriyash Caculo, Kanishka Lahiri (Advanced Micro Devices, Inc); Subramaniam Kalambur (PES University) video (long) video (short) paper
9:50 - 9:58	MATCH: An MPI Fault Tolerance Benchmark Suite Luanzheng Guo (University of California Merced); Giorgis Georgakoudis, Konstantinos Parasyris, Ignacio Laguna (LLNL); Dong Li (University of California Merced) video (long) video (short) paper
9:58 - 10:06	CPU Microarchitectural Performance Characterization of Cloud Video Transcoding Yuhan Chen, Jingyuan Zhu, Tanvir Ahmed Khan, Baris Kasikci (University of Michigan) video (long) video (short) paper
10:10 - 11:00	Keynote 1: Using Workload Characterization to Drive Future Advances in Compiler and Runtime Systems - Vivek Sarkar (Georgia Tech) Session Chair: Devesh Tiwari (Northeastern University) <d.tiwari_at_northeastern.edu>
There have been many notable advances in compiler and runtime technologies to efficiently map applications on to multiple generations of parallel hardware, ranging from symmetric multiprocessors to distributed-memory multiprocessors, and from homogeneous to heterogeneous computing with accelerators. However, for the most part, compilers and runtime systems still have little knowledge of the characteristics of the applications that they process. The benefits of profile-directed optimization, as well as a number of predictive techniques (including recent approaches based on deep learning), have only begun to scratch the surface of how compilers and runtime systems can adapt to the characteristics of the applications that they process. Domain-specific programming systems are notable in their use of domain characteristics by compiler and runtime systems, but they typically account for small fractions of entire workloads. In this talk, we make a case for using more comprehensive workload characterizations to drive future advances in compilers and runtime systems. We will draw from recent research experiences with 1) the MAESTRO analytical cost model that leverages compiler techniques to efficiently explore a large space of mappings of DNNs on spatial accelerators, 2) the use of data-aware code transformations to optimize data analytics kernels such as Word2Vec, and 3) the use of the Habanero-C runtime system to efficiently execute HPC applications that require large volumes of small irregular asynchronous data transfers. We envision a future in which workload characterizations are regularly used as first-class inputs in the design and implementation of compiler and runtime systems. Modern workloads like DNNs provide a great starting point for such a direction because their characteristics are well known ahead of time. However, we expect workload-aware compiler and runtime techniques to also be beneficial for more general-purpose programs in the future.
Vivek Sarkar is Chair of the School of Computer Science and the Stephen Fleming Chair for Telecommunications in the College of Computing at Georgia Institute of Technology. He conducts research in multiple aspects of programmability and productivity in parallel computing, including programming languages, compilers, runtime systems, and debuggers for parallel, heterogeneous, and high-performance computer systems. Sarkar started his career in IBM Research after obtaining his Ph.D. from Stanford University, supervised by John Hennessy. His research projects at IBM include the PTRAN automatic parallelization system led by Fran Allen, the ASTI optimizer for IBM’s XL Fortran product compilers, the open-source Jikes Research Virtual Machine for the Java language, and the X10 programming language developed in the DARPA HPCS program. He was a member of the IBM Academy of Technology during 1995-2007. After moving to academia, Sarkar has mentored over 30 Ph.D. students and postdoctoral researchers in the Habanero Extreme Scale Software Research Laboratory, first at Rice University since 2007, and now at Georgia Tech since 2017. Researchers in his lab have developed the Habanero-C/C++ and Habanero-Java programming systems for parallel, heterogeneous, and distributed platforms. While at Rice, Sarkar was the E.D. Butcher Chair in Engineering, served as Chair of the Department of Computer Science, created a new sophomore-level course on the fundamentals of parallel programming, as well as a three-course Coursera specialization on parallel, concurrent, and distributed programming. Sarkar is an ACM Fellow and an IEEE Fellow. He has been serving as a member of the US Department of Energy’s Advanced Scientific Computing Advisory Committee (ASCAC) since 2009, and on CRA’s Board of Directors since 2015.
11:18 - 12:06	Session 2: Benchmarking, Evaluation, and Simulation Methodologies Session Chairs: José L. Abellán (Catholic University of Murcia) <jlabellan_at_ucam.edu>; Zhibin Yu (Shenzhen Institutes of Advanced Technology) <zb.yu_at_siat.ac.cn>
11:18 - 11:26	A Rigorous Benchmarking and Performance Analysis Methodology for Python Workloads Arthur Crapé, Lieven Eeckhout (Ghent University) video (long) video (short) paper
11:26 - 11:34	Reliability Modeling of NISQ-Era Quantum Computers Ji Liu, Huiyang Zhou (North Carolina State University) video (long) video (short) paper
11:34 - 11:42	CPI for Runtime Performance Measurement: The Good, the Bad, and the Ugly Li Yi, Cong Li (Intel Corporation); Jianmei Guo (Alibaba Cloud) video (long) video (short) paper
11:42 - 11:50	High Frequency Performance Monitoring via Architectural Event Measurement Chutitep Woralert (Clarkson University); James Bruska (Independent Researcher); Chen Liu (Clarkson University); Lok Yan (Air Force Research Lab) video (long) video (short) paper
11:50 - 11:58	Port or Shim? Stress Testing Application Performance on Intel SGX Aisha Hasan, Ryan Riley (Carnegie Mellon University in Qatar); Dmitry Ponomarev (Binghamton University) video (long) video (short) paper
11:58 - 12:06	Characterizing the Impact of Big Data Workloads on Cache Replacement Policies Alexandre Valentin Jamet (Barcelona Supercomputing Center); Lluc Alvarez (Barcelona Supercomputing Center / Universitat Politècnica de Catalunya); Marc Casas (Barcelona Supercomputing Center); Daniel A. Jiménez (Texas A&M University) video (long) video (short) paper


Thursday, October 29, 2020
9:00 - 9:48	Session 3: Machine learning and systems Session Chairs: Jose Cano (University of Glasgow) <jose.canoreyes_at_glasgow.ac.uk>; Avani Wildani (Emory University) <avani_at_mathcs.emory.edu>
9:00 - 9:08	AI on the Edge: Characterizing AI-based IoT Applications Using Specialized Edge Architectures Qianlin Liang, Prashant Shenoy, David Irwin (University of Massachusetts, Amherst) video (long) video (short) paper
9:08 - 9:16	Cross-Stack Workload Characterization of Deep Recommendation Systems Samuel Hsia (Harvard University); Udit Gupta (Harvard University / Facebook); Mark Wilkening (Harvard University); Carole-Jean Wu (Facebook); Gu-Yeon Wei, David Brooks (Harvard University) video (long) video (short) paper
9:16 - 9:24	A Case for Generalizable DNN Cost Models for Mobile Devices Vinod Ganesan, Surya Selvam (Indian Institute of Technology Madras); Sanchari Sen (Purdue University); Pratyush Kumar (Indian Institute of Technology Madras); Anand Raghunathan (Purdue University) video (long) video (short) paper
9:24 - 9:32	Empirical Analysis and Modeling of Compute Times of CNN Operations on AWS Cloud Ubaid Ullah Hafeez, Anshul Gandhi (Stony Brook University) video (long) video (short) paper
9:32 - 9:40	A Sparse Tensor Benchmark Suite for CPUs and GPUs Jiajia Li (Pacific Northwest National Laboratory); Mahesh Lakshminarasimhan (University of Utah); Xiaolong Wu (Purdue University); Ang Li (Pacific Northwest National Laboratory); Catherine Olschanowsky (Boise State University); Kevin Barker (Pacific Northwest National Laboratory) video (long) video (short) paper
9:40 - 9:48	Demystifying Power and Performance Bottlenecks in Autonomous Driving Systems Pedro Henrique Exenberger Becker, José Maria Arnau, Antonio González (Universitat Politècnica de Catalunya) video (long) video (short) paper
10:00 - 10:50	Keynote 2: The Computing and Information Science Landscape: A look forward - Margaret Martonosi (NSF/Princeton) Session Chair: David Kaeli (Northeastern University) <kaeli_at_ece.neu.edu>
The fields of computer and information science and engineering (CISE) are central to nearly all of society’s needs, opportunities, and challenges. The US National Science Foundation (NSF) was created 70 years ago with a broad mission to promote the progress of science and to catalyze societal and economic benefits. NSF, largely through its CISE directorate which has an annual budget of more than $1B, accounts for over 85% of federally-funded, academic, fundamental computer science research in the US. My talk will give an overview of NSF/CISE research, education, diversity and inclusion efforts, and research infrastructure programs. I will relate them to the technical and societal trends and topics that will impact their future trajectory. I will particularly highlight opportunity areas most in need of the engagement and insights from IISWC researchers going forward.
Margaret Martonosi is the US National Science Foundation's (NSF) Assistant Director for Computer and Information Science and Engineering (CISE). With an annual budget of more than $1B, the CISE directorate at NSF has the mission to uphold the Nation's leadership in scientific discovery and engineering innovation through its support of fundamental research and education in computer and information science and engineering as well as transformative advances in research cyberinfrastructure. While at NSF, Dr. Martonosi is on leave from Princeton University where she is the Hugh Trumbull Adams '35 Professor of Computer Science. Dr. Martonosi's research interests are in computer architecture and hardware-software interface issues in both classical and quantum computing systems. Her work has included the widely used Wattch power modeling tool and the Princeton ZebraNet mobile sensor network project for the design and real-world deployment of zebra tracking collars in Kenya. Dr. Martonosi is an elected member of the American Academy of Arts and Sciences, and a Fellow of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). Her papers have received numerous long-term impact awards from the computer architecture and mobile computing communities. In addition, she has earned the 2019 SIGARCH Alan D. Berenbaum Distinguished Service Award, the 2018 IEEE Computer Society Technical Achievement Award, and the 2010 Princeton University Graduate Mentoring Award, among other honors.
11:00 - 11:56	Session 4: System Architecture and Applications Session Chairs: Chen Liu (Clarkson University) <cliu_at_clarkson.edu>; Janki Bhimani (Florida Int'l University) <jbhimani_at_fiu.edu>
11:00 - 11:08	Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite Ariful Azad (Indiana University); Mohsen Mahmoudi Aznaveh (Texas A&M University); Scott Beamer (University of California Santa Cruz); Mark Blanco (Carnegie Mellon University); Jinhao Chen (Texas A&M University); Luke D'Alessandro (Indiana University); Roshan Dathathri (The University of Texas at Austin); Tim Davis (Texas A&M University); Kevin Deweese (University of Washington); Jesun Firoz (Pacific Northwest National Laboratory); Henry A Gabb (Intel Corporation); Gurbinder Gill (The University of Texas at Austin); Balint Hegyi (Budapest University of Technology and Economics); Scott Kolodziej (Texas A&M University); Tze Meng Low (Carnegie Mellon University); Andrew Lumsdaine (University of Washington); Tugsbayasgalan Manlaibaatar (Massachusetts Institute of Technology); Timothy G Mattson (Intel Corporation); Scott McMillan (Carnegie Mellon University); Ramesh Peri (Intel Corporation); Keshav Pingali (The University of Texas at Austin); Upasana Sridhar (Carnegie Mellon University); Gabor Szarnyas (Budapest University of Technology and Economics); Yunming Zhang (Massachusetts Institute of Technology); Yongzhe Zhang (The Graduate University for Advanced Studies, SOKENDAI) video (long) video (short) paper
11:08 - 11:16	A Study of APIs for Graph Analytics Workloads Hochan Lee (University of Texas at Austin); David Wong (Intel Corporation); Loc Hoang, Roshan Dathathri, Gurbinder Gill, Vishwesh Jatala (University of Texas at Austin); David Kuck (Intel Corporation); Keshav Pingali (University of Texas at Austin) video (long) video (short) paper
11:16 - 11:24	Vertex Reordering for Real-world Graphs and Applications: An Empirical Evaluation Reet Barik (Washington State University); Marco Minutoli, Mahantesh Halappanavar, Nathan Tallent (Pacific Northwest National Laboratory); Ananth Kalyanaraman (Washington State University) video (long) video (short) paper
11:24 - 11:32	Scalable and Fast Lazy Persistency on GPUs Ardhi Wiratama Baskara Yudha (University of Central Florida); Keiji Kimura (Waseda University); Huiyang Zhou (North Carolina State University); Yan Solihin (University of Central Florida) video (long) video (short) paper
11:32 - 11:40	Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs Sangpyo Kim, Wonkyung Jung, Jaiyoung Park, Jung Ho Ahn (Seoul National University) video (long) video (short) paper
11:40 - 11:48	Reconfigurable Accelerator Compute Hierarchy: A Case Study Using Content-Based Image Retrieval Nazanin Farahpour, Yuchen Hao (University of California Los Angeles); Zhenman Fang (Simon Fraser University); Glenn Reinman (University of California Los Angeles) video (long) video (short) paper
11:48 - 11:56	Selective Event Processing for Energy Efficient Mobile Gaming with SNIP Prasanna Venkatesh Rengasamy, Haibo Zhang, Shulin Zhao, Anand Sivasubramaniam, Mahmut Kandemir, Chita R Das (Penn State) video (long) video (short) paper

Thursday, October 29, 2020

9:00 - 9:48

Session 3: Machine learning and systems

Session Chairs: Jose Cano (University of Glasgow) <jose.canoreyes_at_glasgow.ac.uk>
Avani Wildani (Emory University) <avani_at_mathcs.emory.edu>

9:00 - 9:08

AI on the Edge: Characterizing AI-based IoT Applications Using Specialized Edge Architectures

Qianlin Liang, Prashant Shenoy, David Irwin (University of Massachusetts, Amherst)

video (long) video (short) paper

9:08 - 9:16

Cross-Stack Workload Characterization of Deep Recommendation Systems

Samuel Hsia (Harvard University); Udit Gupta (Harvard University / Facebook); Mark Wilkening (Harvard University); Carole-Jean Wu (Facebook); Gu-Yeon Wei, David Brooks (Harvard University)

video (long) video (short) paper

9:16 - 9:24

A Case for Generalizable DNN Cost Models for Mobile Devices

Vinod Ganesan, Surya Selvam (Indian Institute of Technology Madras); Sanchari Sen (Purdue University); Pratyush Kumar (Indian Institute of Technology Madras); Anand Raghunathan (Purdue University)

video (long) video (short) paper

9:24 - 9:32

Empirical Analysis and Modeling of Compute Times of CNN Operations on AWS Cloud

Ubaid Ullah Hafeez, Anshul Gandhi (Stony Brook University)

video (long) video (short) paper

9:32 - 9:40

A Sparse Tensor Benchmark Suite for CPUs and GPUs

Jiajia Li (Pacific Northwest National Laboratory); Mahesh Lakshminarasimhan (University of Utah); Xiaolong Wu (Purdue University); Ang Li (Pacific Northwest National Laboratory); Catherine Olschanowsky (Boise State University); Kevin Barker (Pacific Northwest National Laboratory)

video (long) video (short) paper

9:40 - 9:48

Demystifying Power and Performance Bottlenecks in Autonomous Driving Systems

Pedro Henrique Exenberger Becker, José Maria Arnau, Antonio González (Universitat Politècnica de Catalunya)

video (long) video (short) paper

10:00 - 10:50

Keynote 2: The Computing and Information Science Landscape: A look forward - Margaret Martonosi (NSF/Princeton)

Session Chair: David Kaeli (Northeastern University) <kaeli_at_ece.neu.edu>

The fields of computer and information science and engineering (CISE) are central to nearly all of society’s needs, opportunities, and challenges. The US National Science Foundation (NSF) was created 70 years ago with a broad mission to promote the progress of science and to catalyze societal and economic benefits. NSF, largely through its CISE directorate which has an annual budget of more than $1B, accounts for over 85% of federally-funded, academic, fundamental computer science research in the US. My talk will give an overview of NSF/CISE research, education, diversity and inclusion efforts, and research infrastructure programs. I will relate them to the technical and societal trends and topics that will impact their future trajectory. I will particularly highlight opportunity areas most in need of the engagement and insights from IISWC researchers going forward.

Margaret Martonosi is the US National Science Foundation's (NSF) Assistant Director for Computer and Information Science and Engineering (CISE). With an annual budget of more than $1B, the CISE directorate at NSF has the mission to uphold the Nation's leadership in scientific discovery and engineering innovation through its support of fundamental research and education in computer and information science and engineering as well as transformative advances in research cyberinfrastructure. While at NSF, Dr. Martonosi is on leave from Princeton University where she is the Hugh Trumbull Adams '35 Professor of Computer Science.

Dr. Martonosi's research interests are in computer architecture and hardware-software interface issues in both classical and quantum computing systems. Her work has included the widely-used Wattch power modeling tool and the Princeton ZebraNet mobile sensor network project for the design and real-world deployment of zebra tracking collars in Kenya. Dr. Martonosi is an elected member of the American Academy of Arts and Sciences, and a Fellow of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). Her papers have received numerous long-term impact awards from the computer architecture and mobile computing communities. In addition, she has earned the 2019 SIGARCH Alan D. Berenbaum Distinguished Service Award, the 2018 IEEE Computer Society Technical Achievement Award, and the 2010 Princeton University Graduate Mentoring Award, among other honors.

11:00 - 11:56

Session 4: System Architecture and Applications

Session Chairs: Chen Liu (Clarkson University) <cliu_at_clarkson.edu>
Janki Bhimani (Florida Int'l University) <jbhimani_at_fiu.edu>

11:00 - 11:08

Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite

Ariful Azad (Indiana University); Mohsen Mahmoudi Aznaveh (Texas A&M University); Scott Beamer (University of California Santa Cruz); Mark Blanco (Carnegie Mellon University); Jinhao Chen (Texas A&M University); Luke D'Alessandro (Indiana University); Roshan Dathathri (The University of Texas at Austin); Tim Davis (Texas A&M University); Kevin Deweese (University of Washington); Jesun Firoz (Pacific Northwest National Laboratory); Henry A Gabb (Intel Corporation); Gurbinder Gill (The University of Texas at Austin); Balint Hegyi (Budapest University of Technology and Economics); Scott Kolodziej (Texas A&M University); Tze Meng Low (Carnegie Mellon University); Andrew Lumsdaine (University of Washington); Tugsbayasgalan Manlaibaatar (Massachusetts Institute of Technology); Timothy G Mattson (Intel Corporation); Scott McMillan (Carnegie Mellon University); Ramesh Peri (Intel Corporation); Keshav Pingali (The University of Texas at Austin); Upasana Sridhar (Carnegie Mellon University); Gabor Szarnyas (Budapest University of Technology and Economics); Yunming Zhang (Massachusetts Institute of Technology); Yongzhe Zhang (The Graduate University for Advanced Studies, SOKENDAI)

video (long) video (short) paper

11:08 - 11:16

A Study of APIs for Graph Analytics Workloads

Hochan Lee (University of Texas at Austin); David Wong (Intel Corporation); Loc Hoang, Roshan Dathathri, Gurbinder Gill, Vishwesh Jatala (University of Texas at Austin); David Kuck (Intel Corporation); Keshav Pingali (University of Texas at Austin)

video (long) video (short) paper

11:16 - 11:24

Vertex Reordering for Real-world Graphs and Applications: An Empirical Evaluation

Reet Barik (Washington State University); Marco Minutoli, Mahantesh Halappanavar, Nathan Tallent (Pacific Northwest National Laboratory); Ananth Kalyanaraman (Washington State University)

video (long) video (short) paper

11:24 - 11:32

Scalable and Fast Lazy Persistency on GPUs

Ardhi Wiratama Baskara Yudha (University of Central Florida); Keiji Kimura (Waseda University); Huiyang Zhou (North Carolina State University); Yan Solihin (University of Central Florida)

video (long) video (short) paper

11:32 - 11:40

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs

Sangpyo Kim, Wonkyung Jung, Jaiyoung Park, Jung Ho Ahn (Seoul National University)

video (long) video (short) paper

11:40 - 11:48

Reconfigurable Accelerator Compute Hierarchy: A Case Study Using Content-Based Image Retrieval

Nazanin Farahpour, Yuchen Hao (University of California Los Angeles); Zhenman Fang (Simon Fraser University); Glenn Reinman (University of California Los Angeles)

video (long) video (short) paper

11:48 - 11:56

Selective Event Processing for Energy Efficient Mobile Gaming with SNIP

Prasanna Venkatesh Rengasamy, Haibo Zhang, Shulin Zhao, Anand Sivasubramaniam, Mahmut Kandemir, Chita R Das (Penn State)

video (long) video (short) paper