
IISWC-2008

September 14-16, 2008

Seattle, WA, USA

TUTORIALS

September 14

(Sunday)

Tutorial I (8 am - 12 noon, 20 minute coffee break at 9:50 am)

Title: Using the Pin Instrumentation Framework for Workload Characterization

-Presenters: Aamer Jaleel, Cristiano Pereira (Intel Corporation)

Tutorial II (1:30 pm - 5:30 pm, 20 minute coffee break at 3:20 pm)

Title: Transactional Memory for C/C++: Design, Implementation and Performance

-Presenters: Adam Welc, Yang Ni, and Ali-Reza Adl-Tabatabai (Intel Corporation)

Tutorial III (1:30 pm - 5:30 pm, 20 minute coffee break at 3:20 pm)

Title: Fast Simulation without Bogus Results

-Presenters: Tom Conte, Paul Bryan (Georgia Institute of Technology)

*** Tutorials II and III are parallel sessions

Abstracts and Bios:

Tutorial I: Using the Pin Instrumentation Framework for Workload Characterization

Abstract:
With several new application domains emerging, understanding the requirements of these applications is essential in designing future high performance processors. Such workload characterization and exploratory studies require fast and efficient techniques that can determine the behavior of these emerging workloads. This tutorial will illustrate the use of Pin to conduct workload characterization and performance studies.

Pin is a dynamic instrumentation system provided by Intel (http://rogue.colorado.edu/Pin) that has become widely used throughout academia and industry. Pin allows code (C/C++) to be injected at arbitrary places in an executable while it is running. The injected code is used to observe the behavior of the program, and can be used to write a variety of workload characterization tools such as application profilers and trace generators. Pin provides a rich API that abstracts away the underlying instruction set idiosyncrasies and allows context information such as register contents to be passed to the injected code as parameters. Pin automatically saves and restores the registers that are overwritten by the injected code so the application continues to operate normally. Pin makes it easy to do studies on complex real-life applications, which makes it a useful tool for enabling workload characterization studies.

This tutorial targets researchers, students, and educators alike, from the novice Pin user to the expert Pinhead. The tutorial will provide a brief background on Pin and describe how to build simple Pin tools that can help in workload characterization.

Bios:
Aamer Jaleel is a Hardware Engineer at Intel. He received his Ph.D. in Computer Engineering from the University of Maryland, College Park in 2005. His current research focuses on workload characterization and memory system optimizations for high performance processors.

Cristiano Pereira is a Software and Hardware Engineer at Intel. He received his Ph.D. from the University of California, San Diego in 2007. His current research focuses on workload tracing and characterization, and debugging of multi-threaded programs.

Tutorial II: Transactional Memory for C/C++: Design, Implementation and Performance

Abstract:
The advent of multi-core processors in the mainstream computing market is forcing programmers to shift from writing sequential code to building concurrent applications. Today, concurrent programs are typically synchronized using locks. However, it is hard to write correct and scalable programs using locks. Fine-grained locks may not compose and are prone to deadlocks. Coarse-grained locks may not deliver the performance promised by multi-core processors. Recently, Transactional Memory (TM) has been proposed as an alternative to locks for multi-core programming. TM has attracted a lot of attention as a safer, more modular, and more scalable concurrency control mechanism.

This tutorial will provide a comprehensive overview of transactional memory, with detailed discussion of the state-of-the-art C/C++ STM developed at Intel. We will cover all important areas of TM research, from the language extensions, through STM design and implementation, to workload development and characterization. We will show how to extend C/C++ with transactional constructs and how these constructs alleviate some of the problems related to programming with locks. We will also present the fundamentals of implementing software transactional memory algorithms for C/C++ and show how to integrate these algorithms with the existing language and runtime features. Finally, we will share our experience from writing and running transactional C/C++ applications and address both programmability and performance implications of TM.
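The abstract's remark that fine-grained locks are prone to deadlock can be illustrated with a minimal sketch. It is not transactional memory and not the Intel C/C++ STM; it only shows the lock-ordering discipline that a programmer must maintain by hand when composing two per-object locks. The `Account` class and the `transfer` and `run_opposing_transfers` functions are hypothetical names introduced for illustration.

```python
import threading


class Account:
    """Hypothetical account guarded by its own fine-grained lock."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` from src to dst while holding both account locks."""
    if src is dst:
        return  # nothing to do, and taking the same lock twice would block forever
    # Always acquire the locks in one global order (here: by account_id).
    # Without this rule, two opposite transfers could each hold one lock
    # and wait for the other, which is the classic deadlock.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_opposing_transfers(rounds=10_000):
    """Run transfers a->b and b->a concurrently and return both accounts."""
    a, b = Account(1, 1_000), Account(2, 1_000)

    def worker(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    threads = [
        threading.Thread(target=worker, args=(a, b)),
        threading.Thread(target=worker, args=(b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a, b


if __name__ == "__main__":
    a, b = run_opposing_transfers()
    print(a.balance, b.balance)
```

Every caller that touches two accounts has to know and follow the ordering rule, which is the kind of non-local convention that the transactional constructs described in this tutorial aim to remove.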

Bios:
Yang Ni is a Research Scientist in Intel's Programming Systems Lab. He has been working on programming languages for platforms ranging from mobile devices to chip multiprocessors. His current research focuses on transactional memory. He is a major contributor to the Intel C/C++ TM compiler. Yang received his Ph.D. in Computer Science from Rutgers University.

Adam Welc is a Research Scientist in Intel's Programming Systems Lab. His work is in the area of programming language design and implementation, with specific interests in concurrency control, compiler and run-time system optimizations, transactional processing as well as architectural support for programming languages and applications. Adam received the Master of Science in Computer Science from Poznan University of Technology, Poland, in July 1999. He continued his graduate studies at Purdue University, receiving the Master of Science in Computer Science in May 2003, and the Ph.D. in Computer Science in March 2006.

Tutorial III: Fast Simulation without Bogus Results

Abstract:

Contemporary hardware design is driven by simulation. Simulation is an invaluable tool for evaluating design tradeoffs, but growing simulator complexity and workload size have made it an increasingly time-consuming endeavor. Since exhaustive simulation of workloads is prohibitively expensive, some researchers have attempted to lower this cost at their peril. It is still common for researchers to execute an arbitrarily chosen number of instructions when evaluating their technique, and to obtain inaccurate or misleading results. In contrast, applying statistical sampling techniques to hardware simulation is an effective way to significantly reduce the cost of simulation while still achieving high levels of accuracy.

This tutorial will provide a thorough background in the statistical concepts and techniques commonly used with sampling. These concepts will be explained in the context of hardware simulation environments. To obtain accurate measurements, two types of bias must first be removed: non-sampling bias and sampling bias. Each type and its importance will be discussed in detail. Various non-sampling bias removal techniques (warm-up methods) will be discussed, including the Reverse State Reconstruction algorithm. Sampling bias removal techniques will also be discussed in the context of developing a sampling regimen, including the Single-Pass Sampling Regimen Design algorithm.

Finally, this tutorial will provide a detailed checklist for researchers to use to more easily incorporate sampling into their own simulation environments. Since complexity is often cited as a reason that researchers do not use sampling in their own studies, a goal of this tutorial will be to help researchers overcome this hurdle.
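As a rough illustration of the sampling idea in this abstract, the sketch below estimates a mean metric (for example cycles per instruction) from periodically sampled clusters and reports a normal-approximation confidence interval. It is a generic textbook-style estimate under stated assumptions, not the Reverse State Reconstruction or Single-Pass Sampling Regimen Design algorithms, and it ignores warm-up (non-sampling bias) entirely. The synthetic trace, the function names, and the cluster sizes are all hypothetical.

```python
import math
import random
import statistics


def systematic_sample(trace, cluster_len, period):
    """Return the mean metric of one cluster taken every `period` entries."""
    return [
        statistics.fmean(trace[start:start + cluster_len])
        for start in range(0, len(trace) - cluster_len + 1, period)
    ]


def sample_mean_ci(cluster_means, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Uses the normal approximation; with few clusters a Student-t
    quantile would be more appropriate than z = 1.96.
    """
    n = len(cluster_means)
    mean = statistics.fmean(cluster_means)
    std_err = statistics.stdev(cluster_means) / math.sqrt(n)
    return mean, z * std_err


# Hypothetical per-instruction cycle counts standing in for detailed
# simulator output: 1 cycle normally, 3 cycles on a simulated "miss".
rng = random.Random(0)
trace = [1.0 + (2.0 if rng.random() < 0.1 else 0.0) for _ in range(200_000)]

# Sample 1,000-instruction clusters every 10,000 instructions (10% of the run).
clusters = systematic_sample(trace, cluster_len=1_000, period=10_000)
estimate, half_width = sample_mean_ci(clusters)

# The full-trace mean is computed here only to check the estimate; a real
# sampled simulation would never simulate the whole run in detail.
true_cpi = statistics.fmean(trace)

if __name__ == "__main__":
    print(f"sampled CPI = {estimate:.3f} +/- {half_width:.3f} "
          f"(full run: {true_cpi:.3f})")
```

The half-width gives a way to judge whether enough clusters were taken: if it is too wide for the comparison being made, the sampling regimen needs more clusters rather than an arbitrary longer run.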

Bios:
Paul Bryan is a research assistant at the Georgia Institute of Technology under the guidance of Tom Conte. His research interests include sampled simulation, performance modeling, computer architecture, and sampling and non-sampling bias removal techniques. He holds an M.S. and a B.S. in Computer Engineering from North Carolina State University.

Tom Conte is Professor of Computer Science, College of Computing, Georgia Institute of Technology. His research is in the areas of manycore/multicore architectures, microprocessor architectures, compiler code generation, architectural performance evaluation and embedded computer systems. Conte is the past chair of the ACM Special Interest Group on Microarchitecture (SIGMICRO), the past chair of the IEEE Computer Society Technical Committee on Microprogramming and Microarchitecture (TC-uARCH), and also a fellow of the IEEE. He was the editor in chief of the Journal of Instruction-Level Parallelism from 1997 to 2001 and from 2002 to 2005. He is an associate editor of ACM Transactions on Embedded Computer Systems, ACM Transactions on Architecture and Compiler Optimization, and the IEEE Computer and IEEE Micro magazines. Tom Conte received his Ph.D. from the University of Illinois at Urbana-Champaign in 1992.