Tutorials

The 2011 International Conference on High Performance Computing & Simulation

(HPCS 2011)

In Conjunction With

The International Wireless Communications and Mobile Computing Conference (IWCMC 2011)

July 4 – 8, 2011

Bahcesehir University

Istanbul, Turkey

HPCS 2011 TUTORIALS

T1: Distributed Software Transactional Memories: Foundations, Algorithms and Tools

Maria Couceiro, Paolo Romano, Luís Rodrigues

INESC-ID Lisbon, Portugal

(3.0 Hours)


T2: Parallel and Distributed Simulation from Many Cores to the Public Cloud

Gabriele D'Angelo

University of Bologna, Italy

(2.0 Hours)


T3: Parallel Programming with Cilk and Array Notation using the Intel Compiler

Levent Akyil and Farhana Aleen

Intel Corporation, Germany

(2.5 Hours)


T4: Distributed Data Processing using Hadoop

Rajiv Chittajallu

Yahoo! Research, California, USA

(2.0 Hours)


T5: Porting Applications with GridWay

Ismael Marín

dsa-research.org, Universidad Complutense de Madrid, Spain

(2.5 Hours)

T1: Distributed Software Transactional Memories: Foundations, Algorithms and Tools (3.0 Hours)

Maria Couceiro, Paolo Romano, Luís Rodrigues

INESC-ID Lisbon, Portugal

BRIEF TUTORIAL DESCRIPTION

Parallel programming (PP) was once confined to scientific and high-performance computing applications. With the proliferation of multicore processors, however, it has become a mainstream concern. Transactional Memories (TMs) answer the need for a better parallel programming model, one capable of boosting developers' productivity and allowing ordinary programmers to unleash the power of parallel and distributed architectures while avoiding the pitfalls of manual, lock-based synchronization.

Distributed TMs (DTMs) are a novel and fast-growing evolution of the research area on TMs. DTMs extend the traditional TM model beyond the boundaries of a single machine, transparently leveraging the resources of commodity, shared-nothing clusters to achieve higher scalability and dependability. This research topic represents, in some sense, the confluence of the research areas on TM, distributed shared memory (DSM) and database replication. Interestingly, the currently available DTMs have shown promising results, highlighting how reliance on the atomic transaction abstraction avoids the well-known performance limitations of classical DSM systems while providing strong consistency guarantees and scalability up to hundreds of nodes. These features, combined with the simple and familiar interface of DSM systems, make DTMs an attractive candidate to become the reference programming paradigm for large-scale cloud computing platforms, whose popularity has grown at a remarkable pace in recent years.

This tutorial starts by overviewing the state of the art in the area of Transactional Memories and then focuses on the area of Distributed Transactional Memories, critically analyzing existing algorithms, platforms and tools, highlighting the strong and weak points of current solutions, and identifying possible directions for future research.
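To make the atomic transaction abstraction concrete, the toy sketch below contrasts it with manual locking: the programmer writes a plain transfer function, and an optimistic commit protocol (buffer writes, validate read versions, retry on conflict) supplies atomicity. This is an illustrative sketch only, not material from the tutorial; the `TVar` and `atomically` names are hypothetical, and real (D)TM runtimes are far more sophisticated.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter (hypothetical toy API)."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()

def atomically(tx_fn):
    """Run tx_fn optimistically: buffer writes, validate read versions at
    commit time, and retry the whole transaction on conflict."""
    while True:
        reads = {}   # TVar -> version observed at first access
        writes = {}  # TVar -> buffered new value

        def read(tv):
            if tv in writes:
                return writes[tv]
            reads.setdefault(tv, tv.version)
            return tv.value

        def write(tv, val):
            reads.setdefault(tv, tv.version)
            writes[tv] = val

        result = tx_fn(read, write)

        # Commit: lock the write set in a global order (avoids deadlock),
        # then validate that no read version changed in the meantime.
        locked = sorted(writes, key=id)
        for tv in locked:
            tv.lock.acquire()
        try:
            if all(tv.version == v for tv, v in reads.items()):
                for tv, val in writes.items():
                    tv.value = val
                    tv.version += 1
                return result
        finally:
            for tv in locked:
                tv.lock.release()
        # Validation failed: another transaction committed in between; retry.

# Example: transfer between two accounts with no user-level locking.
a, b = TVar(100), TVar(0)

def transfer(read, write):
    amount = 30
    write(a, read(a) - amount)
    write(b, read(b) + amount)

atomically(transfer)
print(a.value, b.value)  # 70 30
```

The point of the sketch is the division of labor: conflict detection and retry live in the runtime, not in application code, which is what lets TM scale from a single multicore machine to the distributed setting the tutorial covers.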

T2: Parallel and Distributed Simulation from Many Cores to the Public Cloud (2.0 Hours)

Gabriele D'Angelo

University of Bologna, Italy

BRIEF TUTORIAL DESCRIPTION

Over the last decades, the systems to be studied have become more and more complex. In computer networks, for example, very sophisticated protocols have been proposed and many networks comprise a huge number of nodes. The performance evaluation of such systems is often based on discrete-event simulation. Given these requirements and the wide diffusion of parallel and distributed systems, we would expect Parallel And Distributed Simulation (PADS) to enjoy massive popularity; this is not the case. Are PADS techniques ready for prime time after all the research work that has been done? Many simulation developers are unwilling to dismiss the "old" (sequential) tools and switch to modern ones, despite the very strong demand for scalability and speed. What is missing?

Today, two main changes are revolutionizing the execution architectures used to run simulations: at the lower level, processors are gaining more and more cores, while at the higher level we must cope with virtual resources, with "everything as a service" in a public cloud infrastructure. Given the magnitude of these changes, what is going to happen to simulation? What are the (many) limits of current approaches, technologies and tools? Is it possible to finally solve some of the long-standing problems of PADS while broadening its scope?

This tutorial introduces the basic aspects of these subjects and discusses the main drawbacks of current approaches. Its main aim is to foster discussion and to raise awareness of some currently undervalued technologies. The last part of the tutorial covers our practical experience in developing the ARTÌS simulation middleware and in proposing GAIA, a new paradigm for adaptive distributed simulation that can tackle some of the issues described above. The tutorial concludes with examples drawn from our experience in the performance evaluation of complex systems. These topics should interest many HPCS 2011 attendees, who may appreciate deepening their awareness of less-known aspects of simulation and its perspectives. The overall goal of this tutorial is to offer a clear and fresh outline of the state of the art in simulation techniques in light of the latest changes in computing.
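For readers unfamiliar with the sequential baseline that PADS parallelizes, the following minimal sketch shows the classic discrete-event loop: a priority queue of timestamped events, popped in time order, whose handlers may schedule further events. The function and model names are hypothetical illustrations, not part of the tutorial material; parallelizing this loop across processors (while preserving timestamp order) is exactly what PADS is about.

```python
import heapq

def simulate(initial_events, horizon):
    """Minimal sequential discrete-event loop: repeatedly pop the earliest
    event and run its handler, which may schedule new future events."""
    queue = list(initial_events)   # (timestamp, seq, handler) tuples
    heapq.heapify(queue)
    seq = len(queue)               # tie-breaker so the heap never compares handlers
    processed = 0
    while queue:
        t, _, handler = heapq.heappop(queue)
        if t > horizon:
            break
        processed += 1
        for delay, h in handler(t):          # handler returns (delay, handler) pairs
            seq += 1
            heapq.heappush(queue, (t + delay, seq, h))
    return processed

# Toy model: a node that sends a message every 1.0 time units.
def periodic_send(t):
    return [(1.0, periodic_send)]

n = simulate([(0.0, 0, periodic_send)], horizon=10.0)
print(n)  # 11 events executed at t = 0.0, 1.0, ..., 10.0
```

The core difficulty PADS addresses is visible even here: the loop is inherently serial on the event queue, so distributed approaches must partition the model and synchronize the timestamps of events exchanged between partitions.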

T3: Parallel Programming with Cilk and Array Notation using the Intel Compiler (2.5 Hours)

Levent Akyil and Farhana Aleen

Intel Corporation, Germany

BRIEF TUTORIAL DESCRIPTION

The current release of the Intel Compiler (ICC v12) supports two new extensions to C/C++ for parallel programming: Cilk and Array Notation, collectively called Cilk Plus. Cilk provides a straightforward way to convert a sequential program into a multithreaded one, exploiting the thread-level parallelism available on multicore machines. Array Notation is an expressive way to vectorize a computational kernel and thus exploit the SIMD parallelism within each core. This tutorial serves as an introduction to Cilk and Array Notation. It also advocates a programming methodology based on cache-oblivious techniques that uses the two together to achieve high performance on multicore systems.
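Cilk's central idea is fork-join parallelism: `cilk_spawn` forks a child strand, the parent continues, and `cilk_sync` joins them. The sketch below illustrates that pattern, not Cilk itself, using Python's `concurrent.futures` as a stand-in (function names and the depth cutoff are illustrative assumptions; Cilk's work-stealing scheduler handles task granularity automatically, which this sketch mimics crudely with a serial cutoff).

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    """Plain serial recursion, as one would write before parallelizing."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_forkjoin(n, pool, depth=2):
    """Fork-join version: 'spawn' the first branch (as cilk_spawn would),
    compute the second in the current thread, then 'sync' by joining.
    Below the depth cutoff we stay serial to bound the task count."""
    if n < 2:
        return n
    if depth == 0:
        return fib(n)
    x = pool.submit(fib_forkjoin, n - 1, pool, depth - 1)  # spawn
    y = fib_forkjoin(n - 2, pool, depth - 1)               # continue in parent
    return x.result() + y                                  # sync

with ThreadPoolExecutor(max_workers=4) as pool:
    result = fib_forkjoin(20, pool)
print(result)  # 6765
```

In Cilk Plus the same structure is written directly as `x = cilk_spawn fib(n-1); y = fib(n-2); cilk_sync;`, while Array Notation expresses elementwise kernels such as `a[:] = b[:] + c[:]`, which the compiler maps onto SIMD instructions.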

T4: Distributed Data Processing using Hadoop (2.0 Hours)

Rajiv Chittajallu

Yahoo! Research, California, USA

BRIEF TUTORIAL DESCRIPTION

Apache Hadoop is a platform for large-scale distributed data processing on commodity hardware. It is designed to reliably distribute petabytes of data and compute tasks across clusters ranging from a few to more than 4,000 nodes. Hadoop implements a distributed processing framework, MapReduce, and a distributed file system, HDFS. The system handles node failures by replicating data blocks across multiple nodes and rescheduling failed or slow tasks on other nodes.

Hadoop is a top-level Apache project written in Java, with Yahoo! being one of the major contributors. On top of Hadoop, various tools have been developed, such as HBase, a distributed hash table, and Apache Pig, a platform and high-level language for expressing data analysis programs. Together with Hadoop, these tools are widely used by Internet companies for their data systems. In this tutorial, we discuss the motivation and design of Hadoop, its key features, and how to develop applications using Hadoop and Pig to process terabytes of data. We review a set of real applications and their implementation with Hadoop. To tie everything together, we also discuss how to set up a Hadoop cluster for a data-processing workflow using Oozie.
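The MapReduce model the tutorial builds on can be sketched in a few lines of plain Python: a map phase emits key/value pairs, the framework shuffles them by key, and a reduce phase folds each group. This is an illustration of the programming model only, not the Hadoop Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the user's mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between phases."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups.items()

def reduce_phase(groups, reducer):
    """Apply the user's reducer to each key and its list of values."""
    return {k: reducer(k, vs) for k, vs in groups}

# The classic word-count example expressed in this model.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    return sum(counts)

lines = ["Hadoop distributes data", "Hadoop reschedules failed tasks"]
counts = reduce_phase(shuffle(map_phase(lines, mapper)), reducer)
print(counts["hadoop"])  # 2
```

What Hadoop adds to this skeleton is precisely what the description above emphasizes: the map and reduce tasks run in parallel across thousands of nodes, the shuffle moves data over the network, and failed or slow tasks are transparently rescheduled.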

T5: Porting Applications with GridWay (2.5 Hours)

Ismael Marín

dsa-research.org, Universidad Complutense de Madrid, Spain

BRIEF TUTORIAL DESCRIPTION

The GridWay Metascheduler enables large-scale, reliable and efficient sharing of computing resources. It supports different local resource management (LRM) systems (PBS, SGE, LSF, Condor, ...) within a single organization or scattered across several administrative domains, providing a single point of access to all the resources in an organization. The aim of the tutorial is to give a global overview of the process of installing, configuring and using GridWay. The tutorial also covers the development of codes using the C and Java bindings of the OGF DRMAA standard. During the demo, participants will receive a practical overview of the agenda topics and have the opportunity to exercise GridWay functionality with examples on a real grid infrastructure.