Optimizing Data Parallelism with Linear Programming in Nessie (FHPC 2018)

Write a Blog >>

Sun 23 - Sat 29 September 2018 St. Louis, Missouri, United States

Who

Joseph Wingerter, John Reppy

Track

FHPC 2018

Time Zone

The program is currently displayed in (GMT-05:00) Guadalajara, Mexico City, Monterrey.

Use conference time zone: (GMT-05:00) Guadalajara, Mexico City, MonterreySelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Sat 29 Sep 2018 15:30 - 16:15 at New York Central - Afternoon Session 2 Chair(s): Mike Rainey

Abstract

The task of programming GPUs presents problems of coordinating memory accesses to hide memory latency and navigating the heterogeneous and unfamiliar architectural details involved in utilizing their high memory bandwidth and vast capacity for arithmetic operations. One approach to escaping the trap of low-level GPU programming is to defer decisions about memory layout and data transfer to a compiler that can rewrite more costly control and data structures into ones better suited for GPU execution. The Nessie compiler does this, lowering programs from NESL–a strict, nested data-parallel functional programming language–to efficient low-level C++ and NVIDIA CUDA kernels operating on flat arrays. The compiler achieves efficiency in generated code through optimizations on a specialized intermediate representation which constitute the primary contribution of this work. The intermediate representation represents arrays as the application of per-element generator functions to index spaces, separates serial and parallel portions of input programs, and makes explicit a set of building blocks for parallel array operations. The critical optimization pass in the Nessie compiler is fusion of parallel array operations, but more important than the rewrite itself is the selection of which operations to fuse and which to not. Choices between sets of operations to fuse are evaluated by constructing an integer linear programming instance using a heuristic model of performance benefit associated with each optimization opportunity, constrained by the compatibility relation between different array operations. This approach allows Nessie to avoid exhaustively searching the space of potential optimized programs, which is combinatorially large due to incompatibilities between fusion choices. Nessie performs competitively on benchmarks against both hand-written CUDA code and other systems for nested data-parallel programming on the GPU.

Joseph WingerterAuthor

John ReppyAuthor

University of Chicago