Optimizing Data Parallelism with Linear Programming in Nessie
Programming GPUs poses two challenges: coordinating memory accesses to hide memory latency, and navigating the heterogeneous, unfamiliar architectural details required to exploit their high memory bandwidth and vast arithmetic throughput. One approach to escaping the trap of low-level GPU programming is to defer decisions about memory layout and data transfer to a compiler that can rewrite costly control and data structures into ones better suited for GPU execution. The Nessie compiler does this, lowering programs from NESL (a strict, nested data-parallel functional programming language) to efficient low-level C++ and NVIDIA CUDA kernels operating on flat arrays. The compiler achieves efficiency in generated code through optimizations on a specialized intermediate representation, which constitute the primary contribution of this work. The intermediate representation represents arrays as the application of per-element generator functions to index spaces, separates the serial and parallel portions of input programs, and makes explicit a set of building blocks for parallel array operations. The critical optimization pass in the Nessie compiler is fusion of parallel array operations, but more important than the rewrite itself is the selection of which operations to fuse and which not to. Choices between sets of operations to fuse are evaluated by constructing an integer linear programming instance using a heuristic model of the performance benefit associated with each optimization opportunity, constrained by the compatibility relation between different array operations. This approach allows Nessie to avoid exhaustively searching the space of potential optimized programs, which is combinatorially large due to incompatibilities between fusion choices. Nessie performs competitively on benchmarks against both hand-written CUDA code and other systems for nested data-parallel programming on the GPU.
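The "per-element generator function applied to an index space" representation described above is in the spirit of pull arrays, where fusion amounts to composing generators instead of materializing intermediate arrays. The following minimal sketch illustrates that idea; the class and function names (`Pull`, `map_pull`, `force`) are illustrative assumptions, not identifiers from Nessie:

```python
class Pull:
    """An array as an index space (size) plus a per-element generator."""
    def __init__(self, size, gen):
        self.size = size
        self.gen = gen  # function: index -> element

def map_pull(f, arr):
    # Fusion by composition: no intermediate array is ever allocated;
    # the new generator simply wraps the old one.
    return Pull(arr.size, lambda i: f(arr.gen(i)))

def force(arr):
    # Materialize the array only at the end, evaluating the fused generator once per index.
    return [arr.gen(i) for i in range(arr.size)]

xs = Pull(5, lambda i: i)                                   # [0, 1, 2, 3, 4]
ys = map_pull(lambda x: x * x, map_pull(lambda x: x + 1, xs))
force(ys)  # [1, 4, 9, 16, 25]
```

In a compiler like Nessie the composed generator would be emitted as the body of a single CUDA kernel, so fusing two maps saves a round trip through GPU memory.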
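The fusion-selection problem the abstract describes is a 0/1 optimization: maximize the summed heuristic benefit of chosen fusion opportunities, subject to pairwise compatibility constraints. Nessie hands this to an integer linear programming solver; the sketch below states the same model but, for a toy instance, solves it by enumeration (exactly the exhaustive search that ILP avoids at realistic scale). All names and benefit numbers are invented for illustration:

```python
from itertools import product

# Hypothetical fusion opportunities with heuristic benefit estimates,
# and pairs that cannot both be chosen (incompatible fusions).
benefit = {"fuse_ab": 3, "fuse_bc": 2, "fuse_cd": 4}
conflicts = [("fuse_ab", "fuse_bc")]

def best_selection(benefit, conflicts):
    """Brute-force the 0/1 model: max total benefit s.t. no conflicting pair."""
    names = list(benefit)
    best, best_val = set(), -1
    for bits in product([0, 1], repeat=len(names)):
        chosen = {n for n, b in zip(names, bits) if b}
        if any(a in chosen and b in chosen for a, b in conflicts):
            continue  # violates a compatibility constraint
        val = sum(benefit[n] for n in chosen)
        if val > best_val:
            best, best_val = chosen, val
    return best, best_val

best_selection(benefit, conflicts)  # ({'fuse_ab', 'fuse_cd'}, 7)
```

An ILP encoding of the same model uses one binary variable per opportunity and, for each conflicting pair (a, b), the linear constraint x_a + x_b <= 1; a solver then prunes the combinatorial space rather than enumerating it.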
Sat 29 Sep (time zone: Guadalajara, Mexico City, Monterrey)
15:30 - 17:00
  15:30 (45m) Talk: Optimizing Data Parallelism with Linear Programming in Nessie (FHPC)
  16:15 (45m) Day closing: Optional Discussion (FHPC)