Modular Acceleration: Tricky Cases of Functional High-Performance Computing
This case study examines the data-parallel functional implementation of three algorithms: generation of quasi-random Sobol numbers, breadth-first search, and calibration of Heston market parameters via a least-squares procedure. We show that while all these problems permit elegant functional implementations, good performance depends on subtle issues that must be confronted in both the implementations of the algorithms themselves, as well as the compiler that is responsible for ultimately generating high-performance code. In particular, we demonstrate the utility of strength-reducing ``chunking'' combinators that permit efficient sequentialisation of excess parallelism, study the efficient implementation of an irregular algorithm without sacrificing parallelism, and argue for the utility of nested regular data parallelism.
Sat 29 Sep
|10:20 - 11:20|
Troels HenriksenUniversity of Copenhagen, Denmark, Martin ElsmanUniversity of Copenhagen, Denmark, Cosmin OanceaUniversity of Copenhagen, DenmarkDOI
|11:20 - 12:05|