An Efficient Compiler for Recursive Functions on Mostly-Serialized Data
In a typical data processing program, the representation of data in memory is distinct from its representation in a serialized form on disk. The former has pointers and arbitrary, sparse layout, facilitating easy manipulation by a program while the latter is packed contiguously, facilitating easy I/O. Recent work has looked at how to write programs that can operate directly on the packed representation, unifying in-memory and on-the-wire formats and granting performance improvements due to improved locality and reduced pointer-chasing. Two fundamental drawbacks must be contended with: (i) functions that operate on pointerless, packed data are difficult to write and maintain; and (ii) the lack of pointer-based indirect access means that while some operations are faster, others can be vastly more expensive in the packed representation. Our work addresses these two problems. First, we provide a compiler that allows programmers to write recursive functions as though they were operating on pointer-based structures, and then translates those functions to operate over packed data; unlike prior work, we use a robust, type-driven translation scheme that does not rely on brittle analyses. We introduce a new location calculus that serves as our intermediate representation and extends a region calculus to track physical locations of values within a region. Second, we determine when to introduce indirection into the packed data structure to support efficient processing for functions that can otherwise be asymptotically suboptimal; unlike traditional compilers, we introduce indirections only when needed, maintaining the efficiency of pointerless access everywhere else. We show that our approach yields significant performance improvement over prior approaches to operating on packed data, without requiring abandoning idiomatic programming with recursive functions or risking blowing up asymptotic complexity.
Sat 29 SepDisplayed time zone: Guadalajara, Mexico City, Monterrey change
13:30 - 15:10
|An Efficient Compiler for Recursive Functions on Mostly-Serialized Data|
|Comparing strategies for lightweight threading based on continuations|