Robert Grimm
New York University

Towards an Extensible Compiler - Step 1: The Parser

Operating and distributed systems can significantly benefit from
language and compiler support, even if they are written in C or C++
and not in a fully type-safe or functional programming language.
Examples include libasync for building event-driven distributed
services, Capriccio for building scalable multi-threaded servers, and
MACEDON for building overlay networks.  Unfortunately, existing
solutions for extending C-like languages are either not expressive
enough or targeted at compiler experts, thus preventing other system
builders from reaping similar benefits.  To address this problem, we
are exploring a C dialect and the corresponding compiler, which make
the base language, C, easily extensible by system builders.
Our first step is a parser generator that supports easily modifiable
grammars and avoids the complexities associated with extending
bottom-up (LR) grammars or top-down (LL) grammars with lookahead.  Our
work builds on recent research on packrat parsers, which are top-down
parsers that backtrack but also memoize each intermediate result
(hence their name), thus ensuring linear-time performance.  Our work
makes this parsing technique, which has been developed in the context
of functional languages, practical for C-like languages (currently,
Java).  Furthermore, our parser generator supports simpler grammar
specifications and more convenient error reporting, while also
producing better performing parsers through aggressive optimizations.
In this talk, I motivate our overall research, describe our parser
generator and its optimizations in detail, and reflect on how to
structure extensible source-to-source transformation systems.