Robert Grimm New York University Towards an Extensible Compiler - Step 1: The Parser --------------------------------------------------- Operating and distributed systems can significantly benefit from language and compiler support, even if they are written in C or C++ and not in a fully type-safe or functional programming language. Examples include libasync for building event-driven distributed services, Capriccio for building scalable multi-threaded servers, and MACEDON for building overlay networks. Unfortunately, existing solutions for extending C-like languages are either not expressive enough or targeted at compiler experts, thus preventing other system builders from reaping similar benefits. To address this problem, we are exploring a C dialect and the corresponding compiler, which make the base language, C, easily extensible by system builders. Our first step is a parser generator that supports easily modifiable grammars and avoids the complexities associated with extending bottom-up (LR) grammars or top-down (LL) grammars with lookahead. Our work builds on recent research on packrat parsers, which are top-down parsers that backtrack but also memoize each intermediate result (hence their name), thus ensuring linear-time performance. Our work makes this parsing technique, which has been developed in the context of functional languages, practical for C-like languages (currently, Java). Furthermore, our parser generator supports simpler grammar specifications and more convenient error reporting, while also producing better performing parsers through aggressive optimizations. In this talk, I motivate our overall research, describe our parser generator and its optimizations in detail, and reflect on how to structure extensible source-to-source transformation systems.