A query compiler written using
the Coq proof assistant
and a platform for implementing and verifying query compilers.
What is Q*cert?
Q*cert implements several intermediate languages useful for compilation and optimization. The semantics of each of them is precisely defined using the Coq proof assistant. Using Coq allows to prove properties on these languages, to verify the correctness of the translation from one to another, and to check that optimizations are correct.
An architecture made of small and well defined components makes Q*cert a promising platform for implementing or verifying new query languages and to develop new optimizations.
The code for the compiler is available as open-source on github.
The compiler architecture and the compilation paths are discribed in the compiler driver and can be represented by the following figure:
The supported languages are:
- 𝝀NRA: NRA with Lambdas
- SQL: Structured Query Language
- OQL: Object Query Language
- TechRule: ODM technical rules
- DesignerRule: ODM designer rules
- CAMPRule: Rule Macros for CAMP
- CAMP: Calculus of Aggregating Matching Patterns
- NRA: Nested Relational Algebra
- NRAe: NRA with Environments
- cNRAe: Core NRAe
- NNRC: Named Nested Relational Calculus
- cNNRC: Core NNRC
- DNNRC: Distributed NNRC
- tDNNRC: Typed DNNRC
- NNRCMR: NNRC + Map/Reduce
- CldMR: NNRC + Cloudant Map/Reduce
The Q*cert development, notably the choices of intermediate representations, builds on earlier work:
- The most complete treatment for the Nested Relational Algebra (NRA) used by Q*cert can be found in Guido Moerkotte's Building Query Compilers book (See Chapter 7: An Algebra for Sets, Bags, and Sequences).
- Our main reference for the Named Nested Relational Calculus (NNRC) is Polymorphic Type Inference for the Named Nested Relational Calculus by Jan Van den Bussche and Stijn Vansummeren.
- The Object Query Language (OQL) supported by Q*cert is based on the ODMG standard and was originally designed by Sophie Cluet.
Additional information on some of the novel aspects of the Q*cert work can be found in research publications:
- Initially, the Q*cert compiler has been developed as part of the project META: Middleware for Events, Transactions, and Analytics (IBM Journal of Research and Development).
- The Calculus for Aggregating Matching Patterns (CAMP) and its translation to NRA and NNRC were first proposed in A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization by Avraham Shinnar, Jerome Simeon and Martin Hirzel.
- The type system allowing a mix of nominal and structural subtyping has been presented in A Branding Strategy for Business Types by Avraham Shinnar, Jerome Simeon.
- Details on the use of Q*cert for prototyping a query compiler for business rules aggregations can be found in "Experience Report: Prototyping a Query Compiler Using Coq", by Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon (to appear at the ACM SIGPLAN ICFP'2017 conference, French version).
- Details on the extension to NRA used in Q*cert and its algebraic optimizer can be found in "Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler", by Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon (to appear at the ACM SIGMOD'2017 conference).
- A description for a full demonstration of the system can be found in "Q*cert: A Platform for Implementing and Verifying Query Compilers", by Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon (to appear at the ACM SIGMOD'2017 conference).
Applying formal verification techniques to query languages and compilers is an active research area. The following projects have close connections with our work:
- Gregory Malecha and Ryan Wisnesky have recently developed techniques for semantics optimization of language integrated queries using the Coq proof assistant.
- Veronique Benzaken, Evelyne Contejean, and Stefania Dumbrava have developed a A Coq Formalization of the Relational Data Model as part of the Datacert project.
- James Cheney and Christian Urban have developed a mechanization for a subset of XQuery, the XML query language in Mechanizing the metatheory of mini-XQuery.
- Some earlier work on building a verified relational database was presented in Toward a Verified Relational Database Management System by Gregory Malecha, Greg Morrisett, Avraham Shinnar and Ryan Wisnesky.
- To the best of our knowledge, the first attempt at applying theorem proving technology to query compilers was done as part of the Coko-Kola project, using the Larch theorem prover.
Q*cert initial source code was developed at the IBM T.J. Watson Research Center.