Q*cert
A query compiler written using
the Coq proof assistant
and a platform for implementing and verifying query compilers.
What is Q*cert?
Q*cert is a query compiler: it takes some input query and generates code for execution. It can compile several source query languages, such as (subsets of) SQL and OQL. It can produce code for several target backends, such as Java, JavaScript, Cloudant and Spark.
Q*cert implements several intermediate languages useful for compilation and optimization. The semantics of each of them is precisely defined using the Coq proof assistant. Using Coq allows to prove properties on these languages, to verify the correctness of the translation from one to another, and to check that optimizations are correct.
An architecture made of small and well defined components makes Q*cert a promising platform for implementing or verifying new query languages and to develop new optimizations.
Q*cert Specification
The code for the compiler is available as open-source on github.
The compiler architecture and the compilation paths are discribed in the compiler driver and can be represented by the following figure:
The supported languages are:
- TechRule: ODM technical rules
- DesignerRule: ODM designer rules
- CAMPRule: Rule Macros for CAMP
- đ-NRA: Lambda NRA
- SQL: Structured Query Language
- SQL++: Structured Query Language extended for JSON
- OQL: Object Query Language
- CAMP: Calculus of Aggregating Matching Patterns
- NRA: Nested Relational Algebra
- NRAe: NRA with Environments
- cNRAe: Core NRAe
- NNRC: Named Nested Relational Calculus
- cNNRC: Core NNRC
- NNRS: Lightly Imperative NNRC
- DNNRC: Distributed NNRC
- tDNNRC: Typed DNNRC
- NNRCMR: NNRC extended with Map/Reduce
- CldMR: NNRC extended with Map/Reduce for Cloudant
Publications
Additional information on some of the novel aspects of the Q*cert work can be found in research publications:
- The Q*cert compiler was originally developed as part of the project META: Middleware for Events, Transactions, and Analytics (IBM Journal of Research and Development).
-
The Calculus for Aggregating Matching Patterns (CAMP) and its
translation to NRA and NNRC were first presented
in:
A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization , Avraham Shinnar, Jerome Simeon and Martin Hirzel, ECOOP'2015. (slides). -
A description for the Q*cert type system, which handles classes
and inheritance, was first presented
in:
A Branding Strategy for Business Types , Avraham Shinnar, Jerome Simeon, in "A List of Successes That Can Change the World - Essays Dedicated to Philip Wadler on the Occasion of His 60th Birthday", pp 367-387. -
Details on our experience prototyping a query compiler for
business rules aggregations can be found in:
Prototyping a Query Compiler Using Coq (Experience Report) , Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, ACM SIGPLAN ICFP'2017 conference. (slides). -
A French version of that same article is also available
in:
Prototyper un compilateur de requĂȘtes avec Coq , Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, JFLA'2017. (slides). -
Details on the extension to the Nested Relational Algebra used
in Q*cert and on its algebraic optimizer can be found in:
Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler , Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, ACM SIGMOD'2017 conference. (slides). -
A description for a full demonstration of the system can be
found
in:
Q*cert: A Platform for Implementing and Verifying Query Compilers , by Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, ACM SIGMOD'2017 conference. (poster).
Related Projects
Applying formal verification techniques to query languages and compilers is an active research area. The following projects have close connections with our work:
- Gregory Malecha and Ryan Wisnesky have recently developed techniques for semantics optimization of language integrated queries using the Coq proof assistant.
- Veronique Benzaken, Evelyne Contejean, and Stefania Dumbrava have developed a A Coq Formalization of the Relational Data Model as part of the Datacert project.
- James Cheney and Christian Urban have developed a mechanization for a subset of XQuery, the XML query language in Mechanizing the metatheory of mini-XQuery.
- Some earlier work on building a verified relational database was presented in Toward a Verified Relational Database Management System by Gregory Malecha, Greg Morrisett, Avraham Shinnar and Ryan Wisnesky.
- To the best of our knowledge, the first attempt at applying theorem proving technology to query compilers was done as part of the Coko-Kola project, using the Larch theorem prover.
Credits
Q*cert initial source code was developed at the IBM T.J. Watson Research Center by Josh Auerbach, Stefan Fehrenbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon.