Q*cert

A query compiler written using the Coq proof assistant
and a platform for implementing and verifying query compilers.

What is Q*cert?

Q*cert is a query compiler: it takes some input query and generates code for execution. It can compile several source query languages, such as (subsets of) SQL and OQL. It can produce code for several target backends, such as Java, JavaScript, Cloudant and Spark.

Q*cert implements several intermediate languages useful for compilation and optimization. The semantics of each of them is precisely defined using the Coq proof assistant. Using Coq allows to prove properties on these languages, to verify the correctness of the translation from one to another, and to check that optimizations are correct.

An architecture made of small and well defined components makes Q*cert a promising platform for implementing or verifying new query languages and to develop new optimizations.

Q*cert Specification

The code for the compiler is available as open-source on github.

The compiler architecture and the compilation paths are discribed in the compiler driver and can be represented by the following figure:

The supported languages are:

TechRule: ODM technical rules
DesignerRule: ODM designer rules
CAMPRule: Rule Macros for CAMP
𝝀-NRA: Lambda NRA
SQL: Structured Query Language
SQL++: Structured Query Language extended for JSON
OQL: Object Query Language
CAMP: Calculus of Aggregating Matching Patterns
NRA: Nested Relational Algebra
NRA^e: NRA with Environments
cNRA^e: Core NRA^e
NNRC: Named Nested Relational Calculus
cNNRC: Core NNRC
NNRS: Lightly Imperative NNRC
DNNRC: Distributed NNRC
tDNNRC: Typed DNNRC
NNRCMR: NNRC extended with Map/Reduce
CldMR: NNRC extended with Map/Reduce for Cloudant

Publications

Additional information on some of the novel aspects of the Q*cert work can be found in research publications:

The Q*cert compiler was originally developed as part of the project META: Middleware for Events, Transactions, and Analytics (IBM Journal of Research and Development).
The Calculus for Aggregating Matching Patterns (CAMP) and its translation to NRA and NNRC were first presented in: A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization, Avraham Shinnar, Jerome Simeon and Martin Hirzel, ECOOP'2015. (slides).
A description for the Q*cert type system, which handles classes and inheritance, was first presented in: A Branding Strategy for Business Types, Avraham Shinnar, Jerome Simeon, in "A List of Successes That Can Change the World - Essays Dedicated to Philip Wadler on the Occasion of His 60th Birthday", pp 367-387.
Details on our experience prototyping a query compiler for business rules aggregations can be found in: Prototyping a Query Compiler Using Coq (Experience Report), Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, ACM SIGPLAN ICFP'2017 conference. (slides).
A French version of that same article is also available in: Prototyper un compilateur de requêtes avec Coq, Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, JFLA'2017. (slides).
Details on the extension to the Nested Relational Algebra used in Q*cert and on its algebraic optimizer can be found in: Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler, Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, ACM SIGMOD'2017 conference. (slides).
A description for a full demonstration of the system can be found in: Q*cert: A Platform for Implementing and Verifying Query Compilers, by Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon, ACM SIGMOD'2017 conference. (poster).

Related Projects

Applying formal verification techniques to query languages and compilers is an active research area. The following projects have close connections with our work:

Gregory Malecha and Ryan Wisnesky have recently developed techniques for semantics optimization of language integrated queries using the Coq proof assistant.
Veronique Benzaken, Evelyne Contejean, and Stefania Dumbrava have developed a A Coq Formalization of the Relational Data Model as part of the Datacert project.
James Cheney and Christian Urban have developed a mechanization for a subset of XQuery, the XML query language in Mechanizing the metatheory of mini-XQuery.
Some earlier work on building a verified relational database was presented in Toward a Verified Relational Database Management System by Gregory Malecha, Greg Morrisett, Avraham Shinnar and Ryan Wisnesky.
To the best of our knowledge, the first attempt at applying theorem proving technology to query compilers was done as part of the Coko-Kola project, using the Larch theorem prover.

Credits

Q*cert initial source code was developed at the IBM T.J. Watson Research Center by Josh Auerbach, Stefan Fehrenbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jerome Simeon.