Commented Coq Development
The Q*cert development team
(Joshua Auerbach, Stefan Fehrenbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jérôme Siméon)
Overview
This is the commented source code for Q*cert, a framework for the development and verification of query compilers built using the Coq proof assistant [Coq].
From a traditional database view point, it mostly covers the so-called logical layer of a query compiler. At the present time, the physical layer consists of a simple code-generation step targeting either an imperative language (JavaScript or Java), or a distributed data-processing platform (Spark or Cloudant).
While a significant subset of the initial compilation pipeline has been mechanically checked for correctness, Q*cert is still a work in progress. This page documents the state of the development, and serves as an entry point to navigate the code. It describes the architecture of the compiler, lists the languages in the compilation pipeline, the existing translation and optimization passes, and links to the corresponding code, including proofs, when available.
Compilation Pipeline
This figure represents the general architecture of the compiler with its all supported languages and the different compilation paths.
The colors on the figure indicate the level of Coq development. The part in blue has not been fully mechanized. The part in red has been mechanized in Coq and comes with a proof of correctness. The dotted red arrows indicate correctness proofs that have not yet been fully integrated in a proof of correctness for the compiler as a whole.
Code Organization
The general organization of the source code available in Q*cert is the following:
bin
: directory where binaries are installedcoq
: the Coq source code of the compilerdoc
: documentation and Web demojavaRunners
: simple runners for the compiled queriesjavaService
: code to integrate the Java parsers (for SQL, SQL++, and ODM rules)jrulesParser
: ODM rules parserocaml
: OCaml code of the command line compilerruntimes
: runtime libraries for the different backendssamples
: examples of queries in various source languagesscripts
: utility scriptssqlParser
: SQL parsersqlppParser
: SQL++ parser
General-purpose libraries
The coq/Utils
directory contains general purpose
libraries, tactics and lemmas used throughout the system.
- CoqLibAdd: Additional definitions and lemmas complementing those found in the Coq Library.
- ListAdd: Additional definitions and lemmas on lists.
- StringAdd: Additional definitions and lemmas on strings, including a total order relation.
- SortingAdd: simple insertion sort
- Lattice: definition of a lattice (loosely based on ideas from A reflection-based proof tactic for lattices in Coq and http://www.pps.univ-paris-diderot.fr/~sozeau/repos/coq/order).
- Lift and LiftIterators: definitions and properties of lifting operations over option types.
- Assoc: definitions and properties of association lists.
- Sublist: definitions and properties of sub-lists.
- Compat: notion of compatibility between two association lists.
- Bindings and BindingsNat: support for association lists for which the keys are ordered and with no duplicates.
- Digits and Fresh: support for creating and reasoning about fresh names (represented as strings).
- Bag: support for bags (or multisets).
- Var: Variables.
- JSON: A simple JSON AST.
- OptimizerStep and OptimizerLogger: Support for log traces, used to monitor optimizations.
Common components
The coq/Common
directory contains components shared
across languages.
Brands
The Common/Brands
directory contains definitions for a
notion of
Type System
The type system includes support for atomic types, bags, records,
and
Common/TypeSystem
: definition and properties of the type system
Data Model
The data model includes atomic values, bags, records,
and
Common/Data
: definition and properties of the data modelCommon/DataTyping
: typing for the data model
Note that the type system modules
(in Common/TypeSystem
) and the data modules
(in Common/Data
) are independent from each other, only
relying on the notion of brands. How data relates to type is made
explicit in the data typing modules
(in Common/DataTyping
).
Built-in Operators
A number of operations are assumed to be available in all languages, and defined as built-in operators.
Common/Operators
: syntax and semantics for built-in operatorsCommon/OperatorsTyping
: typing for the built-in operators
Languages
In the coq
directory, each language has its own
directory with a subdirectory Lang
containing the
syntax (Abstract Syntax Tree) and the definition of the semantics of
the language.
For most language, the semantics is given through an evaluation function. In a few cases, a denotational semantics is also provided (e.g., for cNNRC and NRA).
When that language comes with a type checker, the type checker can
be found in a separate Typing
subdirectory.
Source Languages
The following languages can be passed in source (text/file) form to the compiler and are provided with a parser. The parsers for OQL and λ-NRA are written in OCaml and part of the Q*cert distribution. The parsers for TechRule, DesignerRule, SQL and SQL++ are written in Java and require third-party libraries which must be downloaded separately during installation.
Name | Language | On Command Line | Directory | Code | Reference | |
---|---|---|---|---|---|---|
TechRule | ODM technical rules | tech_rule |
coq/TechRule |
Source | ||
DesignerRule | ODM designer rules | designer_rule |
coq/DesignerRule |
Source | ||
λ-NRA | Lambda NRA | lambda_nra |
coq/LambdaNRA |
Ast, Eval, Typing | [AHMSS2017] | |
SQL | Structured Query Language | sql |
coq/SQL |
Ast | ||
SQL++ | SQL extended for JSON | sqlpp |
coq/SQLPP |
Ast | ||
OQL | Object Query Language | oql |
coq/OQL |
Ast, Eval | [ODMG] |
Local Intermediate Languages
The following languages are intermediate representations used for compilation and optimization. They all assume the data is accessed locally (i.e., do not take distribution into consideration). Most of that part of the code comes with type checkers and mechanized proofs of correctness (when translating between those languages or applying optimizations).
Name | Language | On Command Line | Directory | Code | Reference |
---|---|---|---|---|---|
CAMPRule | CAMP rules | camp_rule |
coq/CAMPRule |
Syntax, Eval | [SSH2015] |
CAMP | Calculus of Aggregating Matching Patterns | camp |
coq/CAMP |
Syntax, Eval, Typing | [SSH2015] |
NRA | Nested Relational Algebra | nra |
coq/NRA |
Syntax, Semantics, Eval, Typing | [Moer2014] |
NRAe | NRA with Environments | nraenv |
coq/NRAEnv |
Syntax, Eval, Typing | |
cNRAe | Core NRA with Environments | nraenv_core |
coq/cNRAEnv |
Syntax, Semantics, Eval, Typing | [AHMSS2017] |
NNRC | Named Nested Relational Calculus | nnrc |
coq/NNRC |
Syntax, Eval, Typing | |
cNNRC | Core NNRC | nnrc_core |
coq/cNNRC |
Syntax, Semantics, Eval, Typing | [BV2007] |
NNRS | Lightly Imperative NNRC | nnrs |
coq/NNRS |
Syntax, Semantics, Eval, Typing |
Distributed Intermediate Languages
The following languages are intermediate representations used for compilation and optimization. They take distribution into consideration. Most of that part of the code comes with no mechanized proofs (when translating to/from those languages or when applying optimizations).
Name | Language | On Command Line | Directory | Code | Reference |
---|---|---|---|---|---|
DNNRC | Distributed Named Nested Relational Calculus | dnnrc |
coq/DNNRC |
Syntax, Eval, Typing | |
tDNNRC | Typed DNNRC | dnnrc_typed |
coq/tDNNRC |
Syntax, Eval | |
NNRCMR | NNRC extended with Map/Reduce | nnrcmr |
coq/NNRCMR |
Syntax, Eval | |
CldMR | NNRC extended with Map/Reduce for Cloudant | cldmr |
coq/CldMR |
Syntax, Eval |
Target Languages
The following languages are usually represented as character
strings containing code corresponding to the target. The
corresponding code needs to be linked to a small runtime for
execution. The runtimes are located in the runtimes
directory in the Q*cert source code.
Name | Language | On Command Line | Directory | Code | Runtime | Reference |
---|---|---|---|---|---|---|
JavaScript | JavaScript | javascript |
coq/JavaScript |
Ast, Target | runtimes/javascript |
|
Java | Java | java |
coq/Java |
Target | runtimes/java |
|
Cloudant | Cloudant Map/Reduce Views | cloudant |
coq/Cloudant |
Target | runtimes/javascript |
[Cloudant] |
Spark RDDs | Spark (Resilient Distributed Datasets) | spark_rdd |
coq/SparkRDD |
Target | runtimes/javascript |
[SparkRDD] |
Spark DataFrames | Spark (DataFrames) | spark_df |
coq/SparkDF |
Target | runtimes/spark2 |
[SparkDF] |
Translation Passes
The coq/Translation
directory contains all the
functions translating between languages.
When possible, a corresponding proof of correctness (semantics preservation) and a proof of type preservation for the translation are also included.
Optimization Passes
Optimizers, when present, are located within the
Optim
directory of the corresponding language. The
following four languages support optimization passes:
Name | Language | Code | Reference |
---|---|---|---|
NRAe | NRA with Environments | Optimizer, Correctness | [Moer2014], [AHMSS2017] |
NNRC | Named Nested Relational Calculus | Optimizer, Correctness | |
tDNNRC | Typed DNNRC | Optimizer | |
NNRCMR | NNRC extended with Map/Reduce | Optimizer |
Compiler Driver
The coq/Compiler
directory contains the compiler
driver.
References
[AHMSS2017] | "Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler", by Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jérôme Siméon (ACM SIGMOD'2017 conference). |
---|---|
[BV2007] | Polymorphic Type Inference for the Named Nested Relational Calculus by Jan Van den Bussche and Stijn Vansummeren. |
[Cloudant] | Cloudant Views (Map/Reduce), IBM Bluemix Cloudant API Documentation. |
[Coq] | The Coq proof assistant, version 8.6.1. |
[Moer2014] | The most complete treatment for the Nested Relational Algebra (NRA) can be found in Guido Moerkotte's Building Query Compilers book (See Chapter 7: An Algebra for Sets, Bags, and Sequences). |
[ODMG] | The Object Data Management Standard: ODMG 3.0 |
[SparkRDD] | Spark Programming Guide, Version 2.1.1. |
[SparkDF] | Spark SQL, DataFrames and Datasets Guide, Version 2.1.1. |
[SSH2015] | A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization by Avraham Shinnar, Jérôme Siméon and Martin Hirzel. |