Commented Coq Development

The Q*cert development team

(Joshua Auerbach, Stefan Fehrenbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jérôme Siméon)

Overview

This is the commented source code for Q*cert, a framework for the development and verification of query compilers built using the Coq proof assistant [Coq].

From a traditional database view point, it mostly covers the so-called logical layer of a query compiler. At the present time, the physical layer consists of a simple code-generation step targeting either an imperative language (JavaScript or Java), or a distributed data-processing platform (Spark or Cloudant).

While a significant subset of the initial compilation pipeline has been mechanically checked for correctness, Q*cert is still a work in progress. This page documents the state of the development, and serves as an entry point to navigate the code. It describes the architecture of the compiler, lists the languages in the compilation pipeline, the existing translation and optimization passes, and links to the corresponding code, including proofs, when available.

Compilation Pipeline

Compilation Paths

This figure represents the general architecture of the compiler with its all supported languages and the different compilation paths.

The colors on the figure indicate the level of Coq development. The part in blue has not been fully mechanized. The part in red has been mechanized in Coq and comes with a proof of correctness. The dotted red arrows indicate correctness proofs that have not yet been fully integrated in a proof of correctness for the compiler as a whole.

Code Organization

The general organization of the source code available in Q*cert is the following:

General-purpose libraries

The coq/Utils directory contains general purpose libraries, tactics and lemmas used throughout the system.

Common components

The coq/Common directory contains components shared across languages.

Brands

The Common/Brands directory contains definitions for a notion of brands (i.e., names with a relation that forms a lattice) which is used to capture inheritance in both the type system and the data model.

Type System

The type system includes support for atomic types, bags, records, and branded types which are used to encode classes with inheritance.

Data Model

The data model includes atomic values, bags, records, and branded values which are used to encode objects with inheritance.

Note that the type system modules (in Common/TypeSystem) and the data modules (in Common/Data) are independent from each other, only relying on the notion of brands. How data relates to type is made explicit in the data typing modules (in Common/DataTyping).

Built-in Operators

A number of operations are assumed to be available in all languages, and defined as built-in operators.

Languages

In the coq directory, each language has its own directory with a subdirectory Lang containing the syntax (Abstract Syntax Tree) and the definition of the semantics of the language.

For most language, the semantics is given through an evaluation function. In a few cases, a denotational semantics is also provided (e.g., for cNNRC and NRA).

When that language comes with a type checker, the type checker can be found in a separate Typing subdirectory.

Source Languages

The following languages can be passed in source (text/file) form to the compiler and are provided with a parser. The parsers for OQL and λ-NRA are written in OCaml and part of the Q*cert distribution. The parsers for TechRule, DesignerRule, SQL and SQL++ are written in Java and require third-party libraries which must be downloaded separately during installation.

Name Language On Command Line Directory Code Reference
TechRule ODM technical rules tech_rule coq/TechRule Source
DesignerRule ODM designer rules designer_rule coq/DesignerRule Source
λ-NRA Lambda NRA lambda_nra coq/LambdaNRA Ast, Eval, Typing [AHMSS2017]
SQL Structured Query Language sql coq/SQL Ast
SQL++ SQL extended for JSON sqlpp coq/SQLPP Ast
OQL Object Query Language oql coq/OQL Ast, Eval [ODMG]

Local Intermediate Languages

The following languages are intermediate representations used for compilation and optimization. They all assume the data is accessed locally (i.e., do not take distribution into consideration). Most of that part of the code comes with type checkers and mechanized proofs of correctness (when translating between those languages or applying optimizations).

Name Language On Command Line Directory Code Reference
CAMPRule CAMP rules camp_rule coq/CAMPRule Syntax, Eval [SSH2015]
CAMP Calculus of Aggregating Matching Patterns camp coq/CAMP Syntax, Eval, Typing [SSH2015]
NRA Nested Relational Algebra nra coq/NRA Syntax, Semantics, Eval, Typing [Moer2014]
NRAe NRA with Environments nraenv coq/NRAEnv Syntax, Eval, Typing
cNRAe Core NRA with Environments nraenv_core coq/cNRAEnv Syntax, Semantics, Eval, Typing [AHMSS2017]
NNRC Named Nested Relational Calculus nnrc coq/NNRC Syntax, Eval, Typing
cNNRC Core NNRC nnrc_core coq/cNNRC Syntax, Semantics, Eval, Typing [BV2007]
NNRS Lightly Imperative NNRC nnrs coq/NNRS Syntax, Semantics, Eval, Typing

Distributed Intermediate Languages

The following languages are intermediate representations used for compilation and optimization. They take distribution into consideration. Most of that part of the code comes with no mechanized proofs (when translating to/from those languages or when applying optimizations).

Name Language On Command Line Directory Code Reference
DNNRC Distributed Named Nested Relational Calculus dnnrc coq/DNNRC Syntax, Eval, Typing
tDNNRC Typed DNNRC dnnrc_typed coq/tDNNRC Syntax, Eval
NNRCMR NNRC extended with Map/Reduce nnrcmr coq/NNRCMR Syntax, Eval
CldMR NNRC extended with Map/Reduce for Cloudant cldmr coq/CldMR Syntax, Eval

Target Languages

The following languages are usually represented as character strings containing code corresponding to the target. The corresponding code needs to be linked to a small runtime for execution. The runtimes are located in the runtimes directory in the Q*cert source code.

Name Language On Command Line Directory Code Runtime Reference
JavaScript JavaScript javascript coq/JavaScript Ast, Target runtimes/javascript
Java Java java coq/Java Target runtimes/java
Cloudant Cloudant Map/Reduce Views cloudant coq/Cloudant Target runtimes/javascript [Cloudant]
Spark RDDs Spark (Resilient Distributed Datasets) spark_rdd coq/SparkRDD Target runtimes/javascript [SparkRDD]
Spark DataFrames Spark (DataFrames) spark_df coq/SparkDF Target runtimes/spark2 [SparkDF]

Translation Passes

The coq/Translation directory contains all the functions translating between languages.

When possible, a corresponding proof of correctness (semantics preservation) and a proof of type preservation for the translation are also included.

From To Code Notes Reference
TechRule CAMPRule Translation In Java
DesignerRule CAMPRule Translation In Java
λ-NRA NRAe Translation, Correctness, Type Preservation
SQL NRAe Translation
SQL++ NRAe Translation
OQL NRAe Translation, Correctness
CAMPRule CAMP Translation, Correctness By macro-expansion [SSH2015]
CAMP NRAe
cNRAe
NRA
Translation, Correctness, Type Preservation
Translation, Correctness, Type Preservation
Translation, Correctness, Type Preservation

[AHMSS2017]
[SSH2015]
NRA cNNRC
cNRAe
Translation, Correctness, Type Preservation
Translation, Correctness
[SSH2015]
[AHMSS2017]
NRAe NNRC
cNRAe
Translation, Correctness
Translation, Correctness

By macro-expansion
cNRAe cNNRC
NRAe
Translation, Correctness, Type Preservation
Translation, Correctness

By inclusion
NNRC cNNRC
NNRS
DNNRC
NNRCMR
Java
Translation, Correctness
Translation, Correctness
Translation, Correctness
Translation
Translation
By macro-expansion


Code-generation
cNNRC NNRC
CAMP
Translation, Correctness
Translation, Correctness, Type Preservation
By inclusion
NNRS Javascript Translation Through Ast
DNNRC tDNNRC Translation Type inference
tDNNRC SparkDF Translation Code-generation
NNRCMR NNRC
CldMR
SparkRDD
Translation
Translation
Translation


Code-generation
CldMR Cloudant Translation Code-generation

Optimization Passes

Optimizers, when present, are located within the Optim directory of the corresponding language. The following four languages support optimization passes:

Name Language Code Reference
NRAe NRA with Environments Optimizer, Correctness [Moer2014], [AHMSS2017]
NNRC Named Nested Relational Calculus Optimizer, Correctness
tDNNRC Typed DNNRC Optimizer
NNRCMR NNRC extended with Map/Reduce Optimizer

Compiler Driver

The coq/Compiler directory contains the compiler driver.

References

[AHMSS2017]"Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler", by Joshua Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar and Jérôme Siméon (ACM SIGMOD'2017 conference).
[BV2007]Polymorphic Type Inference for the Named Nested Relational Calculus by Jan Van den Bussche and Stijn Vansummeren.
[Cloudant]Cloudant Views (Map/Reduce), IBM Bluemix Cloudant API Documentation.
[Coq]The Coq proof assistant, version 8.6.1.
[Moer2014]The most complete treatment for the Nested Relational Algebra (NRA) can be found in Guido Moerkotte's Building Query Compilers book (See Chapter 7: An Algebra for Sets, Bags, and Sequences).
[ODMG]The Object Data Management Standard: ODMG 3.0
[SparkRDD]Spark Programming Guide, Version 2.1.1.
[SparkDF]Spark SQL, DataFrames and Datasets Guide, Version 2.1.1.
[SSH2015]A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization by Avraham Shinnar, Jérôme Siméon and Martin Hirzel.