Private.Raw_grammar
Representation of S-expression grammars
This module defines the representation of S-expression grammars produced by @@deriving sexp_grammar
. It introduces an AST to represent these grammars and a notion of "group" to represent the grammars of a mutually recursive set of OCaml type declaration.
The grammar for a given type expression can be constructed via:
[%sexp_grammar: <type>]
Functionality goals: With post-processing, sexp grammars can be pretty-printed in a human-readable format and provides enough information to implement completion and validation tools.
Performance goals: @@deriving sexp_grammar
adds minimal overhead and introduces no toplevel side effect. The compiler can lift the vast majority of ASTs generated by @@deriving sexp_grammar
as global constants. Common sub-grammars are usually shared, particularly when they derive from multiple applications of the same functor.
Non-goals: Stability, although we will make changes backwards-compatible or at least provide a reasonable upgrade path.
In what follows, we describe how this is achieved.
A group
contains the grammars for all types of a mutually recursive group of OCaml type declarations.
To ensure maximum sharing, a group is split into two parts:
generic_group
depends only on the textual type declarations. Where the type declaration refers to an existing concrete type, the generic group takes a variable to represent the grammar of that type. This means that the compiler can lift each type declaration in the source code to a shared global constant.group
binds the type variables of the generic_group
, either to concrete grammars where the type declaration refers to a concrete type, or to another variable where the type declaration itself was polymorphic.To understand this point better, imagine the following type declaration
type t = X of u
were explicitly split into its generic_group
and group
parts:
type 'u t_generic = X of 'u
type t = u t_generic
If u
came from a functor argument, it's easy to see that t_generic
would be exactly the same in all applications of the functor and only t
would vary. The grammar of t_generic
, which is the biggest part, would be shared between all applications of the functor.
The Raw_grammar.t
type optimizes for performance over ease of use. To help users process the raw grammars into a more usable form, we keep two identifiers in the generated grammars:
generic_group_id
uniquely identifies a generic_group
. It is a hash of the generic group itself. (It is okay that this scheme would conflate identical type declarations, because the resulting generic groups would be identical as well.)group_id
uniquely identifies a group
. It is a unique integer, generated lazily so that we don't create a side effect at module creation time.The exact processing would depend on the final application. We expect that a typical consumer of sexp grammars would define less-indirected equivalents of the t
and group
types, possibly re-using the _ type_
and Atom.t
types.
Variable names. These are used to improve readability of the printed grammars. Internally, we use numerical indices to represent variables; see Implicit_var
below.
module Atom : sig ... end
A grammatical type which classifies atoms.
type 't type_ =
| Any | (* Any list or atom. *) |
| Apply of 't type_ * 't type_ list | (* Assign types to (explicit) type variables. *) |
| Atom of Atom.t | (* An atom, in particular one of the given |
| Explicit_bind of var_name list * 't type_ | (* In |
| Explicit_var of int | (* Indices for type variables, e.g. Unlike de Bruijn indices, these are always bound by the nearest ancestral |
| Grammar of 't | (* Embeds other types in a grammar. *) |
| Implicit_var of int | (* Indices for type constructors, e.g. |
| List of 't sequence_type | (* A list of a certain form. Depending on the |
| Option of 't type_ | (* An optional value. Either syntax recognized by |
| Record of 't record_type | (* A list of lists, representing a record of the given |
| Recursive of type_name | (* A type in the same mutually recursive group, possibly the current one. *) |
| Union of 't type_ list | (* Any sexp matching any of the given types. One useful special case is |
| Variant of 't variant_type | (* A sexp which matches the given |
A grammatical type which classifies sexps. Corresponds to a non-terminal in a context-free grammar.
and 't sequence_type = 't component list
A grammatical type which classifies sequences of sexps. Here, a "sequence" may mean either a list on its own or, say, the sexps following a constructor in a list matching a variant_type
.
Certain operations may greatly favor simple sequence types. For example, matching List [ Many type_ ]
is easy for any type type_
(assuming type_
itself is easy), but List [ Many type1; Many type2 ]
may require backtracking. Grammars derived from OCaml types will only have "nice" sequence types.
and 't component =
| One of 't type_ | (* Exactly one sexp of the given type. *) |
| Optional of 't type_ | (* One sexp of the given type, or nothing at all. *) |
| Many of 't type_ | (* Any number of sexps, each of the given type. *) |
| Fields of 't record_type | (* A succession of lists, collectively defining a record of the given |
Part of a sequence of sexps.
and 't variant_type = {
ignore_capitalization : bool; | (* If true, the grammar is insensitive to the case of the first letter of the label. This matches the behavior of derived |
alts : (label * 't sequence_type) list; | (* An association list of labels (constructors) to sequence types. A matching sexp is a list whose head is the label as an atom and whose tail matches the given sequence type. As a special case, an alternative whose sequence is empty matches an atom rather than a list (i.e., As a workaround, to match |
}
A tagged union of grammatical types. Grammars derived from OCaml variants will have variant types.
A collection of field definitions specifying a record type. Consists only of an association list from labels to fields.
and 't field = {
optional : bool; | (* If true, the field is optional. *) |
args : 't sequence_type; | (* A sequence type which the arguments to the field must match. An empty sequence is permissible but would not be generated for any OCaml type. *) |
}
A field in a record.
and group = {
gid : group_id; | |
generic_group : generic_group; | |
origin : string; | (*
For a globally unique identifier, use See |
apply_implicit : t list; |
}
and generic_group = {
implicit_vars : var_name list; |
ggid : generic_group_id; |
types : (type_name * t type_) list; |
}