9. Unparsing configuration file format

The unparsing configuration is a JSON file that provides “document templates”, i.e. patterns to generate Prettier documents. Langkit’s unparsing engine uses these templates to turn a given syntax tree into a Prettier document, and then delegates the final transformation of this document into text to Prettier itself.

Writing an unparsing configuration requires knowledge of Prettier documents (Prettier’s intermediate representation): please refer to Prettier’s own documentation.

9.1. Top level structure

Unparsing configuration files have the following format:

{
  "node_configs": {
     "Node1": {},
     "Node2": {}
  },
  "max_empty_lines": 1,
  "token_configs": {}
}

node_configs

This entry is mandatory, and provides a mapping from node type names to Node configurations.

Standard node derivation rules apply to configurations: if node B derives from node A, and if node B does not specify a configuration for its field F, then the configuration of field F for node A applies (same for the list separator).
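
As an illustration of these inheritance rules, here is a minimal sketch that assumes a hypothetical Decl node type with an f_name field, and a hypothetical VarDecl node type that derives from Decl and adds an f_type field (none of these names come from an actual grammar):

{
  "node_configs": {
    "Decl": {
      "fields": {
        "f_name": ["whitespace", "recurse"]
      }
    },
    "VarDecl": {
      "fields": {
        "f_type": {"kind": "group", "document": "recurse"}
      }
    }
  }
}

Under these assumptions, VarDecl gives no template for f_name, so the one from Decl should apply to it, while f_type uses the template that VarDecl provides.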

max_empty_lines

Optional. If provided, it must be a natural number that indicates the maximum number of consecutive empty lines to preserve during the source code reformatting. If omitted, all empty lines are preserved.

token_configs

Optional entry that determines how individual tokens are formatted. See Token configurations.

9.2. Node configurations

A node configuration controls how a given node type must be formatted. It is encoded as a JSON object which can have the following entries:

node (optional)

If present, it contains a document template to wrap the basic unparsing of the node (see Document templates).

fields (optional)

If present, must be a mapping from field names to document templates to format the field when it is present in the syntax tree.
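
For instance, the following sketch shows a configuration for a hypothetical IfStmt node type with hypothetical f_cond and f_then fields; the chosen templates are illustrative only:

{
  "node": {"kind": "group", "document": "recurse"},
  "fields": {
    "f_cond": ["whitespace", "recurse"],
    "f_then": {"kind": "indent", "contents": ["hardline", "recurse"]}
  }
}

Here the node entry wraps the whole node in a group, and each entry in fields formats the corresponding field only when that field is present.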

sep (optional, only for list nodes)

Document template to unparse the list separator.

leading_sep and trailing_sep (optional, only for list nodes that accept respectively leading and trailing separators)

Document templates to unparse leading/trailing separators.

flush_before_children (optional, only for list nodes)

Boolean (true by default), that controls whether line breaks recovered from the source to reformat are flushed before each list element.

independent_lines (optional, only for list nodes)

Boolean (false by default), that controls whether each list item is formatted on its own line. When true, the formatting of rewritten trees can stop reformatting at the boundary of such nodes.
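
The list-specific entries above can be combined as in the following sketch, written for a hypothetical StmtList list node type (the templates are illustrative assumptions, not recommended defaults):

{
  "sep": ["recurse", "hardline"],
  "independent_lines": true
}

The sep template unparses the separator with its default unparsing (recurse) and follows it with a hard line break. flush_before_children is left out, so it keeps its default value (true).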

table (optional, only for list nodes)

If present, unparsing such lists yields a list of table documents, each list child being unparsed to a table row. If present, it must contain an object that accepts the following entries:

  • sep_before: Whether list separators must be inserted at the end of the previous row ("sep_before": true) or at the beginning of the next row ("sep_before": false). This is optional, and defaults to true.

  • split: Determine which kinds of trivia found between two list children trigger a table split (i.e. the presence of such trivias ends the current table, and triggers the creation of a new table for the next children). This field is optional (by default, nothing splits tables), and when present, must be an array of strings, with the following possible values: "empty_line", "line_comment".

  • must_break: Whether each row of this table must go on its own line. If false (the default), each row goes on its own line only when a break occurs in the table.

  • join: Determine whether a list child must be put on the previous table row, i.e. whether to join what would instead be two rows.

    If present, this must be an object with a mandatory predicate entry, which must be a reference to a predicate property, i.e. a property that each list child has and that returns, as a boolean, whether to join rows.

    The optional template entry must be a template that describes how to join two rows: the first row is substituted for the recurse_left template and the second row is substituted for the recurse_right template. For example:

    "join": {
      "predicate": "p_my_predicate",
      "template": [
        "recurse_left",
        {"kind": "tableSeparator", "text": ""},
        {"kind": "group", "document": ["line", "recurse_right"]}
      ]
    }
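
Putting the entries above together, the node configuration of a hypothetical list node type could request tables as follows (a sketch; the chosen values are illustrative):

{
  "table": {
    "sep_before": false,
    "split": ["empty_line", "line_comment"],
    "must_break": true
  }
}

With this configuration, list separators go at the beginning of the next row, an empty line or a line comment between two children starts a new table, and every row goes on its own line.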
    

9.3. Token configurations

The token configuration controls how individual tokens used for parsing (keywords, punctuation, …) must be formatted. It is encoded as a JSON object which can have the following entries:

default (optional)

When provided, it must be one of the following strings:

  • lower: format tokens as lower case.

  • upper: format tokens as upper case.

  • original: keep the formatting found in the original sources.

If omitted, lower is used. Note that for case-sensitive languages, lower and upper both behave like original: they just preserve the casing used in the original sources.

formattings (optional)

A JSON object that maps default token formattings to the ones to use when unparsing with this configuration. These formattings override the formattings implied by the default setting. A token associated with null keeps the formatting found in the original source.

For example:

"token_configs": {
  "default": "upper",
  "formattings": {
    "|": "!",
    "abstract": "Abstract",
    "overriding": null
  }
}

This configuration instructs the unparser to, by default, turn all keywords into uppercase, but unparse | tokens into ! (valid if the lexer considers that the two are equivalent), unparse abstract as Abstract and preserve the original formatting for overriding keywords.

9.4. Document templates

A document template is essentially a set of instructions to produce the expected Prettier document for a given syntax tree node. Templates are created by the composition of the following building blocks.

recurse

This is the default template to unparse nodes, node fields or list separators. Instantiating it just yields the default unparsing for that node/field/separator:

"recurse"

breakParent

Yield the corresponding Prettier document:

"breakParent"

line, hardline, hardlineWithoutBreakParent, softline, literalline

Yield the corresponding Prettier document:

"line"
"hardline"
"hardlineWithoutBreakParent"
"softline"
"literalline"

flushLineBreaks

Placeholder to emit potential line breaks that come from the source code to reformat:

"flushLineBreaks"

trim

Yield the corresponding Prettier document:

"trim"

whitespace

Yield a text document with the specified amount of spaces:

{"kind": "whitespace", "length": 2}

Or as a shortcut for a length of 1:

"whitespace"

align

Yield an align Prettier document:

{
  "kind": "align",
  "width": 3,
  "contents": "recurse"
}
{
  "kind": "align",
  "width": "foo",
  "contents": "recurse"
}

See also Bubble up (handling of trivias).

continuationLineIndent

Yield a continuationLineIndent Prettier document:

{
  "kind": "continuationLineIndent",
  "contents": "recurse"
}

See also Bubble up (handling of trivias).

dedent

Yield a dedent Prettier document:

{"kind": "dedent", "contents": "recurse"}

See also Bubble up (handling of trivias).

dedentToRoot

Yield a dedentToRoot Prettier document:

{"kind": "dedentToRoot", "contents": "recurse"}

See also Bubble up (handling of trivias).

fill

Yield a fill Prettier document:

{"kind": "fill", "document": "recurse"}

See also Bubble up (handling of trivias).

group

Yield a group Prettier document:

{"kind": "group", "document": "recurse"}
{
  "kind": "group",
  "document": "recurse",
  "shouldBreak": true
}

An optional id entry defines a symbol for the group, which can be referenced elsewhere in the same template (for instance through the groupId entry of ifBreak):

{
  "kind": "group",
  "document": "recurse",
  "id": "mySymbol"
}

See also Bubble up (handling of trivias).

if

Yield one of two alternative templates depending on a controlling expression (see Expressions).

{
   "kind": "if",
   "condition": {"kind": "is_empty", "node": "this_node"},
   "then": "recurse",
   "else": ["recurse", "whitespace"]
}

The condition expression must return a boolean. If it evaluates to true, the then template is used. If it evaluates to false or raises an exception, the else template is used instead.

ifBreak

Yield an ifBreak Prettier document:

{"kind": "ifBreak", "breakContents": "recurse"}
{
  "kind": "ifBreak",
  "breakContents": "recurse",
  "flatContents": "recurse"
}
{
  "kind": "ifBreak",
  "breakContents": "recurse",
  "flatContents": "recurse",
  "groupId": "mySymbol"
}
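
The groupId entry refers to a symbol defined by the id entry of a group in the same template. A minimal sketch of the two used together (the choice of hardline and whitespace is only illustrative):

[
  {"kind": "group", "document": "recurse", "id": "mySymbol"},
  {
    "kind": "ifBreak",
    "breakContents": "hardline",
    "flatContents": "whitespace",
    "groupId": "mySymbol"
  }
]

As with Prettier's own ifBreak, the document that follows the group then depends on whether the group identified by mySymbol is broken: a hard line break if it is, a single space otherwise.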

match

Yield one of several alternative templates depending on a controlling node (what the node expression returns).

Each alternative template is guarded by a pattern (see Patterns): the expression yields the alternative template associated to the first pattern that matches the controlling node.

Note that at least one pattern (in practice: the last one) must match all values (i.e. be the default pattern):

{
  "kind": "match",
  "node": "this_node",
  "matchers": [
    {"pattern": null, "document": "recurse"},
    {"pattern": "*", "document": "recurse"}
  ]
}

If evaluating the node expression triggers an error, the patterns are matched against a null node.
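
As a sketch of how these rules combine, the following template assumes a hypothetical f_body field and a hypothetical BlockStmt node type:

{
  "kind": "match",
  "node": {"kind": "eval_member", "member": "f_body", "prefix": "this_node"},
  "matchers": [
    {"pattern": null, "document": "recurse"},
    {
      "pattern": {"kind": "node", "type": "BlockStmt"},
      "document": ["whitespace", "recurse"]
    },
    {
      "pattern": "*",
      "document": {"kind": "indent", "contents": ["hardline", "recurse"]}
    }
  ]
}

The first matcher handles both a null f_body and an error while evaluating it, the second one handles BlockStmt nodes, and the final "*" pattern is the mandatory default.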

indent

Yield an indent Prettier document:

{"kind": "indent", "contents": "recurse"}

See also Bubble up (handling of trivias).

innerRoot

Yield an innerRoot Prettier document:

{"kind": "innerRoot", "contents": "recurse"}
    

markAsRoot

Yield a markAsRoot Prettier document:

{"kind": "markAsRoot", "contents": "recurse"}

See also Bubble up (handling of trivias).

recurse_field

Valid only in node templates for concrete nodes that are neither token nor list nodes. When used, the whole template cannot contain any recurse/recurse_flatten template, and the template, once linearized, must reflect how the node is unparsed.

For example, let’s consider that the VarDecl node is created by parsing the following chunks:

"var" [f_name] ":" [f_type] ";"

Then its node template must contain two recurse_field templates for the two fields, in the same order, and with the same tokens in between. For instance:

[
  {"kind": "text", "text": "var"},
  {"kind": "recurse_field", "field": "f_name"},
  {
    "kind": "group",
    "document": [
      {"kind": "text", "text": ":"},
      {"kind": "recurse_field", "field": "f_type"}
    ]
  },
  {"kind": "text", "text": ";"}
]

recurse_flatten

Acts like recurse but refines its result so that the document nested in align, fill, group, indent templates and in 1-item document lists is returned instead (recursively).

tableSeparator

Yield a tableSeparator Prettier document:

{"kind": "tableSeparator", "text": "some_text_to_unparse"}

text

Yield a text Prettier document:

{"kind": "text", "text": "some_text_to_unparse"}

Using this template is valid in specific contexts only:

  • For node templates, when used with recurse_field templates: see the documentation for recurse_field;

  • For fields templates: in this case, the linearized template must reflect how the field is unparsed. See the documentation for recurse_field to have more information about linearization.
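
As a sketch of the second case, assume a hypothetical f_default field whose unparsing consists of a := token followed by the field's node (both the field name and its syntax are assumptions made for this example). Its entry in a node configuration could then look like:

{
  "fields": {
    "f_default": [
      "whitespace",
      {"kind": "text", "text": ":="},
      "whitespace",
      "recurse"
    ]
  }
}

Once linearized, this template yields the := token followed by the field itself, which matches how the field is assumed to be unparsed here.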

list

JSON lists yield the corresponding list Prettier documents:

["whitespace", "recurse"]

9.5. Expressions

Some document templates like match or if contain expressions to control what documents to yield exactly: they can be evaluated to booleans, strings, nodes, … but they never yield documents by themselves.

The following expression building blocks are available:

bin_op

Perform a binary operation on two operands. For instance:

{
  "kind": "bin_op",
  "op": "=",
  "lhs": {"kind": "node_symbol", "node": "this_field"},
  "rhs": {"kind": "symbol", "value": "foo"}
}

The two operands (lhs and rhs entries) are sub-expressions, and the operator (op entry) can be one of the following strings:

  • =, to test equality. Both operands must have compatible types (either two nodes, whose types need not be the same, or two values of exactly the same type), and the result is a boolean.

  • and/or, the short-circuiting boolean operators.
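
For instance, the boolean operators can combine other boolean expressions. The following sketch reuses the Node2 type name from the examples in this section as a placeholder:

{
  "kind": "bin_op",
  "op": "and",
  "lhs": {
    "kind": "not",
    "operand": {"kind": "is_empty", "node": "this_node"}
  },
  "rhs": {
    "kind": "is_a",
    "node": "this_node",
    "pattern": {"kind": "node", "type": "Node2"}
  }
}

Because and is short-circuiting, the rhs operand is evaluated only when the lhs operand is true, i.e. when this_node is not considered empty.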

cast

Take a node and convert it to another type. If the conversion fails, this yields a null node:

{
  "kind": "cast",
  "node": "this_node",
  "type": "NodeType"
}
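
Since operands are sub-expressions, the result of a cast can feed another expression. A sketch, assuming a hypothetical Identifier node type:

{
  "kind": "node_text",
  "node": {
    "kind": "cast",
    "node": "this_node",
    "type": "Identifier"
  }
}

If this_node is not an Identifier, the cast yields a null node, for which node_text (described below) returns the empty string.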

eval_member

Evaluate the member (field or property) of a given value (struct or node), with potential arguments, and return the field value or property result. Property arguments can be passed as positional arguments (in the args list) or as keyword arguments (in the kwargs object), or both (keyword arguments are associated after the positional arguments):

{
  "kind": "eval_member",
  "member": "some_field",
  "prefix": "this_node"
}
{
  "kind": "eval_member",
  "member": "some_property",
  "prefix": "this_node",
  "args": [
     "this_node",
     {"kind": "is_empty", "node": "this_node"}
  ],
  "kwargs": {
    "arg3": "this_node",
    "arg4": {"kind": "node_text", "node": "this_node"}
  }
}

is_a

Return whether the node operand matches the given pattern (see Patterns):

{
  "kind": "is_a",
  "node": "this_node",
  "pattern": {"kind": "node", "type": "Node2"}
}

is_empty

Return whether its operand is considered as empty for unparsing purposes (i.e. an empty list node with no nearby comment).

{"kind": "is_empty", "node": "this_node"}

node_symbol

Take a token node and return the corresponding symbol:

{"kind": "node_symbol", "node": "this_node"}

Note that it returns the empty symbol for null nodes.

node_text

Take a node and return its text:

{"kind": "node_text", "node": "this_node"}

Note that it returns the empty string for null nodes.

not

Take a boolean and return its opposite:

{
  "kind": "not",
  "operand": {"kind": "is_empty", "node": "this_node"}
}

string

This expression materializes a string literal:

{"kind": "string", "value": "some string value"}

symbol

This expression materializes a symbol literal:

{"kind": "symbol", "value": "some string value"}

this_node

Return the node that instantiates the current template:

"this_node"

this_field (valid only inside fields configurations)

Return the child node of this_node used to instantiate the current template:

"this_field"

If an error occurs during the evaluation of an expression, the error is reported in the LANGKIT.UNPARSING.EXPANSION_ERRORS trace (GNATCOLL.Traces), and the encompassing condition evaluates to false.
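
To illustrate this fallback, here is a sketch of an if template whose condition calls a hypothetical boolean property named p_is_short (an assumption made for this example):

{
  "kind": "if",
  "condition": {
    "kind": "eval_member",
    "member": "p_is_short",
    "prefix": "this_node"
  },
  "then": ["whitespace", "recurse"],
  "else": ["hardline", "recurse"]
}

If evaluating p_is_short raises an error, the condition evaluates to false, so the else template is used, exactly as if the property had returned false.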

9.6. Patterns

Patterns are used to test values against given “shapes”. A pattern can be one of the following:

*

This pattern matches all possible values:

"*"

null

Match null nodes:

null

false/true

Match either the false or the true boolean.

false
true

symbol_literal

Match symbol values against a given constant:

{"kind": "symbol_literal", "value": "foo"}

node

Match nodes of a given type (or a derivation of the given type):

{"kind": "node", "type": "SomeNodeType"}

This pattern also allows performing additional tests on its fields/properties:

{
  "kind": "node",
  "type": "SomeNodeType",
  "members": [
    {
      "member": "f_some_field",
      "pattern": {"kind": "node", "type": "SomeOtherType"}
    },
    {
      "member": "p_some_property",
      "args": ["this_node"],
      "kwargs": {"arg2": "this_field"},
      "pattern": false
    }
  ]
}

In the previous example, the pattern will match only nodes n that satisfy all of the following conditions:

  • The node is of type SomeNodeType (or any of its descendants).

  • The f_some_field member is a node of type SomeOtherType (likewise).

  • The n.p_some_property(this_node, arg2=this_field) member evaluates without errors and its result is false.

not

Match anything that a subpattern does not match. The following pattern matches all symbol values except the foo symbol.

{"kind": "not", "pattern": {"kind": "symbol_literal", "value": "foo"}}

or

Match if any of the sub-patterns matches. The following pattern matches either the foo symbol or the bar one.

{
  "kind": "or",
  "patterns": [
    {"kind": "symbol_literal", "value": "foo"},
    {"kind": "symbol_literal", "value": "bar"}
  ]
}
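
Patterns nest freely, and can be used from expressions through is_a. As a sketch, the following expression reuses the Node1 type name from the top-level example as a placeholder:

{
  "kind": "is_a",
  "node": "this_node",
  "pattern": {
    "kind": "not",
    "pattern": {
      "kind": "or",
      "patterns": [
        null,
        {"kind": "node", "type": "Node1"}
      ]
    }
  }
}

It returns true for any non-null node that is not a Node1 (nor a derivation of it).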

9.7. Bubble up (handling of trivias)

The unparsing engine inserts the trivias to preserve (comments and line breaks) in the same context as the token that precedes them, with one exception: trivias that precede or follow list nodes are inserted in that list node.

After template expansion, the Prettier document produced for a given syntax tree is post-processed: trivias that appear before the first token of a node (leading trivias) or after its last token (trailing trivias) may be moved up in the document tree (“bubbled up”).

By default, all leading trivias are bubbled up, and trailing trivias are bubbled up only for fill and group documents. In order to satisfy some coding styles, these defaults can be overridden on a case-by-case basis in templates. To achieve this, the following templates accept the optional bubbleUpLeadingTrivias and bubbleUpTrailingTrivias boolean entries, which override the default bubbling up behavior for trivias:

  • align

  • continuationLineIndent

  • dedent

  • dedentToRoot

  • fill

  • group

  • indent

  • innerRoot

  • markAsRoot

For instance:

{
  "kind": "group",
  "document": "recurse",
  "bubbleUpLeadingTrivias": false
}
{
  "kind": "fill",
  "document": "recurse",
  "bubbleUpTrailingTrivias": true
}
{
  "kind": "dedent",
  "contents": "recurse",
  "bubbleUpLeadingTrivias": false,
  "bubbleUpTrailingTrivias": true
}