3. Core concepts
Regardless of which language you use in order to write your programs, the Libadalang API comes with several core concepts. Understanding these upfront will make your programming experience easier, so the goal of this section is to make you familiar with them.
3.1. Analysis contexts and units
In order to use Libadalang, the first thing you need to do is to create an analysis context. These are holders that embed all data and computations done during source code analysis. The second step is to create what we call an analysis unit: a holder for computations with respect to a single source file.
Analysis units really map to source files: Libadalang will process one source file that contains several compilation units as one analysis unit; similarly a subunit (“separate”) gets a dedicated unit.
In order to create an analysis unit, you just need to have a context at hand
and provide either a source file or a buffer that contains the source text to
be analyzed. This will decode the source text (from UTF-8
, ISO-8859-1
,
etc.), run the lexer on it to get a stream of tokens, and then run the parser
to get the parsing tree. Note that creating an analysis unit always succeeds:
if there are parsing errors or if there is an error opening the file, a
analysis unit is returned nevertheless, with error messages stored in a
diagnostics vector.
While you can create analysis units explicitly using the analysis context API,
semantic analysis also create them on demand. For instance, trying to resolve
Ada.Containers.Hash_Type
will automatically load an analysis unit for the
Ada.Containers
runtime unit in order to fetch and return the node
corresponding to the Hash_Type
type declaration.
3.2. Tokens and trivias
While traditional lexers only give you access to regular tokens and discard information such as white spaces and comments, Libadalang can preserve all formatting information. This option, enabled by default when creating analysis contexts, makes Libadalang suitable for a lot of usages beyond regular compiler-style analysis.
In order to achieve this, Libadalang keeps track of two distinct sequences of “lexemes”: regular tokens and trivias. The sequence of regular tokens (identifiers, keywords, etc.) is the one that the parser will use to create its parsing tree, while the sequence of trivias is just ignored by parsers, but still preserved for later analysis.
Although Libadalang exposes both sequences as a single one (ranges of tokens for the analysis units or for single nodes), getting the index of a regular token will return the position of this token in the list of regular tokens, and conversely for trivias. You can see this in the following example:
procedure Foo; -- Comment
This source will yield 3 tokens and 3 trivias:
the
procedure
keyword (token index 1),the first white space trivia (trivia index 1),
the
Foo
identifier (token index 2),the
;
token (token index 3),the second white space trivia (trivia index 2),
the comment trivia (trivia index 3).
Depending on the context, “token” designates either only regular tokens or both regular tokens and trivias.
3.3. Syntax and semantic tree
After a successful parsing step, an analysis unit holds the resulting parsing tree. Each node in this tree has several characteristics:
a specific kind (identifier, object declaration, binary expression, …),
a first token and a last one (we derive source location and source text excerpts from these).
Nodes can be grouped by kind in three categories:
- Regular nodes
These have a definite number of children. For instance, binary operations always have three children: the left operand, the operator and the right operand. Some children can be null, for example the default expression in a parameter specification node.
- List nodes
These can have any number of children (including zero).
- Token nodes
They encompass exactly one token and have no children. For instance: string literals.
Sometimes, a node is created to mean the absence of some keywords. For example
the Mode_Default
node is created to describe the absence of explicit mode
for parameter specification. We call these “ghost” nodes.
Children for all nodes can be accessed by index, but only the children of
regular nodes get named accessors. For instance, parameter specification
nodes have a mode
field as their third children; this field is exposed as
the F_Mode
function in the Ada API and the .f_mode
attribute in the
Python API.
The syntax tree also serves for semantic analysis. Look for functions whose
names match P_*
in the Ada API or .p_*
node members in the Python API:
these are what we call node properties (hence the P prefix). They compute
semantic information on top of the syntax tree. The most useful example is the
P_Xref
property, which computes cross-reference information. On legal Ada
code bases, properties are always supposed to return a correct result. However,
on incorrect code, they can either return approximate results, or raise
Property_Error
exceptions (PropertyError
in Python).
In interactive tools such as IDEs, the source code can evolve so that it is
necessary to recompute information. In Libadalang, the analysis unit creation
primitives can reload an analysis unit if the corresponding source file was
already analyzed before. As a result, the previous syntax tree is released and
replaced with the new one: any attempt to use references to released nodes will
raise a Stale_Reference_Error
exception (StaleReferenceError
in
Python).
3.4. Unit providers
Some Ada constructs such as with
clauses require name resolution in
Libadalang to “jump” from one source file to another. As an example, processing
the with Ada.Text_IO;
statement requires to load the source files that
contain the Ada
package specification and the Ada.Text_IO
one.
Depending on projects, knowing which source file to use in order to find a
compilation unit can be arbitrarily complex. This is why Libadalang comes with
the concept of unit provider: given a unit name (Ada.Strings.Unbounded
)
and a unit kind (specification/body), a unit provider must either return a
source file to read, or raise an error if the unit name is not valid.
Note that since Ada is a case-insensitive language, unit names Foo.Bar
and
foo.BAR
are equivalent. Package specifications, subprogram declarations and
their generic counterparts are considered as specification units while package
and subprogram bodies and subunits (“separates”) are processed as body units.
The analysis context constructor takes an unit provider: name resolution will
use this provider whenever it needs to resolve a unit reference to a source
file. In the absence of an explicit unit provider, the default one will look
for source files in the current directory following the GNAT naming convention:
for instance parent-child.ads
for the specification of the Parent.Child
unit.
Libadalang allows you to create your own unit provider, should the naming convention of your project be totally custom, but it also comes with two very useful providers:
one that takes a GPR project file and gives access to all Ada sources referenced by the corresponding project tree (the “project unit provider”),
one that looks for all source files that match a given file name pattern in a given list of directory (the “auto unit provider”).
You can find documentation for the unit providers that Libadalang implements in
the Ada API reference and for Python in
libadalang.UnitProvider
. Tutorials for both languages show how this
works in practice: see the Ada tutorial
and the Python tutorial.
3.4.1. Aggregate projects
Due to the very nature of Libadalang’s operations regarding name resolution, it expects exactly one source file for each unit name/kind. This assumption allows the “get to the definition” operation to return exactly one result. The aggregate project formalism is lax and allows concurrent declarations and implementations to coexist in a single project tree, which violates the assumption described in the previous paragraph.
For instance, consider the following aggregate project:
-- "agg.gpr"
aggregate project Agg is
for Project_Files use ("main32.gpr", "main64.gpr");
end Agg;
The following sources (two alternative implementations of the same package):
-- "arch32/arch.ads"
package Arch is
type Target_Address is mod 2 ** 32;
end Arch;
-- "arch64/arch.ads"
package Arch is
type Target_Address is mod 2 ** 64;
end Arch;
And finally the following alternative main projects:
-- "main32.gpr"
project Main32 is
for Source_Dirs use (".", "arch32");
for Main use ("main.adb");
for Object_Dir use "obj/main32";
end Main32;
-- "main64.gpr"
project Main64 is
for Source_Dirs use (".", "arch64");
for Main use ("main.adb");
for Object_Dir use "obj/main64";
end Main64;
-- "main.adb"
with Ada.Text_IO, Arch;
procedure Main is
begin
Ada.Text_IO.Put_Line
("Arch.Target_Address'Size =" & Arch.Target_Address'Size'Image);
end;
Building the aggregate project with gprbuild -Pagg -p -j8 -q
yields two
alternative main
executables:
$ obj/main32/main
Arch.Target_Address'Size = 32
$ obj/main64/main
Arch.Target_Address'Size = 64
Just like the output of the “main” program depends on which Arch
package is
used, the result of the “get to the definition” query on
Arch.Target_Address
above depends on it.
Because of this, Libadalang’s project provider has multiple ways to work, each one coming with its own restrictions on the aggregate projects passed to it:
When only passed an aggregate project, that project must contain at most one source file per unit name/kind. It is not the case in the example above, because there are two files (
arch32/arch.ads
andarch64/arch.ads
) associated to the spec of theArch
unit.When passed an aggregate project plus a reference to a specific project file in the whole project tree, the unit provider will consider only the source files that this project file and all its dependencies own contain. In the example above, this means that one could give the
agg.gpr
project file plus a reference to themain32.gpr
project: Libadalang would then only analyze thearch32/arch.ads
source file.Libadalang can also create several project providers, each one having a restricted view on the sub-project it has access to, so that only one source file is available for each unit name/kind.
While the two first options are available in all Libadalang APIs, option 3. is currently available only in its Ada API.
See the Dealing with aggregate projects section for related code examples.