8. Appendix

8.1. Command Line Invocation

Analysis may be run from IDEs, or from the command line. CodePeer is invoked on the command line as follows:

Usage: codepeer -P <project-file> [-Xvar=val] [--no-subprojects]
  [-file <filename>] [-level 0|1|2|3|4|min|max] [-jnum]
  [-baseline] [-cutoff <run_id>] [-set-baseline-id <run_id>]
  [-messages [min|normal|max]] [-quiet] [-U] [-U main_file]
  [-output-only] [-output-dir <dir>] [-no-project-partitions]
  [-no-race-conditions] [-no-preconditions] [-no-presumptions]
  [-no-html-output] [-f] [-compiler-mode] [-32bits] [-gnatxxx]
  [-method-memory-size <megs>] [-method-timeout <seconds>]
  [-dbg-vn-limit <num_vns>] [--gnat-warnings[=xxx]] [--gnatcheck]
  [-output-lib <lib>] [-verbose] [-v] [--help|--version]
  [-output-msg[-only] [codepeer_msg_reader switches]]

* -P <project-file>
  Specify the project file name. All files from the specified project tree
  (projects and subprojects) will be analyzed.

* -Xvar=val
  Specify an external reference for the project file.

* --no-subprojects
  Only analyze files from the root project, ignoring all subprojects. Also
  ignore main units defined in the root project.

* -file <filename>
  Specify the path (absolute, or relative to the current directory) of a
  single file name to analyze, ignoring other files from the project.

* -level 0|1|2|3|4|min|max
  Specify the level of analysis performed: 0 and 1 for very fast and
  approximate analysis (per subprogram), 2 for slightly more accurate/slower
  (per small set of units), 3 for more accurate and much slower, 4 for
  global analysis with no automatic partitioning (may exceed memory capacity
  and take a very long time). Default is level 3. min is equivalent to 0,
  max to 4.

* -32bits
  Assume a 32bits target matching the host (e.g. 32bits linux for a 64bits
  linux configuration).

* -baseline
  Specify that this run is the new default baseline.

* -compiler-mode
  Run CodePeer in compiler mode, analyzing files one by one incrementally,
  with no historical database.

* -cutoff <run_id>
  Specify that this run should temporarily ignore the baseline and use
  run_id to compare messages.

* -f
  Force analysis of all files, ignoring previous run. This will also force
  the generation of all SCIL files.

* -jnum
  Specify the number of simultaneous processes ("num") launched for
  generating SCIL and to perform an analysis. The special value 0 means use
  the number of cores on the machine. Ignored if -level max is specified.
  Default is -j0 for SCIL generation and -j1 for analysis. Note that
  -jobs num is equivalent to -jnum.

* -gnatxxx
  GNAT compiler switches that will be used during SCIL generation, see the
  CodePeer and GNAT User's Guide for detail. Some of the relevant switches
  include: -gnat95, -gnatI, -gnateF, gnatwxxx, -gnateD.

* --gnat-warnings[=xxx]
  Tells CodePeer to run the GNAT front end and report its warnings
  in addition to its own analysis. The user can specify a list of
  warning options ("xxx") for the GNAT front end
  (e.g. "--gnat-warnings=a.u" for "-gnatwa.u"). By default, use -gnatw
  switches specified as compiler switches in the project file, or the
  GNAT default.

* --gnatcheck
  Tells CodePeer to run GNATcheck and report its messages in
  addition to CodePeer own analysis.

* -messages [min | normal | max]
  Specify level of verbosity for generated messages. Default is normal.
  -messages min will filter most messages that are likely false alarms,
  while -messages max will display all possible errors.

* -quiet
  Run quietly, with no output displayed on standard output.

* -set-baseline-id <run_id>
  Specify that run_id is the new default baseline.

* -U
  Analyze all sources of the specified project file.

* -U main_file
  Analyze the closure of units rooted at the unit contained in main_file.
  Note: this option only works when running codepeer from a clean
  environment (no existing codepeer directory).

* -help/--help
  Display help file and exit.

* -v/--version
  Display version information.

* -output-msg[-only] [codepeer_msg_reader switches]
  If specified, codepeer will run codepeer_msg_reader to output messages
  (in addition to, or instead of (-only) running an analysis) in a
  compiler-like format. You can control this output by adding
  codepeer_msg_reader switches (e.g. "-output-msg -csv -out report.csv",
  see "codepeer_msg_reader --help").

8.2. Advanced Command Line Options

  • --subdirs=dir

    Specify subdirectory to use for the project’s object and library directories. Default is to use –subdirs=codepeer.

  • -additional-patterns <file>

    When present, this file will be merged with MessagePatterns.xml

  • -daemon "<package.subp>"

    Specify that the named subprogram is the entry point of a daemon task (corresponding to an Ada task).

    Subprograms are identified by prefixing the name of the subprogram with the name of the package in which the subprogram is defined. Wildcard matching of the string providing the module and subprogram name (via the asterisk character) is available at the beginning or end of the specification, for example: “package.subp”.

    See Race Condition Messages for more information.

  • -dbg-vn-limit <num_vns>

    Specify maximum number of “value numbers” allowed during analysis of a single method/subprogram. See section Skipped Subprograms or Files for more information.

  • -message-patterns <file>

    Specify an alternate MessagePatterns.xml file.

  • -method-memory-size <megs>

    Specify the maximum memory size in MB to be used for analyzing a single method/subprogram. By default, CodePeer allocates a maximum of 1/3 of the amount of physical memory on the system to analyze an individual subprogram, to limit memory explosions in some corner cases. For very complex subprograms, this limit may not be sufficient and may be bumped via this switch. See section Skipped Subprograms or Files for more information.

  • -method-timeout <seconds>

    Specify the maximum time spent, in seconds, to analyze a single method/subprogram, to avoid spending too much time analyzing very complex subprograms. The default value is set to 600 (10 minutes). See section Skipped Subprograms or Files for more information.

  • -no-background

    Generate a progress bar on standard output, using control characters. The default is generate a builder-like output, suitable both for humans and for scripts.

  • -no-error-history

    This option suppresses generation of the Message History report; historical data on messages produced is still maintained in the database.

  • -no-html-output

    Suppress generation of HTML output. This will speed up processing.

  • -no-preconditions

    Do not propagate checks as implicit preconditions. This generally means that code within a subprogram that would be identified as a check will not be promoted as a precondition of the subprogram, typically resulting in messages flagged on lines corresponding to the checks rather than flagged on calls. As an exception to implicit precondition suppression, checks that inputs to subprograms (parameters and referenced globals) are initialized will still be propagated as preconditions, because flagging all uses of inputs would result in too many false-positive messages. In addition, any user-defined preconditions will remain as preconditions and be taken into account on calls.

  • -no-presumptions

    Do not generate any presumptions for unanalyzed subprogram calls. Use of this option will generally result in additional messages being reported in the vicinity of unanalyzed calls, since assumptions about the result of such calls are no longer made. This option is implied by -level 4.

  • -no-project-partitions

    Do not create partitions based on the project hierarchy. See CodePeer Limitations and Heuristics for more details.

  • -no-race-conditions

    Suppress analysis and reporting of race conditions. This may result in faster reviews. Alternatively, specifying -race-conditions will force enabling of race conditions analysis, which is disabled by default at levels 0, 1 and 2 (see -level switch).

  • -no-update-scil

    Do not regenerate SCIL files automatically on the project file.

  • -output-dir <dir>

    Specify the output directory where CodePeer will store its results. This switch overrides the Output_Directory project attribute. If dir is a relative path, it is relative to the current directory where CodePeer is run.

  • -output-lib <lib>

    Specify the intermediate output library file name generated when using a gpr file. By default CodePeer generates a library file called <project>.library, which can be overriden with this switch.

  • -output-only

    Do not reanalyze the code, perform other operations such as HTML generation by default. Useful if you want to use the -cutoff to temporarily ignore the baseline and use a different run to compare messages, or if you want to change the default baseline via the additional -set-baseline-id switch, usually combined with -no-html-output.

  • -reentrant "<package.subp>"

    Specify that the named subprogram is a reentrant entry point (corresponding to an Ada task type).

    Subprograms are identified by prefixing the name of the subprogram with the name of the package in which the subprogram is defined. Wildcard matching of the string providing the module and subprogram name (via the asterisk character) is available at the beginning or end of the specification, for example: “package.subp”.

    See Race Condition Messages for more information.

  • -scil-dir <scil-dir>

    Specify the main SCIL directory to be used by CodePeer. By default, CodePeer uses SCIL if this switch is not specified, or the first SCIL directory specified in the library file.

  • -status-codes

    By default, a successful run will terminate with a status of zero (success). Likewise by default, a run will terminate with nonzero status only if an internal error has occurred. However, CodePeer has some additional status codes that may be useful to a user (see Return Codes). This option causes these codes to be generated rather than zero to indicate the more detailed result. In particular, when at least one message is generated, a non-zero status will result if this option is specified.

  • -verbose

    Display calls to other executables (e.g. codepeer-gprbuild, codepeer_be).

  • -version

    Display version information and exit. Output is more succinct than -v.

  • codepeer_msg_reader switches

    When -output-msg or -output-msg-only is specified, it is possible to add codepeer_msg_reader switches to control the way the messages are printed. See section Text Output for a description of the different codeper_msg_reader switches.

8.3. Return Codes

When CodePeer finishes, it returns a return Code according to the following table. Note: return codes 25 and 29 are not generated unless the -status-codes command line option is given.

Return Code Description
0 Run completed. No errors found.
25 Run completed. Some messages were emitted. Note: this return code is generated only if the -status-codes command line option has been given.
27 Input Error. Run halted before finishing. Likely cause: a corrupt .library, MessagePatterns.xml, or .scil file.
29 CodePeer Internal Error. This is most likely due to exhaustion of host memory. Run continued with possible omissions. Review the standard output or log file to determine which files were skipped. Note: this return code is generated only if the -status-codes command line option has been given.
100 Run Interrupted. Likely cause: the operating system may have terminated the process, as with a “control-C interrupt” or “end process” directive. option has been given.
110 Database Error. Likely cause: Database corrupted. Please contact Technical Support.
123 or 124 CodePeer Internal Error. Run halted before finishing. Likely cause: insufficient memory. Please contact Technical Support.

8.4. CodePeer_Bridge Tool

A command line tool called codepeer_bridge is provided as part of the CodePeer installation. This tool is primarily used by GPS and GNATbench to interact with the CodePeer database, and can also be used by advanced users directly.

You can run codepeer_bridge --help to get an online help. The following switches are supported:

  • -h, --help

    Display help usage

  • --output-dir=DIR

    Specify CodePeer output directory. This switch is mandatory and corresponds to the directory <object dir>/codepeer/<project.output>.

  • --db-dir=DIR

    Specify CodePeer database directory. By default, the database specified in the output directory will be used, so this switch is optional.

  • -o, --output-file=FILE

    Specify XML output file containing the main information generated by codepeer_bridge.

  • --status-file=FILE

    Specify XML status file generated by default if no –export/–import-reviews switches are specified.

  • --export-annotations

    Export CodePeer annotations (pre/post-conditions, etc...) in addition to messages.

  • --export-reviews=FILE

    Export all manual reviews stored in the database in the specified XML file.

  • --import-reviews=FILE

    Import all manual reviews from the given XML (typically produced via –export-reviews) or CSV (typically produced via -output-msg -csv) file. The file extension must be .xml or .csv.

  • -v, --verbose

    Verbose output

See Copying Results Locally and Exporting/Importing User Reviews for two examples of manual use of codepeer_bridge.

8.5. GPS Preferences

The CodePeer plug-in for GPS adds the following preferences:

  • Import CodePeer annotations: Controls whether to import and display CodePeer annotations such as generated contracts and values of variables in the source editor. Note that importing CodePeer annotations can take a significant amount of time for large projects.

8.6. Format of MessagePatterns.xml File

MessagePatterns.xml is a file to control the output of CodePeer. Internally, CodePeer generates a number of messages, each with a message text and severity set to Low (see the section about severities below). With MessagePatterns.xml, one can change the severity of a CodePeer message or even suppress the message. You can either modify the file that is part of the CodePeer installation, or you can specify a project-specific file, see Project Attributes for that.

8.6.1. Structure of the File

The file MessagePatterns.xml is simply a list of rules. The structure is the following:

<?xml version="1.0"?>
   <!--  the list of rules here -->

Each rule consists of two parts: a pattern and a target severity. The pattern describes the type of message that this rule matches. The target severity is the new severity of the message as it will appear in the CodePeer output. In the XML format, a rule looks like this:

<Message_Rule Matching_Probability="SEVERITY">
     <!-- definition of the pattern -->

The target severity is marked SEVERITY in this rule, and we will describe below how a pattern can be defined.

The following severity levels can be used as target severities, in increasing order of severity:

  • Info

    Informational messages about e.g. subprogram not being analyzed.

  • Low

    Low ranking messages.

  • Medium

    Medium ranking messages.

  • High

    High ranking messages.

  • Suppressed

    Used to suppress a message from the reports.

8.6.2. Matching of rules

For each message, CodePeer will try to match each pattern of each rule. If no pattern matches, the severity of the message remains Low. If a single pattern matches, the target severity of the corresponding rule becomes the new severity of the message. If several rules match, the highest severity wins. Note in particular that Suppressed being the highest severity, if a matching rule has target severity Suppressed, the message is suppressed regardless of other matching rules.

8.6.3. Defining patterns

A pattern is a list of specifiers. There are three types of specifiers, Boolean specifiers, String specifiers, and computed specifiers. They are written as follows:

<Bool_Specifier Name_Of_Boolean_Attribute="NAME"></Bool_Specifier>


<Computed_Specifier Name_of_Computed_Attribute="NAME"></Computed_Specifier>

For each type of specifier the NAME can be chosen among a predefined set. In addition, each specifier can be negated using the attribute Complement="YES". Boolean specifiers

For Boolean specifiers, the following attributes are allowed:

Attribute Name Description
Is_Big_Int The message concerns an integer value.
Is_Big_Rat The message concerns a real value.
Is_Side_Effect The message is related to a side effect generated after calling a subprogram.
Is_Pointer The message concerns a pointer.
Bad_Set_Includes_Null_Literal The message concerns a pointer value which should not be null.
Expected_Set_Is_Subset_Plus_Minus_1000 The message concerns a value that is in -1000 .. 1000.
Expected_Set_Is_Singleton_Set The message concerns a value that can have only one concrete value.
Valid_Bad_Set_Is_Singleton_Set The message concerns a value which can take any concrete values except one.
Valid_Bad_Set_Overlaps_Plus_Minus_1000 The message concerns a value that should not be in -1000 .. 1000.
Precondition_Set_Contains_Holes The message is about a precondition set which is not a contiguous range.
Precondition_Set_Has_Singleton_Hole_At_Zero The message is about a precondition set which includes the values around zero, but not zero itself.
Check_Is_Soft The message is soft (see Description of Annotations)
Check_Is_Loop_Bound_Check This message concerns a loop bound check
Bad_Set_Only_Invalid The message is about a possibly uninitialized memory location.
Message_Depends_On_Unknown_Call Message depends on unknown call.
Exception_Handler_Present The enclosing basic block contains an exception handler. This is relevant in the context of validity checks when CodePeer is checking whether scalar out parameters have been initialized.
Uninteresting_RHS Uninteresting RHS in the context of dead store message.
Message_About_Implicit_Call Message is about implicit (internal) call introduced by the compiler.
Message_Is_Uncertain The message reflects some uncertainty. Alias for: Message_Depends_On_Unknown_Call or Exception_Handler_Present or Uninteresting_RHS or Message_About_Implicit_Call
Message_Depends_On_Object_Merging Message depends on approximating more objects with one abstract object. This is the case when, e.g., array others or wildcard dereferences are involved.
Message_Involves_Loop_Imprecision The message depends on a variable modified in a loop for which CodePeer was not able to compute a precise range.
Message_Likely_False_Positive Approximations were involved when computing information relevant for the message meaning that the message is more likely false positive. Is implied by: Message_Is_Uncertain, Message_Depends_On_Object_Merging, and Message_Involves_Loop_Imprecision
Is_GNAT_Warning The message is a warning from GNAT. When set, disable warnings on unknown Check_Kind String specifier.
Is_GNATcheck_Message The message is from GNATcheck. When set, disable warnings on unknown Check_Kind String specifier. String specifiers

For String specifiers, the chosen attribute name will influence the meaning of the regular expression that is given with the second attribute. The following table gives an overview over the accepted attribute names.

Attribute Name Description
Text Regexp applies to text of the message
Subp Regexp applies to subprogram name
File Regexp applies to file name
Directory Regexp applies to directory name
Likelihood Regexp applies to likelihood of the failure, allowed values are: CHECK_WILL_FAIL, CHECK_MIGHT_FAIL, CHECK_CANNOT_FAIL
Check_Kind Regexp applies to the kind of check that is described by the message. See below for a list of possible checks. Note that you can use a regular expression to match several kinds of checks at once. The regexp is not case sensitive for the attribute Check_Kind.
Original_Check_Kind This is only valid for Precondition_Annotations and Precondition_Checks. Regexp applies to the kind of checks that caused the corresponding precondition.
The regexp above are those used in file names by the Unix shell or DOS prompt:
  • * matches any string of 0 or more characters
  • ? matches any character
  • [char char ...] matches any character listed
  • [char - char] matches any character in given range
  • {elmt, elmt, ...} is an alternation (any of elmt)

Check kind ids can either match messages from the backend, or messages from an external tool (like GNAT warnings). The list of backend check kind ids in MessagePatterns.xml corresponds mostly to CodePeer messages as described in Description of Messages:

Check Kind Description
NON_ANALYZED_CALL_WARNING call not analyzed (due to e.g. tool limitation or code too complex)
UNREAD_IN_OUT_PARAMETER_WARNING suspicious in out: could have out mode
UNASSIGNED_IN_OUT_PARAMETER_WARNING suspicious in out: could have in mode
UNKNOWN_CALL_WARNING unknown call (due to e.g. partitioning)
DEAD_STORE_WARNING unused assignment
DEAD_OUTPARAM_STORE_WARNING unused out parameter
POTENTIALLY_DEAD_STORE_WARNING unused assignment to global
SAME_VALUE_DEAD_STORE_WARNING useless self assignment
INFINITE_LOOP_WARNING loop does not complete normally
PLAIN_DEAD_EDGE_WARNING test predetermined
TRUE_DEAD_EDGE_WARNING test always true
FALSE_DEAD_EDGE_WARNING test always false
TRUE_CONDITION_DEAD_EDGE_WARNING condition predetermined to true
FALSE_CONDITION_DEAD_EDGE_WARNING condition predetermined to false
DEAD_BLOCK_CONTINUATION_WARNING continuation of dead code
ANALYZED_MODULE_WARNING module included in analysis
NON_ANALYZED_MODULE_WARNING module not included, due to CodePeer limitations
NON_ANALYZED_PROCEDURE_WARNING subprogram not included, due to CodePeer limitations
PROCEDURE_DOES_NOT_RETURN_ERROR subprogram never returns
CHECK_FAILS_ON_EVERY_CALL_ERROR subprogram always fails
MISMATCHED_LOCKED_UPDATE_ERROR mismatched protected access
DIVIDE_BY_ZERO_CHECK divide by zero
ARRAY_INDEXING_CHECK array index check
USER_ASSIGN_STM_CHECK overflow check
PRE_ASSIGN_STM_CHECK overflow check (before a call)
POST_ASSIGN_STM_CHECK overflow check (after a call)
TAG_CHECK tag check
TYPE_VARIANT_CHECK discriminant check
ALIASING_CHECK parameter aliasing
RAISE_CHECK raise exception
INVALID_CHECK validity check
MODULE_ANNOTATION package/module annotation
PROCEDURE_ANNOTATION subprogram annotation
INPUT_ANNOTATION subprogram input annotation
OUTPUT_ANNOTATION subprogram output annotation
NEW_OBJ_ANNOTATION new obj (pointer allocation) annotation
PRECONDITION_ANNOTATION precondition annotation
PRESUMPTION_ANNOTATION presumption annotation
POSTCONDITION_ANNOTATION postcondition annotation
LOCALLY_UNUSED_STORE_ANNOTATION potentially unused assignment annotation
UNKNOWN_CALL_ANNOTATION unanalyzed call annotation

The check kind id of a message issued by an external checker directly comes from the output of the corresponding tool.

  • For a GNAT warning the check kind id is the switch associated with the message, as reported in parenthesis by codepeer. For example, the following message has -gnatwv as check kind id:

    main.adb:8:4: medium warning: unassigned variable (-gnatwv): variable "X" is never read and never assigned

    We use the check kind id -gnatwn for default GNAT warnings. While this is not strictly correct with respect to the semantic of the -gnatwn switch, this is a good enough approximation for codepeer usage.

  • For a GNATcheck message, the check kind id is written in the message issued by CodePeer just before (GNATcheck). For example, the following message has Anonymous_Arrays as check kind id:

    main.adb:9:26: low warning: Anonymous_Arrays (GNATcheck): anonymous array type

Note that CodePeer issues a warning when a check kind id (or regular expression) does not match any backend check. Adding the Boolean specifier Is_GNAT_Warning or Is_GNATcheck_Message disables these warnings. Computed specifiers

For computed specifiers, the allowed set of attribute names is the following:

Attribute Name Description
Is_Check The message concerns a check. (see Description of Messages)
Is_Assign_Stm_Check The message concerns a check for an assignment.
Is_Assign_Stm_Or_Precond_Check The message concerns a check for an assignment or a precondition.
Source_Is_Ada The source for that message is Ada (always true currently).
Message_Is_New The message is new wrt. the baseline run.
Message_Is_Dropped The message is in the baseline run, but not this run.
Message_Is_Unchanged The message is unchanged wrt. the baseline run.
Max_Messages Option -messages was given with argument max.
Normal_Messages Option -messages was given with argument normal.
Min_Messages Option -messages was given with argument min.

8.6.4. Example

The following simple example shows a rule which will suppress all messages from the validity check category:

  <Message_Rule Matching_Probability="SUPPRESSED">

The following more complex example shows a rule which will flag as HIGH all check messages which are not soft, will always fail, and which are not user raise statements:

 <Message_Rule Matching_Probability="HIGH">
     <Computed_Specifier Name_Of_Computed_Attribute="Is_Check">





8.7. How does CodePeer Work?

At its heart, CodePeer’s technology is most like a whole-program “optimizer”: it uses extended versions of algorithms familiar to those writing compiler optimizers. First, a conventional compiler front end is used to convert the program into a treelike intermediate language, called “SCIL” (Statically Checkable Intermediate Language). Then, the CodePeer “engine” takes over and begins with a global aliasing analysis followed by translation of each subprogram into a static single assignment (SSA) representation and then a global value numbering over the subprogram. At this point, most compiler optimizers would begin the process of generating code. CodePeer instead begins the process of determining whether the program might violate any run-time checks or user assertions and whether the algorithms have any “race conditions” or security “holes.”

CodePeer analyzes each subprogram separately, by taking into account only the information computed for callees of the subprogram. In particular, the analysis of a subprogram does not depend on its callers. To perform this analysis, CodePeer, using the global value numbering, determines the possible value set for every variable and expression appearing in the program at every point where it appears. It uses this information to determine where checks or assertions might fail and also uses it to determine the “preconditions” and “postconditions” of every subprogram. The preconditions represent what must be true about the inputs to the subprogram for the subprogram’s algorithm to operate without overflowing or failing a run-time check. The postconditions represent what is true about the outputs after the subprogram completes, presuming that the inputs satisfied the preconditions.

Once the pre- and postconditions for a subprogram have been determined, they can be propagated to every point where the subprogram is called, iterating as necessary in the presence of recursion, until the whole program has been analyzed. Once this process stabilizes, a final pass is performed to report any potential run-time or assertion failures, race conditions, or security holes that exist anywhere within the program. During that pass, CodePeer may generate error messages whenever a caller could violate the precondition generated for a callee to guard against an error in the callee. As a result, an error may not be reported by CodePeer at the point where there is a fault in the program, but rather at the point where this fault results in an inconsistency. Reviewing the preconditions at this point allows identifying whether the caller or the callee is the culprit by matching the preconditions generated by CodePeer and the expectations of the programmer. Reviewing the preconditions generated by CodePeer is another way of detecting errors before they occur in the program.

The CodePeer technology is fundamentally a “bottom up” approach, which provides excellent scalability. The complexity of the analysis is not dependent on the number of paths in the call graph but rather merely on the number of subprograms in the graph, with a factor based on the amount of recursion. The technology is also very precise, in that the pre- and postconditions for every subprogram include information about all inputs and outputs, including all global variables and objects reachable through chains of pointers. This precision helps to reduce the false-positive rate while at the same time avoiding false negatives.

The book Static Analysis of Software: The Abstract Interpretation, published by Wiley (2012), contains a chapter on the mathematics underlying the inner working of CodePeer.

8.8. What’s the Difference between CodePeer and SPARK?

Static analyzers fall into two broad categories: bug finders and verifiers. Bug finders detect violations of properties. Verifiers guarantee the absence of violations of properties. Because they target opposite goals, bug finders and verifiers usually have different architectures, are based on different technologies, and require different methodologies.

Typically, bug finders require little upfront work, but may generate many false alarms which need to be manually triaged and addressed, while verifiers require some upfront work, but generate fewer false alarms.

CodePeer belongs to the bug finder category, while SPARK belongs to the verifier category.

In other words, if you want to assess the quality of your code and detect potential runtime or security errors for manual review in Ada code then CodePeer is the right tool.

If you want instead to guarantee absence of runtime or security errors automatically and demonstrate useful properties of your application then SPARK is the right tool.

8.9. Directories and Files Produced by CodePeer

The first phase (SCIL generation) creates subdirectories called codepeer under each object directory of your project hierarchy. Under each codepeer directory, .ali files and a SCIL subdirectory will be created to store .scil files and handle their regeneration.

The second phase (CodePeer analysis) will generate low-level xml files under the codepeer directory, as well as two subdirectories <project>.db and <project>.output. These directories can be changed via project file attributes, see Project Attributes for more details. Under the <project>.output directory, CodePeer will store output in various formats (text listings, html, bts for backtrace, and tooltip information used in GPS), as well as a low-level log file Inspection.log.

All these files and directories are temporary files, except for the <project>.db directory, which needs to be kept in order to benefit from the history of results across runs. But note that when managing results in a configuration management system, there are additional files that may need to be retained (see Saving CodePeer Results in Configuration Management).

8.10. CodePeer Messages and Compiler Warnings

A question asked by some users is: Why not just use the compiler? The Ada compiler generates warnings, so what added value does CodePeer bring?.

It is true that Ada compilers (and in particular GNAT) generate warnings, and these warnings are useful, but they are definitely not comparable to what CodePeer can achieve for several fundamental reasons:

  • Most messages generated by CodePeer are not considered at all by Ada compilers beyond really obvious cases (e.g. all potential cases of runtime errors are not flagged by Ada compilers, detection of race conditions, etc...)
  • Compilers in general will do no or little data and control flow tracing to emit warnings. CodePeer performs full data and control flow tracing as explained in How does CodePeer Work?.
  • Compiler warnings operate only on a single subprogram (or in the best cases, single unit), while CodePeer does interunit analysis and propagates information from one unit to the other.
  • Runtime related messages in CodePeer are designed to be sound (messages reported with high rank are certain) and complete (all possible errors are reported), so CodePeer will not miss any possible cases, unlike compiler warnings which are typically based on heuristics. Note that CodePeer has a category of messages identified as “warnings,” which are also designed to be sound (when they do not use the term “might”), but are necessarily incomplete, in the sense that CodePeer cannot, for example, identify all cases of dead (unreachable) code, due to approximations inherent in analyzing loops, arrays, recursion, etc.
  • CodePeer will keep much more precise track of values of e.g. individual fields or array elements and generate more precise messages as a result.

CodePeer also comes with many additional capabilities not available in compilers such as a historical database, the ability to generate contracts for subprograms, etc...

Finally, when using CodePeer you can decide to take advantage of both GNAT warnings and CodePeer messages via the --gnat-warnings switch, see GNAT Warnings Integration for more details.

8.11. Partitioning of Analysis

When analysis a large set of sources and depending on how much memory is available, CodePeer may decide to split the analysis into multiple partitions, consisting of a subset of the sources.

The partitioning is done based on the following criteria:

  1. When using -level 4 or -global, the analysis is always done globally, with a single partition.
  2. When using -level 2 or -level 3, one partition is created for each project file (.gpr), unless -no-project-partitions is used. In other words, if you use a single project file with no dependencies or -level 0/1/4 or -no-project-partitions, then this step is skipped.
  3. Then, based on the size of the sources and the memory available, these partitions may be further split, in alphabetical order. When using -level 3, partitions are only split during this step if there is not enough memory available. When using -level 2 or below, the size of each partition is forced to be small and to include only about a dozen source files (depending on the size of these files). This can be further tuned via the -dbg-partition-limit switch (-level 2 uses -dbg-partition-limit 1000).

Each partition is saved in a file called part-xxx-of-project.library where xxx is the partition number (e.g. 001, 002, etc...).

If one partition is created for each project file (see conditions in case 2 above), these files are regenerated after each run.

In other cases (e.g. -level 4 or -no-project-partitions or use of a single project file) the previous partition files - if found - are reused from one run to another, except that files no longer present in a new run are removed, and new files are appended at the end of the last partition, or in a new partition if the last partition was already full.

Note that the main way for a user to influence the partitioning is to use project files (.gpr) to split the analysis into the desired subset of files.

Finally, it is more likely that a global analysis might run out of memory during the processing of some source file. In that case, the source file is skipped and analysis proceeds as though that file was not present in the library, unless memory is really stressed too far, in which case the entire analysis may fail.

8.12. CodePeer Limitations and Heuristics

The most accurate static error detection is possible when the entire program representation is available to CodePeer. There are two reasons why the entire program might not be available: a portion of the program might simply not be available: perhaps it hasn’t been written yet or perhaps it’s from a restricted library; the other reason is that CodePeer might have to partition the program representation into manageable sized partitions.

CodePeer performs static error detection by holding in main memory a representation of the entire subsystem being analyzed. It is possible that the subsystem being analyzed is so large that its representation cannot be contained in the memory of the host machine on which CodePeer is running. CodePeer automatically divides the subsystem being analyzed into partitions that are more likely to be analyzable within the memory constraints of the host machine. See Partitioning of Analysis for more details on how CodePeer decides to partition an analysis.

8.12.1. Evaluation Order

CodePeer assumes that arguments of calls are evaluated in left-to-right order. In practice, compilers are allowed to choose any evaluation order for arguments of calls. GNAT in particular may evaluate call arguments in different orders depending on the target platform. If the order of evaluation matters for the side effects of the call, then CodePeer analysis may not be sound with respect to the behavior observed when executing on the target.

8.12.2. CodePeer Presumptions

Sometimes the entire program is not available to CodePeer, either because the program source is simply not available, or because the program was so large it needed to be partitioned. In either of these cases, important information about a subprogram invocation might not be available to CodePeer at the point of subprogram invocation. CodePeer will not be able to calculate the preconditions of the subprogram being called and it will not be able to precisely characterize the return value nor the effects of calling that subprogram. Invocations of subprograms that haven’t been processed by CodePeer are referred to as unanalyzed subprogram calls.

CodePeer makes presumptions about the return values and other side effects of unanalyzed subprogram calls. The presumptions of each unanalyzed subprogram call are displayed in the subprogram information block of the calling subprogram. The source line number of the unanalyzed subprogram call is shown after an @ sign, and information about its presumed return value or other side effect is displayed.

The actual unanalyzed call might be directly on the line specified in the presumption (by the @nnn) or it might be a call that occurs inside the subprogram called on the given line. You can tell if it’s an indirect call or a direct call by the name given in the presumption. If it doesn’t correspond to the name of the subprogram directly called on the given line, then it must be something called indirectly.

Sometimes the presumptions made by CodePeer are overly “optimistic”, such as a given call will never return a null, when in fact it can return a null. For example, suppose that under some circumstances, an unanalyzed subprogram call can return a null, but that the calling subprogram never checks for this possibility, and instead dereferences the return value without checking. This could cause an exception when the program is run, and implies that the program being analyzed contains a latent defect. If CodePeer makes the presumption that the called subprogram does not return null, then it would not identify this latent defect. For this reason, the programmer should check the presumptions about unanalyzed subprogram calls and verify that they are appropriate.

Alternatively, the presumptions made by CodePeer might be “insufficient”. For example CodePeer might presume that some unanalyzed subprogram call could return null, when in fact it never does. Later, in some use of the returned value, far away from the unanalyzed call, CodePeer might complain that a possibly-null value is being dereferenced. The original presumption was insufficient in this case, because it didn’t match the actual behavior of the unanalyzed call, and thereby led to the complaint. The programmer can eliminate such a complaint by adding code immediately after the point of the call to assert that the returned value is not null. This will cause CodePeer to tighten up its presumption about the unanalyzed call to satisfy the assertion, and the complaints at other points in the code will go away. For example:

Result := Call_To_Unknown_Function (...);
pragma Assert (Result /= null);

CodePeer currently has a limitation on unanalyzed calls: it presumes that unanalyzed calls only update fields that are uninitialized, 0, -1, or null prior to the call, and referenced after the call. However, if a field is explicitly initialized, to one for example, then the unanalyzed call is presumed to have no effect on that field, even if a reference after the call expects the field to have been updated (for example, to some other value). The net result is that a high ranking message may be generated at the point of reference after the unanalyzed call. One work-around is to include the package whose subprogram is reached by the unanalyzed call within the CodePeer library, even if it is normally not being analyzed.

If you do not like CodePeer’s default handling of presumptions, you can disable this behavior via the -no-presumptions switch, which is enabled also at -level 4.

8.12.3. Handling of Generic Units

CodePeer does not analyze generic units per se, since it doesn’t have sufficient context in order to perform a precise enough analysis of the generic code (since it is dependent on its generic parameters which can change significantly the code generated).

Instead, CodePeer analyzes each instantiation of generic units. This means in particular that if you do not instantiate a generic unit, it won’t be analyzed. In addition if there are multiple instantiations of a single generic unit, CodePeer will analyze each instantiation separately.

Any messages generated during the analysis of one instantiation might very likely be similar or identical to those generated for other instantiations. The net effect is that there may be multiple similar messages associated with a single line in the source listing for a generic unit.

In addition, when using e.g. the –no-subprojects switch (analyze only sources from the root project), since instantiations are analyzed in the context of the client code (where the instantiation is located), you will get an analysis and potentially messages related to source files located outside the root project.

8.12.4. Loop Unrolling

In some cases, CodePeer may decide that it is worthwhile to unroll a loop when generating the SCIL representation of the loop. For example, given Ada source code

for I in 1 .. 3 loop
   A (I) := I * 100;
end loop;

CodePeer might (or might not) choose to represent this in the generated SCIL as something more like

A (1) := 100;
A (2) := 200;
A (3) := 300;

In most cases, this sort of thing is an internal implementation decision which the user need not be aware of. However, CodePeer might then go on to emit messages which pertain to one specific iteration of the loop (or, as is discussed later, to iterations other than the first iteration). In such cases, CodePeer will include additional text in each message in order to unambiguously identify the subject of the message. This additional text may take either of two forms. In the case of a for loop (or of an implicitly looping construct such as a quantified expression) with static bounds for which CodePeer decides full unrolling is desirable, text such as “(iteration 3 of 5)” would be included. In the case of a loop for which only the first iteration (or perhaps only some part of the first iteration, such as the initial test condition evaluation for a while loop) is unrolled, text such as “(initial iteration)” or “(subsequent iterations)” would be included. In either case, such additional text refers to the nearest enclosing loop (or implicitly looping construct).

For example, given the subprogram

function Foo return Integer is
   type Vec is array (1 .. 3) of Integer;
   Nums : Vec := (1, 2, 3);
   Dens : Vec := (-1, 0, 1);
   Result : Integer := 0;
   for I in Vec'Range loop
      Result := Result + (Nums (I) / Dens (I));
   end loop;
   return Result;
end Foo;

CodePeer generates the message:

foo.adb:8:38: high: divide by zero fails here: requires Dens(I) /= 0 (iteration 2 of 3)

In order to decide whether or not to unroll a loop, CodePeer uses some heuristics and in particular limits the number of generated statements after unrolling. This limit is 128 statements by default, and can be changed by setting the environment variable CODEPEER_UNROLL_LIMIT to a lower or greater value. Setting this environment variable to 0 will in effect disable loop unrolling. Note that this setting is taken into account when generating SCIL files, so you need to force generation of the SCIL file after enabling it, by e.g. using the CodePeer menu CodePeer>Advanced>Remove SCIL

8.12.5. Calls to Unknown Subprograms

If CodePeer encounters calls to subprograms whose body is not available (e.g. imported from another language, or a runtime call not known to CodePeer), it will report such calls as part of the informational messages.

Calls to these unknown subprograms need to be reviewed manually to ensure that the rest of the analysis is reliable.

8.12.6. Skipped Subprograms or Files

The static error detection analysis performed by CodePeer is space- and time-intensive, because CodePeer tracks the value of all program expressions across all possible execution paths. The present implementation of CodePeer holds all this information in main memory on the host machine.

Certain Ada programs are too large or too complex to analyze within the memory constraints of the host machine. When CodePeer detects this circumstance, it will skip analyzing the subprograms or files which are too complex. No annotations or error messages will be generated for these subprograms or files, and calls to subprograms which were skipped will be analyzed as though the subprogram body were not available. Detailed information about which files were skipped is available in the info messages and may also be found in the low level Inspection.log file.

This kind of limitation is inherent to static analysis, and as for calls to unknown subprograms, calls to these skipped subprograms need to be reviewed manually instead.

As an example, certain syntax parsers can be too complex to analyze within the memory constraints of the host machine. The parsing routine will typically contain a loop and deeply nested switch or if-then-else statements with complex conditional expressions. These can introduce a very large number of paths and distinct computations into the program graph, which may bump into the host machine memory limit.

If you want CodePeer to analyze these subprograms, you can try one of the following solutions:

  • increase the analysis timeout: by default (at level 3), the analysis time for each subprogram is limited to 10 minutes. You can increase this value by using the -method-timeout switch.
  • increase the amount of memory available on your machine: in case of a memory error reported by CodePeer, runing on a machine with more main memory should help analyze complex subprograms.
  • increase the maximum number of values computed: in case CodePeer reports a too many value numbers in Inspection.log, then you might want to use the debug switch -dbg-vn-limit with a value of 60000 or more, e.g: -dbg-vn-limit 60000. If this is not sufficient, you will need to simplify your subprogram, see below.
  • simplify your subprogram: when CodePeer reports an inability to analyze a subprogram, it often means that the subprogram is too complex to be analyzed, and should be split into multiple, simpler subprograms which CodePeer can analyze separately.

Note that increasing one of the above limits may well run into another limit due to the inherent complexity of these subprograms (e.g. going from a “too many value numbers* limit to a timeout limit, to an out of memory limit.

In general our recommendation is to either analyze such complex subprograms manually and let CodePeer skip them, or alternatively refactor your code as mentioned above by e.g. splitting it into multiple simpler subprograms.

Since CodePeer’s analysis is modular, the rest of the code will still be analyzed and produce useful and relevant results.

8.12.7. Dispatching Calls and Access to Subprograms

When CodePeer encounters a dispatching call or a call via an access to subprogram variable, CodePeer attempts to identify all possible such target calls by considering all the compatible primitive operations (or subprograms whose address was taken and whose profile is compatible in the case of an access to subprogram object). By default, the effect of these calls is taken into account in the caller, but the preconditions for these multiple targets (which may correspond to subprograms that cannot be called at this point) are ignored, to avoid an undue number of false positives. The net effect is that CodePeer may not report certain possible precondition failures for multi-target calls.

When -messages max is used (which is also implied by -level max), then CodePeer checks that the preconditions of all targets are satisfied at the point of call.

8.12.8. Null Array Bounds

In most cases, Ada requires that the high and low bounds of an index constraint for an array must belong to the corresponding index subtype of the array type. For example, because Standard.String is declared as:

type String is array (Positive range <>) of Character;

both bounds of an object or value of type String will usually be positive. There is an exception to this rule in the case of an array which has no elements because the low bound is greater than the high bound. In this case, values outside of the index subtype are permitted. It is not at all unusual to see high bounds outside of the index subtype, as in:

X : String(1 .. 0);

and CodePeer handles this case. It is much more unusual to see low bounds outside of the index subtype, as in:

Unlikely  : String (-10 .. -20);

It happens that the possibility of an out-of-index-subtype lower bound significantly impairs CodePeer’s analysis in some cases that do commonly come up in practice. In order to pragmatically deal with this situation, CodePeer will, in some cases, treat an out-of-index-subtype lower bound as an error. An array declaration such as the “Unlikely” example above will elaborate without raising any exception, as specified by the Ada language definition. CodePeer, however, may treat it as an error. More importantly, in cases where the bounds of array are not known (e.g., a subprogram with a parameter of an unconstrained array subtype such as String), CodePeer may assume for purposes of its analysis that the low bound belongs to the index subtype. Given the following example:

procedure P (X : String) is
   First_Is_Positive : Boolean := X'First > 0;
   Last_Is_Positive  : Boolean := X'Last > 0;

CodePeer will assume that the local variable First_Is_Positive is initialized to True, whereas it knows that Last_Is_Positive might be False if X’Length = 0.

8.12.9. Time Limitations

In certain cases, CodePeer will not be able to analyze a subprogram within the time boundaries allocated. This can be detected by looking in the log file Inspection.log, where the following messages may appear:

<file>:<line>:<col>:<subp>: tool limitation: subprogram timeout: ...

The above shows that a timeout was hit during the analysis of a given subprogram, e.g. that the analysis of this subprogram didn’t complete in time. The presence of such messages means that the analysis is not complete.

This may be problematic when performing impact analysis and in the (rare) cases where this timeout happens randomly. This may be due to various factors, for example the machine load. In order to fix this problem, various approaches can be attempted:

  • make sure that no other process is running on the computer when CodePeer is running.
  • try to replace the hardware by a more powerful system.
  • increase the time limit for subprogram analysis, through the -method-timeout <time in seconds> switch (600 by default at level 3).
  • lower the size of the partition through the -dbg-partition-limit switch.
  • refactor the problematic subprogram and break it into smaller procedures should also improve results.
  • add a pragma Annotate (CodePeer, Skip_Analysis); at the beginning of the subprogram body, to ensure that CodePeer will systematically skip analyzing this subprogram, ensure a consistent behavior across runs. See Partial Analysis for more details.

Note that, a subprogram that systematically issues a timeout error can be ignored as long as you do not need to perform a complete analysis. In other words, only subprograms which are sometimes analyzed, and sometimes not due to a timeout can cause confusing results.