Version 7.0
Copyright © 2021 Lowell D. Thomas
APG
… an ABNF Parser Generator
The Parser Generator API

The Application Programming Interface (API) follows the same object model used throughout APG. Once an object has been created, the parser generation takes place in six distinct phases with an optional, seventh phase for PPPT generation. Each phase checks to verify that all required previous phases have been completed before continuing. If any fatal errors are discovered in any phase, an exception is thrown back to the application's catch block. Access to the list of messages is available through it's message log with vpApiGetErrorLog().

Note that the generator uses the AST library feature and any application using this API will need to be compiled with the macro APG_AST defined.

Input

Generation begins with an SABNF grammar. This may be obtained from a file, cpApiInFile(), or a string, cpApiInString(). These functions may be called multiple times, interchangeably. After the first call, each successive call concatenates its input to that of the previous calls.

Character Set Validation

The first step is to validate the input grammar characters, vApiInValidate(). SABNF is fully described by the printing ASCII character set, plus tab, line feed and carriage return. That is, the character set [9, 10, 13, 32-126]. This step simply verifies that there are no input characters outside this set.

Note: If the last line of the grammar has no line ending, CRLF, CR or LF, this will generate a character set validation error. Technically, this is a syntax error, but it is quickly and most easily discovered here.

Syntax Validation

The syntax phase parses the input SABNF grammar, verifies that the syntax is correct and generates an AST for translation. A syntax error might be, for example, a malformed rule name or an invalid equal sign. e.g.

1bad_rule_name =: elements CRLF

Semantic Validation and Translation

The semantic phase checks for any semantic errors and translates the grammar syntax into the rule, UDT and opcode information required by an APG parser. Some examples of rules that are syntactically correct but have semantic errors would be

rule1 = %d57-48  CRLF
rule2 = 3*2"ABC" CRLF
rule3 = 0*0"ABC" CRLF
rule4 = 0"ABC"   CRLF

rule1 has an inverted range definition. The range must be from lowest to highest.
rule2 has an inverted minimum and maximum number of repetitions. minimum <= maximum is required.
rule3 and rule4 are specifically disallowed by APG. In theory they could be interpreted as an empty string. However, they have the wasteful feature of requiring a repetition operand that will never be used. APG allows only the literal string, "", as an explicit designation of an empty string

Attributes Validation

It is well known that recursive-descent parsers will fail if a rule is left recursive. Besides left recursion, there are a couple of other fatal attributes that need to be disclosed as well. There are several non-fatal attributes that are of interest also. This module will determine six different attributes listed here with simple examples.

fatal attributes

; left recursion    CRLF
S = S "x" / "y"     CRLF
; cyclic            CRLF
S = S               CRLF
; infinite          CRLF
S = "y" S           CRLF

non-fatal attributes (but nice to know)

; nested recursion  CRLF
S = "a" S "b" / "y" CRLF
; right recursion   CRLF
S = "x" S / "y"     CRLF
; empty string      CRLF
S = "x" S / ""      CRLF

Note that these are “aggregate” attributes, in that if the attribute is true it only means that it can be true, not that it will always be true for every input string. In the simple examples above the attributes may be obvious and definite – always true or false. However, for a large grammar with possibly hundreds of rules and parse tree branches, it can be obscure which branches lead to which attributes. Furthermore, different input strings will lead the parser down different branches. It is for this reason that a given attribute may reveal itself or not.

The attribute phase will determine each of these six attributes for each rule in the grammar. Additionally, it will identify rule dependencies and mutually-recursive groups. For example,

S = "a" A "b" / "y" CRLF
A = "x"             CRLF

S is dependent on A but A is not dependent on S.

S = "a" A "b" / "c" CRLF
A = "x" S "y" / "z" CRLF

S and A are dependent on one another and are mutually recursive.

A more detailed discussion of attributes and how they are computed can be found here.

Attributes Optional

The attribute phase can be bypassed if necessary. While there are at this time no known bugs in the APG version 7.0 attributes algorithm some, but not all, previous versions did have some problems with large grammars (400+ rules). Therefore, if you think parser generation is being prevented with incorrect attribute evaluations, the option to bypass is here.

Bypass attributes at the risk of building a faulty parser.

PPPT Generation (optional)

The parser generator has the ability to generate Partially-Predictive Parsing Table (PPPT) maps which can greatly reduce the work of a parser. Be sure to understand the limits on PPPTs. Some exploratory work may be necessary to determine if they are right for any given grammar. If PPPT maps are not needed, it is a good idea to also define the macro APG_NO_PPPT when compiling the application. It will reduce the amount of code needed and used by the parser.

Composite Generation

Without giving up too much specificity, it is possible to process all of the above phases with a single function. vApiFile() for a grammar file or vApiString() for a grammar string. Using these functions will impose a couple of restrictions:

  • The entire grammar must be in a single file or string.
  • It is not possible to skip the attributes phase.

But within these restrictions, these functions can save some application coding.

Output

Once the grammar has been processed, whether explicitly or with one of the composite functions, there are then two methods for generating a parser object, vApiOutput() and vpApiOutputParser().

Grammar Files
vApiOutput() will generate two grammar files. A base name is provided, say mygrammar, and two files defining the rules, UDTs, opcodes and all necessary supporting data will be generated – mygrammar.h and mygrammar.c. To create a parser object from these file, mygrammar.h must be included with the application and mygrammar.c must be compiled with it. A parser can then be constructed using the data pointer vpMygrammarInit supplied in mygrammar.h.

Note that the base name, mygrammar, is used as a namespace, both with the initialization pointer and with macros provided in mygrammar.h for easy access to the rule identifiers or indexes. This allows multiple parsers to exist in a single application.

Parser Object
vpApiOutputParser() will return a pointer to a parser object directly. The parser object has its own memory space and is completely independent of the API object that created it. The API object may even be destroyed after parser construction without any effect on the generated parser object.

However, they do share a the single exception structure that the application used for the API object creation. Fatal errors in the parser will throw exceptions back to the application's catch block .

APG Version 7.0 is licensed under the 2-Clause BSD License,
an Open Source Initiative Approved License.