Version 7.0
Copyright © 2021 Lowell D. Thomas
APG
… an ABNF Parser Generator
Introduction

Disclaimer: I am not a Computer Scientist and what follows here and throughout is not a study in Formal Language Theory. It simply follows the definition of ABNF to a natural means of parsing the phrases it defines.

APG – an ABNF Parser Generator – was originally designed to generate recursive-descent parsers directly from the ABNF grammar defining the sentences or phrases to be parsed. The approach is to recognize that ABNF defines a tree with seven types of nodes and that each node represents an operation that can guide a depth-first traversal of the tree – that is, a recursive-descent parse of the tree.
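
As a rough illustration of the idea above, the following minimal sketch treats a grammar as a tree of nodes and parses by a depth-first traversal in which each node kind carries its own matching operation. It is not APG's code: the node kinds (only four are shown), the struct layout and the functions `match` and `demo_greeting` are hypothetical names chosen for this example, and literals are compared case-sensitively for brevity even though ABNF literal strings are case-insensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node kinds for this sketch (a subset, not APG's own set). */
typedef enum {
    NODE_ALTERNATION,   /* try children in order, keep the first match */
    NODE_CONCATENATION, /* match all children one after another */
    NODE_REPETITION,    /* match the single child min..max times */
    NODE_LITERAL        /* match a fixed string */
} node_kind;

typedef struct node {
    node_kind kind;
    const char *literal;                /* NODE_LITERAL only */
    const struct node *const *children; /* child nodes, if any */
    size_t child_count;
    size_t min, max;                    /* NODE_REPETITION only */
} node;

#define NO_MATCH ((size_t)-1)

/* Depth-first traversal: returns the length of the phrase matched at
 * input[offset], or NO_MATCH. Requires offset <= len. */
static size_t match(const node *n, const char *input, size_t len, size_t offset) {
    assert(n != NULL && offset <= len);
    switch (n->kind) {
    case NODE_LITERAL: {
        size_t k = strlen(n->literal);
        if (k <= len - offset && memcmp(input + offset, n->literal, k) == 0)
            return k;
        return NO_MATCH;
    }
    case NODE_CONCATENATION: {
        size_t total = 0;
        for (size_t i = 0; i < n->child_count; i++) {
            size_t m = match(n->children[i], input, len, offset + total);
            if (m == NO_MATCH) return NO_MATCH;
            total += m;
        }
        return total;
    }
    case NODE_ALTERNATION:
        for (size_t i = 0; i < n->child_count; i++) {
            size_t m = match(n->children[i], input, len, offset);
            if (m != NO_MATCH) return m;
        }
        return NO_MATCH;
    case NODE_REPETITION: {
        size_t total = 0, count = 0;
        while (count < n->max) {
            size_t m = match(n->children[0], input, len, offset + total);
            if (m == NO_MATCH) break;
            count++;
            if (m == 0) { /* an empty match would repeat forever */
                if (count < n->min) count = n->min;
                break;
            }
            total += m;
        }
        return count >= n->min ? total : NO_MATCH;
    }
    }
    return NO_MATCH;
}

/* Tree for the toy grammar: greeting = ("hello" / "hi") *" " "world" */
static const node hello = { NODE_LITERAL, "hello", NULL, 0, 0, 0 };
static const node hi = { NODE_LITERAL, "hi", NULL, 0, 0, 0 };
static const node space = { NODE_LITERAL, " ", NULL, 0, 0, 0 };
static const node world = { NODE_LITERAL, "world", NULL, 0, 0, 0 };
static const node *const salute_kids[] = { &hello, &hi };
static const node salute = { NODE_ALTERNATION, NULL, salute_kids, 2, 0, 0 };
static const node *const spaces_kids[] = { &space };
static const node spaces = { NODE_REPETITION, NULL, spaces_kids, 1, 0, SIZE_MAX };
static const node *const greeting_kids[] = { &salute, &spaces, &world };
static const node greeting = { NODE_CONCATENATION, NULL, greeting_kids, 3, 0, 0 };

/* Returns the length of the greeting matched at the start of input. */
size_t demo_greeting(const char *input) {
    return match(&greeting, input, strlen(input), 0);
}
```

Calling `demo_greeting("hello world")` walks the tree from the concatenation node down to the literals and back, which is the sense in which the traversal is a recursive-descent parse.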

However, APG has since evolved beyond parsing the strictly Context-Free languages described by ABNF in a number of significant ways.

The first is through disambiguation. A sentence of a Context-Free language may have more than one parse tree that correctly parses it; that is, there may be different ways of phrasing the same sentence. APG disambiguates, always selecting a single, well-defined parse tree from the multitude.
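
To make the ambiguity concrete, consider the toy grammar `phrase = ("a" / "ab") ("bc" / "c")`. The sentence "abc" has two parse trees: "a" followed by "bc", and "ab" followed by "c". The sketch below resolves this with one common rule, assumed here purely for illustration: alternatives are tried left to right and the first one that matches is kept. The text above does not spell out the rule APG applies, and the functions `first_match` and `parse_phrase` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Try the alternatives left to right. On the first one that matches at
 * *offset, advance *offset past it and return its index; else return -1. */
static int first_match(const char *const *alts, size_t count,
                       const char *input, size_t *offset) {
    assert(alts != NULL && input != NULL && offset != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t k = strlen(alts[i]);
        if (strncmp(input + *offset, alts[i], k) == 0) {
            *offset += k;
            return (int)i;
        }
    }
    return -1;
}

/* phrase = ("a" / "ab") ("bc" / "c")
 * Records which alternative was selected in each group, so the single
 * parse tree that was chosen can be inspected. Returns 1 on a full match.
 * This sketch never goes back to revise an earlier choice. */
int parse_phrase(const char *input, int chosen[2]) {
    static const char *const head[] = { "a", "ab" };
    static const char *const tail[] = { "bc", "c" };
    size_t offset = 0;
    chosen[0] = chosen[1] = -1;
    chosen[0] = first_match(head, 2, input, &offset);
    if (chosen[0] < 0) return 0;
    chosen[1] = first_match(tail, 2, input, &offset);
    if (chosen[1] < 0) return 0;
    return input[offset] == '\0';
}
```

For "abc" the sketch always reports the tree "a" + "bc" and never "ab" + "c", so the result is well defined. The price of this particular rule is that a sentence such as "abbc" is rejected, because the first group commits to "a" and is not retried.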

From here it was quickly realized that this method of defining a tree of node operations did not in any way require that the nodes correspond to the ABNF-defined tree. They could be expanded and enhanced in any way that might be convenient for the problem at hand. The first expansion was to add the "look ahead" nodes, that is, operations that look ahead for a specific phrase and then continue or not depending on whether the phrase is found. Next, nodes with user-defined operations were introduced, that is, node operations hand-written by the user for a specific phrase-matching problem. Finally, to develop an ABNF-based pattern-matching engine similar to regular expressions (regex), a number of new node operations have been added: look behind, back referencing, and begin and end of string anchors.
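
A look-ahead operation can be pictured as a node that runs another matcher at the current position, reports success or failure, and consumes no input either way. The following is a minimal sketch under that assumption. The names `matcher`, `look_ahead` and `match_non_keyword` are hypothetical and are not part of APG's API. The example accepts a run of letters only if it does not begin with the keyword "end".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NO_MATCH ((size_t)-1)

/* A matcher returns the matched length at input[offset], or NO_MATCH. */
typedef size_t (*matcher)(const char *input, size_t len, size_t offset);

/* Matches the three characters "end". */
static size_t match_end_keyword(const char *input, size_t len, size_t offset) {
    if (len - offset >= 3 && memcmp(input + offset, "end", 3) == 0)
        return 3;
    return NO_MATCH;
}

/* Matches one or more ASCII letters. */
static size_t match_letters(const char *input, size_t len, size_t offset) {
    size_t n = 0;
    while (offset + n < len && isalpha((unsigned char)input[offset + n]))
        n++;
    return n > 0 ? n : NO_MATCH;
}

/* Look-ahead: succeeds with an empty (zero-length) match when the inner
 * matcher's outcome equals want_match, and fails otherwise. The input
 * position is never advanced. */
static size_t look_ahead(matcher m, int want_match,
                         const char *input, size_t len, size_t offset) {
    assert(m != NULL);
    int found = m(input, len, offset) != NO_MATCH;
    return found == (want_match != 0) ? 0 : NO_MATCH;
}

/* Negative look-ahead for "end", followed by one or more letters. */
size_t match_non_keyword(const char *input) {
    size_t len = strlen(input);
    if (look_ahead(match_end_keyword, 0, input, len, 0) == NO_MATCH)
        return NO_MATCH;
    return match_letters(input, len, 0);
}
```

Because the look-ahead tests only a prefix, `match_non_keyword("endless")` fails just as `match_non_keyword("end")` does, while `match_non_keyword("begin")` matches all five letters.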

These additional node operations enhance the original ABNF set but do not change it. Rather, they form a superset of ABNF or, as it is referred to here, SABNF.

Today, APG is a versatile, well-tested generator of parsers. And because it is based on ABNF, it is especially well suited to parsing the languages of many Internet technical specifications. In fact, it is the parser of choice for several large Telecom companies. Versions of APG have been developed to generate parsers in C/C++, Java and JavaScript.

Version 7.0 is a complete re-write, adding a number of new features.

  • written entirely in C but with objects and exception handling
  • generates parsers for 8-, 16-, 32-, and 64-bit wide alphabet characters
  • exposes an Application Programming Interface, API, which allows parser generation on the fly within the user's custom code
  • speeds parsing with Partially-Predictive Parsing Tables (PPPT) – first introduced in version 5.0 in 2007, but not carried forward into later versions or languages
  • adds a pattern-matching engine, apgex, with more features and more power than regex
    • full recursion for matching deeply nested pairs
    • two modes of back referencing (introducing a new parent-mode back reference)
    • handwritten pattern-matching code snippets for difficult, non-context-free phrases
    • named phrases for easy identification and referencing
    • replaces cryptic regex syntax with easy-to-read SABNF grammars
    • exposes the parsed AST for complex phrase translations
    • optionally traces the parse tree for debugging grammars and phrases
  • includes an RFC8259-compliant JSON parser and builder
  • includes a standards-compliant, non-validating XML parser
  • includes a number of utilities, commonly needed and used by parsing applications:
    • data encoding and decoding – base64, UTF-8, UTF-16, UTF-32
    • display of unprintable, non-ASCII data in hexdump-like format
    • line separation and handling for ASCII and Unicode data
    • a message logging facility
    • plus a large tool chest of utility functions for system information, pretty printing of APG information and more
  • includes a large set of examples demonstrating all aspects of use
APG Version 7.0 is licensed under the 2-Clause BSD License, an Open Source Initiative Approved License.