Version 7.0
Copyright © 2021 Lowell D. Thomas
APG
… an ABNF Parser Generator
apgex - An APG Pattern-Matching Engine

Introduction
Construction
Defining the SABNF Pattern
Configuring the Pattern Parser
Execution

Properties
Trace - the Debugger
Abstract Syntax Tree - the Matched Phrase AST
Display Helpers

Introduction

apgex is a regex-like pattern-matching engine which uses SABNF as the pattern-defining syntax and APG as the pattern-matching parser. While regex has a long and storied history and is heavily integrated into modern programming languages and practice, apgex offers the full pattern-matching power of APG and the reader-friendly SABNF syntax. It is fully recursive, so it can match nested constructs such as balanced parentheses, and it introduces a new parent mode of back referencing, enabling, for example, the matching of names in nested start and end HTML tags.
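To make the recursion claim concrete, the following is a minimal sketch in plain C of what a recursive rule for balanced parentheses has to match. It is not apgex code and calls no APG function. The SABNF pattern in the comment and the name match_parens() are assumptions made for illustration only.

```c
#include <stddef.h>

/* Illustration only: a hand-written recursive matcher, not apgex code.
 * It mirrors the (assumed) SABNF pattern
 *     P    = "(" *(text / P) ")"
 *     text = 1*(%d32-39 / %d42-126)   ; printable ASCII except "(" and ")"
 * Returns the number of characters matched at the start of s,
 * or 0 if s does not begin with a balanced group. */
size_t match_parens(const char *s)
{
    size_t i = 0;

    if (s[i] != '(') {
        return 0;                           /* P must open with "(" */
    }
    i++;
    for (;;) {
        unsigned char c = (unsigned char)s[i];
        if (c == '(') {
            size_t n = match_parens(s + i); /* recursive reference to P */
            if (n == 0) {
                return 0;                   /* nested group never closed */
            }
            i += n;
        } else if (c == ')') {
            return i + 1;                   /* closing ")" completes P */
        } else if (c >= 32 && c <= 126) {
            i++;                            /* one character of text */
        } else {
            return 0;                       /* end of string or non-text character */
        }
    }
}
```

Under these assumptions, match_parens("(a(b)c)") returns 7 and match_parens("(a(b)") returns 0. In apgex the recursive rule alone would express the same thing, with no hand-written matcher.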

Though not specifically designed and built to address the regex issues discussed by Larry Wall, creator of the Perl language, apgex does seem to go a long way toward addressing many of them, most notably back referencing and nested patterns.

apgex was first introduced in 2017 in JavaScript APG [1] [2] as an alternative to RegExp. This version extends it to the C language and the updated APG version 7.0. The full documentation can be found in the files apgex.h and apgex.c.

Construction/Destruction

apgex follows the object model of APG. Pattern-matching objects are created with a constructor function and destroyed with a destructor function.


Defining the SABNF Pattern

There are three functions for defining the SABNF pattern. The apgex pattern-matching parser can be generated on the fly from an SABNF pattern string or file or from a previously constructed SABNF grammar parser.

  • From a pattern string vApgexPattern()
    A valid SABNF grammar is simply defined in a null-terminated string and used to define the pattern.
  • From a pattern file vApgexPatternFile()
    Same as above, but the SABNF grammar exists in a file rather than a string.
  • From a previously constructed parser vApgexPatternParser()
    An APG parser object may be created in advance elsewhere in the application and used here to define the pattern. Any method can be used to generate the parser in advance - from an APG-generated file or on the fly from the APG API.


Configuring the Pattern Parser

apgex offers several ways to configure the pattern-matching parser prior to execution.

  • flags - See vApgexPattern() for a complete discussion of the available flag options and their effect on the operation of the parser.
  • vApgexEnableRules() - apgex can capture the sub-phrases matched by the individual SABNF grammar rules and UDTs. By default, all such capture is disabled. Use this function to enable or disable any or all rules and UDTs.
  • vApgexDefineUDT() - If there are any UDTs in the SABNF pattern grammar they must have callback functions assigned prior to any phrase-matching attempt. Use this function to define them.
  • vApgexSetLastIndex() - See the discussion of flags in vApgexPattern() for an explanation of the default values and use of uiLastIndex. The default values can be overridden prior to any phrase-matching attempt with this function.
  • sApgexProperties() - The properties include pointers to the trace and AST objects. If special configuration of the tracing object is needed, use the trace pointer; see vTraceConfig() for details. If the AST is requested, it may need to be configured before use through the AST pointer; see ast.c for details.


Execution

Once the apgex object is constructed and the pattern has been defined and configured, the search for a matched phrase must be executed. There are four functions which will execute a search. These will be discussed individually in the next four sub-sections.

Finding a Matched Phrase

sApgexExec()
This is the primary execution function which will generate detailed information about the matched phrase if successful. The results include, in addition to the matched phrase, the "left context", the "right context" and the sub-phrases captured for each of the enabled rule/UDT names. For a detailed description of the results, see the apgex_result structure.
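As a rough picture of how the left context, matched phrase and right context partition the source string, here is a minimal sketch in plain C. The demo_result type and demo_exec() function are hypothetical, and a literal substring search stands in for the pattern match. The real apgex_result structure is documented in apgex.h and its members may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for illustration only; see apgex.h for the
 * real apgex_result structure. */
typedef struct {
    const char *left;   size_t left_len;   /* source text before the match */
    const char *match;  size_t match_len;  /* the matched phrase */
    const char *right;  size_t right_len;  /* source text after the match */
} demo_result;

/* Stand-in matcher: finds the first literal occurrence of needle in source.
 * Returns 1 and fills *out on success, 0 if there is no match. */
int demo_exec(const char *source, const char *needle, demo_result *out)
{
    const char *hit = strstr(source, needle);
    size_t n = strlen(needle);

    if (hit == NULL) {
        return 0;
    }
    out->left = source;
    out->left_len = (size_t)(hit - source);
    out->match = hit;
    out->match_len = n;
    out->right = hit + n;
    out->right_len = strlen(hit + n);
    return 1;
}
```

With the source "key=value;" and the needle "value", the left context is "key=", the matched phrase is "value" and the right context is ";". The three parts always concatenate back to the original source.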

Test for a Matched Phrase

bApgexTest()
This will execute the parse just the same as sApgexExec() except that no matched phrases are captured. The return is simply true if a match is found and false if not.

Replace the Matched Phrase

sApgexReplace() and sApgexReplaceFunc()
These perform a phrase match similar to sApgexExec() except that instead of returning the matched phrase, they return the source string with the matched phrase replaced by the defined replacement string. There is considerable flexibility in the definition of the replacement string: it can be a simple string, a string that includes phrases from the matched results, or even a user-defined function with full access to the result and the current state properties. See the function descriptions for details.
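The idea of a user-defined replacement function can be sketched in plain C as follows. This is not the apgex API: the callback type demo_replacer, the functions demo_upper() and demo_replace_func(), and the choice to replace only the first match are all assumptions for illustration, and a literal substring search stands in for the pattern match.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: writes the NUL-terminated replacement for one
 * matched phrase into buf, using at most buf_size bytes. */
typedef void (*demo_replacer)(const char *match, size_t match_len,
                              char *buf, size_t buf_size);

/* Example callback: upper-cases the matched phrase. */
void demo_upper(const char *match, size_t match_len, char *buf, size_t buf_size)
{
    size_t i;

    if (buf_size == 0) {
        return;
    }
    for (i = 0; i < match_len && i + 1 < buf_size; i++) {
        buf[i] = (char)toupper((unsigned char)match[i]);
    }
    buf[i] = '\0';
}

/* Stand-in for a pattern match: replaces the first literal occurrence of
 * needle in source with whatever fn produces. Returns the length the full
 * result needs (as snprintf does), so truncation can be detected. */
int demo_replace_func(const char *source, const char *needle, demo_replacer fn,
                      char *out, size_t out_size)
{
    char repl[64];
    const char *hit = strstr(source, needle);
    size_t n = strlen(needle);

    if (hit == NULL) {
        return snprintf(out, out_size, "%s", source); /* no match: copy as is */
    }
    fn(hit, n, repl, sizeof repl);
    /* left context + replacement + right context */
    return snprintf(out, out_size, "%.*s%s%s",
                    (int)(hit - source), source, repl, hit + n);
}
```

Calling demo_replace_func("say hello twice", "hello", demo_upper, out, sizeof out) leaves "say HELLO twice" in out. Passing the matched phrase to a callback keeps the replacement logic out of the matcher itself.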

Split - Matched Phrases as Delimiters

spApgexSplit()
This function is modeled after the JavaScript function str.split([separator[, limit]]) when using a regular expression. It uses the matched phrases as delimiters to split the input source string into an array of sub-strings. See the function description for details.
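The delimiter behavior described above can be sketched in plain C. This is not spApgexSplit(): the demo_piece type and demo_split() function are hypothetical, a literal delimiter stands in for the matched phrases, and the max_pieces argument plays the role of the JavaScript limit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical piece descriptor: a view into the source, not a copy. */
typedef struct {
    const char *start;
    size_t len;
} demo_piece;

/* Splits source on each occurrence of a non-empty literal delimiter,
 * writing at most max_pieces pieces to out. Returns the number written.
 * Adjacent delimiters yield an empty piece, as in JavaScript's split. */
size_t demo_split(const char *source, const char *delim,
                  demo_piece *out, size_t max_pieces)
{
    size_t count = 0;
    size_t dlen = strlen(delim);
    const char *cursor = source;

    if (dlen == 0) {
        return 0;                   /* empty delimiters are out of scope here */
    }
    while (count < max_pieces) {
        const char *hit = strstr(cursor, delim);
        out[count].start = cursor;
        if (hit == NULL) {
            out[count].len = strlen(cursor);   /* final piece: rest of source */
            count++;
            break;
        }
        out[count].len = (size_t)(hit - cursor);
        count++;
        cursor = hit + dlen;        /* resume after the delimiter */
    }
    return count;
}
```

Splitting "a,b,,c" on "," yields the four pieces "a", "b", "" and "c". With max_pieces set to 2, only "a" and "b" are returned.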

Properties

Properties are the current state of the apgex object. In addition to the last phrase match, if any, they include the flags, the original input string and SABNF grammar, pointers to the parser, trace and AST object contexts, and other information. See the apgex_properties structure for details. Use sApgexProperties() to get a copy of the properties.

Trace - the Debugger

When a phrase match doesn't go as expected, the problem could be an SABNF grammar error, an error in the input source string, or both. The primary debugging tool is a trace: a detailed map of the parser's path through the parse tree with a display of the results for each tree node visit. APG provides just such a tool, and it can be activated for the phrase-matching parser by specifying the "t" or "th" flag and defining the macro APG_TRACE when compiling the application. Use vpApgexGetTrace() to get a pointer to the trace object's context. This pointer can then be used to configure the trace. See vTraceConfig().

Abstract Syntax Tree - the Matched Phrase AST

It may be that a more complex translation of the matched phrases is needed than that provided by the replacement functions. The AST is the ultimate translation tool as discussed in the library section. The full capabilities of the AST library are available for the translation or manipulation of the matched phrases. Use vpApgexGetAst() to get a pointer to the AST object's context.

Display Helpers

The matched results and properties hold a lot of information, sometimes in lists and even in lists of lists. apgex offers three display helpers for a quick, easy look at this data. Phrases and other strings of alphabet characters are displayed as simple ASCII strings if possible. If they contain non-ASCII characters, a format object is used for a hexadecimal display.


APG Version 7.0 is licensed under the 2-Clause BSD License,
an Open Source Initiative Approved License.