APG
… an ABNF Parser Generator
Introduction
Construction
Defining the SABNF Pattern
Configuring the Pattern Parser
Execution
Properties
Trace - the Debugger
Abstract Syntax Tree - the Matched Phrase AST
Display Helpers
Introduction
apgex is a regex-like pattern-matching engine which uses SABNF as the pattern-defining syntax and APG as the pattern-matching parser. While regex has a long and storied history and is heavily integrated into modern programming languages and practice, apgex offers the full pattern-matching power of APG and the reader-friendly SABNF syntax. It is fully recursive, which means it can match nested parentheses, and it introduces a new parent mode of back referencing which enables, for example, matching the names in nested HTML start and end tags.
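To make recursion and parent-mode back referencing concrete, here is a hand-written C sketch of what such a pattern has to do. It is not apgex code. The function names are hypothetical, and the grammar it implements by hand, roughly `element = "<" name ">" *element "</" name ">"` with `name` limited to letters, is an illustrative assumption rather than SABNF taken from the apgex documentation. The key point is that each closing tag is compared with the name opened by its own element (its parent level), not with the most recently matched name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <ctype.h>

#define NO_MATCH ((size_t)-1)

/* Length of a run of letters starting at s[i]; 0 if none. */
static size_t match_name(const char *s, size_t i) {
    size_t start = i;
    while (isalpha((unsigned char)s[i])) i++;
    return i - start;
}

/* Hypothetical hand-coded rule:  element = "<" name ">" *element "</" name ">"
   Returns the index just past the element, or NO_MATCH. */
static size_t match_element(const char *s, size_t i) {
    if (s[i] != '<' || s[i + 1] == '/') return NO_MATCH;
    size_t name = i + 1;                      /* where this element's name starts */
    size_t len = match_name(s, name);
    if (len == 0 || s[name + len] != '>') return NO_MATCH;
    i = name + len + 1;
    for (;;) {                                /* zero or more nested children (recursion) */
        size_t next = match_element(s, i);
        if (next == NO_MATCH) break;
        i = next;
    }
    if (s[i] != '<' || s[i + 1] != '/') return NO_MATCH;
    i += 2;
    /* "parent" back reference: the end tag must repeat THIS element's name */
    if (strncmp(s + i, s + name, len) != 0 || match_name(s, i) != len || s[i + len] != '>')
        return NO_MATCH;
    return i + len + 1;
}

/* True if the whole string is exactly one well-nested element. */
bool is_element(const char *s) {
    size_t end = match_element(s, 0);
    return end != NO_MATCH && s[end] == '\0';
}
```

A classic regex back reference such as `\1` refers to one fixed capture, so by itself it cannot pair up names across an arbitrary nesting depth; the recursion in the sketch is what supplies that.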
Though not specifically designed and built to address the regex issues discussed by Larry Wall, creator of the Perl language, apgex does seem to go a long way toward addressing many of them, most notably back referencing and nested patterns, but other issues as well.
apgex was first introduced in JavaScript APG [1] [2] in 2017 as an alternative to RegExp. This version extends it to the C language and the updated APG version 7.0. The full documentation can be found in the files apgex.h and apgex.c.
Construction
apgex follows the object model of APG. Pattern-matching objects are created and destroyed with the constructor and destructor functions.
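The create, use, destroy lifecycle of such an object can be sketched generically in C. The type `demo_matcher` and the functions `demo_ctor()`, `demo_set_pattern()` and `demo_dtor()` are hypothetical stand-ins, not the apgex API; the real names and arguments are in apgex.h. The idea shown is that the object owns all of its memory, so a single destructor call releases everything, including any pattern text copied into it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type; not the apgex API. */
typedef struct {
    char *pattern;  /* owned copy of the pattern text, NULL until set */
} demo_matcher;

/* Constructor: allocates a zeroed object, or returns NULL on failure. */
demo_matcher *demo_ctor(void) {
    return calloc(1, sizeof(demo_matcher));
}

/* Stores a private copy of the pattern text, replacing any previous one.
   Returns 0 on success, -1 if memory could not be allocated. */
int demo_set_pattern(demo_matcher *m, const char *text) {
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL) return -1;
    strcpy(copy, text);
    free(m->pattern);
    m->pattern = copy;
    return 0;
}

/* Destructor: releases everything the object owns; NULL is ignored. */
void demo_dtor(demo_matcher *m) {
    if (m == NULL) return;
    free(m->pattern);
    free(m);
}
```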
Defining the SABNF Pattern
There are three functions for defining the SABNF pattern. The apgex pattern-matching parser can be generated on the fly from an SABNF pattern string or file, or from a previously constructed SABNF grammar parser.
Configuring the Pattern Parser
apgex offers several ways to configure the pattern-matching parser prior to execution.
apgex has the ability to capture the sub-phrases matched by the individual SABNF grammar rules and UDTs. By default, all such capture is disabled. Use this function to enable or disable capture for any or all rules and UDTs.
… uiLastIndex. The default values can be overridden prior to any phrase-matching attempt with this function.
Execution
Once the apgex object is constructed and the pattern has been defined and configured, the search for a matched phrase must be executed. The functions that execute a search are discussed individually in the next four sub-sections.
sApgexExec()
This is the primary execution function which will generate detailed information about the matched phrase if successful. The results include, in addition to the matched phrase, the "left context", the "right context" and the sub-phrases captured for each of the enabled rule/UDT names. For a detailed description of the results, see the apgex_result structure.
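The relationship between the matched phrase and its left and right contexts can be sketched from two numbers: where the match starts and how long it is. The `demo_match` structure and `demo_contexts()` function below are hypothetical and work on plain `char` strings; the actual apgex_result structure and APG's alphabet character type are described in apgex.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical match record: offset and length of the matched phrase. */
typedef struct {
    size_t index;   /* offset of the matched phrase in the source */
    size_t length;  /* number of characters in the matched phrase */
} demo_match;

/* Splits src into left context, matched phrase and right context.
   Each output buffer must hold at least strlen(src) + 1 characters. */
void demo_contexts(const char *src, demo_match m,
                   char *left, char *phrase, char *right) {
    memcpy(left, src, m.index);                  /* everything before the match */
    left[m.index] = '\0';
    memcpy(phrase, src + m.index, m.length);     /* the match itself */
    phrase[m.length] = '\0';
    strcpy(right, src + m.index + m.length);     /* everything after the match */
}
```

For the source `abc123def` with a match of `123`, the left context is `abc` and the right context is `def`.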
bApgexTest()
This executes the parse exactly as sApgexExec() does, except that no matched phrases are captured. It simply returns true if a match is found and false if not.
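A search, as opposed to an anchored parse, can be modeled as trying the pattern at successive start positions until one succeeds. The sketch below assumes that model. `match_digits()` is a hypothetical stand-in for the pattern-matching parser (it matches `1*DIGIT`), and `demo_test()` only reports true or false, like bApgexTest(), without recording the phrase.

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the pattern-matching parser: the length of a
   1*DIGIT match anchored at s[i], or 0 if there is none. */
static size_t match_digits(const char *s, size_t i) {
    size_t n = 0;
    while (isdigit((unsigned char)s[i + n])) n++;
    return n;
}

/* Search model: try an anchored match at each start position from 'from'
   to the end of the string; report only whether any attempt succeeded. */
bool demo_test(const char *s, size_t from) {
    size_t len = strlen(s);
    for (size_t i = from; i <= len; i++) {
        if (match_digits(s, i) > 0) return true;
    }
    return false;
}
```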
sApgexReplace() and sApgexReplaceFunc()
This performs a phrase match similar to sApgexExec() except that, instead of the matched phrase, it returns the source string with the matched phrase replaced by the defined replacement string. The replacement can be defined with considerable flexibility: anything from a simple string, to a string that includes phrases from the matched results, to a user-defined function with full access to the result and the current state properties. See the function descriptions for details.
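The simplest form of replacement, a fixed string with an optional reference to the matched phrase, can be sketched as follows. The `$&` token used here to insert the matched phrase is borrowed from JavaScript's String.prototype.replace as an assumption; apgex's own replacement syntax is given in the sApgexReplace() description. `demo_replace()` is hypothetical and takes the match position directly instead of running a parser.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: returns a new string equal to src with the phrase at
   [index, index + length) replaced by repl, where each "$&" in repl
   (an assumed, JavaScript-style token) expands to the matched phrase.
   The caller frees the result; NULL on allocation failure. */
char *demo_replace(const char *src, size_t index, size_t length, const char *repl) {
    size_t srclen = strlen(src), rlen = strlen(repl);
    /* upper bound: at most rlen/2 tokens, each expanding to 'length' chars */
    size_t cap = srclen + rlen + (rlen / 2) * length + 1;
    char *out = malloc(cap);
    if (out == NULL) return NULL;
    memcpy(out, src, index);                     /* left context */
    size_t o = index;
    for (size_t k = 0; k < rlen; k++) {
        if (repl[k] == '$' && repl[k + 1] == '&') {
            memcpy(out + o, src + index, length);  /* insert matched phrase */
            o += length;
            k++;                                   /* skip the '&' */
        } else {
            out[o++] = repl[k];
        }
    }
    strcpy(out + o, src + index + length);       /* right context */
    return out;
}
```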
spApgexSplit()
This function is modeled after the JavaScript function str.split([separator[, limit]]) when a regular expression is used as the separator. It uses the matched phrases as delimiters to split the input source string into an array of sub-strings. See the function description for details.
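The splitting behavior can be sketched with a stand-in delimiter pattern, `1*("," / " ")`, in place of a real SABNF-defined pattern. `demo_split()` and its helpers are hypothetical. As with the JavaScript original, the text before each delimiter match becomes one piece, the remainder after the last match becomes the final piece, and `limit` caps the number of pieces. JavaScript's special handling of empty matches and capture groups is left out of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in delimiter pattern 1*("," / " "):
   length of the match at s[i], or 0 if there is none. */
static size_t match_delim(const char *s, size_t i) {
    size_t n = 0;
    while (s[i + n] == ',' || s[i + n] == ' ') n++;
    return n;
}

/* Copies s[from, to) into a new NUL-terminated string. */
static char *dup_range(const char *s, size_t from, size_t to) {
    char *p = malloc(to - from + 1);
    if (p != NULL) { memcpy(p, s + from, to - from); p[to - from] = '\0'; }
    return p;
}

/* Splits src on delimiter matches, keeping at most 'limit' pieces.
   Returns a malloc'd array of malloc'd strings and sets *count. */
char **demo_split(const char *src, size_t limit, size_t *count) {
    size_t len = strlen(src), n = 0, start = 0, i = 0;
    char **out = malloc((len + 1) * sizeof *out);  /* never more than len + 1 pieces */
    if (out == NULL) { *count = 0; return NULL; }
    while (i < len && n < limit) {
        size_t m = match_delim(src, i);
        if (m == 0) { i++; continue; }             /* no delimiter here, move on */
        out[n++] = dup_range(src, start, i);       /* text before the delimiter */
        i += m;
        start = i;
    }
    if (n < limit) out[n++] = dup_range(src, start, len);  /* remainder */
    *count = n;
    return out;
}
```

For example, splitting `a, b,,c` yields the three pieces `a`, `b` and `c`, because `, ` and `,,` each count as a single delimiter match.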
Properties
Properties are the current state of the apgex object. In addition to the last phrase match, if any, they include the flags, the original input string and SABNF grammar, pointers to the parser, trace and AST object contexts, and other information. See the apgex_properties structure for the properties details. Use sApgexProperties() to get a copy of the properties.
Trace - the Debugger
When a phrase match doesn't go as expected, the problem could be an SABNF grammar error, an error in the input source string, or both. The primary debugging tool is a trace: a detailed map of the parser's path through the parse tree with a display of the results for each tree node visit. APG provides just such a tool, and it can be activated for the phrase-matching parser. This is done by simply specifying the "t" or "th" flags and defining the macro APG_TRACE when compiling the application. Use vpApgexGetTrace() to get a pointer to the trace object's context. This pointer can then be used to configure the trace. See vTraceConfig() for details.
Abstract Syntax Tree - the Matched Phrase AST
It may be that a more complex translation of the matched phrases is needed than that provided by the replacement functions. The AST is the ultimate translation tool as discussed in the library section. The full capabilities of the AST library are available for the translation or manipulation of the matched phrases. Use vpApgexGetAst() to get a pointer to the AST object's context.
Display Helpers
The matched results and properties contain a lot of information, sometimes in lists and even in lists of lists. apgex offers three display helpers for a quick, easy look at this data. Phrases and other strings of alphabet characters are displayed as simple ASCII strings if possible. If they contain non-ASCII characters, a format object is used for a hexadecimal display.
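The ASCII-or-hexadecimal decision can be sketched as below. `demo_format_phrase()` is hypothetical: it treats alphabet characters as `unsigned long` values and takes "ASCII" to mean the printable range 0x20 to 0x7E (both assumptions for illustration), and it writes into a caller buffer rather than using APG's format object.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats len alphabet characters into buf (size bytes).
   If every character is printable ASCII (0x20-0x7E) the phrase is written as
   plain text; otherwise every character is written as a hex value (at least
   two digits), separated by spaces. Output is truncated to fit buf. */
void demo_format_phrase(const unsigned long *chars, size_t len, char *buf, size_t size) {
    int ascii = 1;
    size_t i, used = 0;
    if (size == 0) return;
    for (i = 0; i < len; i++) {
        if (chars[i] < 0x20 || chars[i] > 0x7E) { ascii = 0; break; }
    }
    buf[0] = '\0';
    for (i = 0; i < len && used < size; i++) {
        int n = ascii
            ? snprintf(buf + used, size - used, "%c", (int)chars[i])
            : snprintf(buf + used, size - used, "%s%02lX", i ? " " : "", chars[i]);
        if (n < 0) break;
        used += (size_t)n;  /* exceeds size only on truncation, which ends the loop */
    }
}
```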