Version 7.0
Copyright © 2021 Lowell D. Thomas
APG
… an ABNF Parser Generator
Universal vs Parent Mode Back Referencing

A universal mode back reference, \%uA, matches the phrase most recently matched by rule A, regardless of where that match occurs in the input string or on the parse tree. Consider a grammar for HTML-like tags that uses universal mode.

U   = (%d60 tag %d62) U (%d60.47 \%utag %d62) / %d45.45
tag = 1*(%d97-122 / %d65-90)
Figure 1. Parse tree for the universal mode example.

The input string

<TagA><TagB>--</TagB></TagB>

would have a parse tree schematically like Figure 1. Notice that the last tag matched is "TagB", at the bottom of the left side of the parse tree. Since a universal mode back reference matches only the last phrase matched by the rule "tag", both back references can match only "TagB", which is why both end tags in the input above are </TagB>. A correctly nested string such as <TagA><TagB>--</TagB></TagA> is rejected. This is not what we want for HTML tags.
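To make the universal mode behavior concrete, here is a minimal Python sketch. It is a hand-written recursive-descent recognizer for the U grammar, and it models the base case as just the two characters "--". It is not APG and does not use APG's API, and the function name `parse_universal` is hypothetical. A single variable holds the last phrase matched by "tag" anywhere in the input, which models the universal mode record. Restoring that record when an alternative fails is an assumption about backtracking, and it does not affect the inputs used here.

```python
import re
from typing import Optional

# tag = 1*(%d97-122 / %d65-90): one or more ASCII letters
TAG = re.compile(r"[a-zA-Z]+")


def parse_universal(text: str) -> bool:
    # Hypothetical recognizer (not APG code) for
    #   U = (%d60 tag %d62) U (%d60.47 \%utag %d62) / %d45.45
    last_tag: Optional[str] = None  # ONE record: last "tag" phrase, anywhere

    def match_u(i: int) -> Optional[int]:
        # Return the index just past a match of U starting at i, or None.
        nonlocal last_tag
        saved = last_tag
        end = match_alt1(i)
        if end is not None:
            return end
        last_tag = saved  # assumption: undo records made by a failed alternative
        if text.startswith("--", i):  # second alternative: %d45.45
            return i + 2
        return None

    def match_alt1(i: int) -> Optional[int]:
        # First alternative: "<" tag ">" U "</" \%utag ">"
        nonlocal last_tag
        if not text.startswith("<", i):
            return None
        m = TAG.match(text, i + 1)
        if m is None:
            return None
        last_tag = m.group()  # universal mode: every match overwrites the record
        if not text.startswith(">", m.end()):
            return None
        j = match_u(m.end() + 1)
        if j is None or not text.startswith("</", j):
            return None
        ref = last_tag  # the tag matched LAST, wherever it was on the tree
        if ref is None or not text.startswith(ref + ">", j + 2):
            return None
        return j + 2 + len(ref) + 1

    # The whole input must be matched.
    return match_u(0) == len(text)
```

With this sketch, `parse_universal("<TagA><TagB>--</TagB></TagB>")` returns `True`, while `parse_universal("<TagA><TagB>--</TagB></TagA>")` returns `False`. By the time either end tag is checked, the only record left is "TagB".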

Let's try again with parent mode back referencing. The grammar is the same except that it uses parent mode back references.

P   = (%d60 tag %d62) P (%d60.47 \%ptag %d62) / %d45.45
tag = 1*(%d97-122 / %d65-90)
Figure 2. Parse tree for the parent mode example.

The input string

<TagA><TagB>--</TagB></TagA>

would have a parse tree schematically like Figure 2. A parent mode back reference matches only the last phrase matched by the rule "tag" that has the same parent node as the back reference. Each end tag is therefore compared with the start tag at the same level of recursion, and we get a symmetric matching across the left and right branches of the parse tree. This solves the problem of correctly pairing not only the HTML start and end tags but also their tag names.
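The parent mode idea can be modeled with a minimal Python sketch. It is a hand-written recursive-descent recognizer for the P grammar, and it treats the base case as just the two characters "--". It is not APG and does not use APG's API, and the function name `parse_parent` is hypothetical. Each call of `match_p` stands for one P node on the parse tree. The tag that node matched is kept in a local variable, so a back reference can only see the tag recorded under its own parent.

```python
import re
from typing import Optional

# tag = 1*(%d97-122 / %d65-90): one or more ASCII letters
TAG = re.compile(r"[a-zA-Z]+")


def parse_parent(text: str) -> bool:
    # Hypothetical recognizer (not APG code) for
    #   P = (%d60 tag %d62) P (%d60.47 \%ptag %d62) / %d45.45

    def match_p(i: int) -> Optional[int]:
        # One call = one P node. Return the index just past the match, or None.
        end = match_alt1(i)
        if end is not None:
            return end
        if text.startswith("--", i):  # second alternative: %d45.45
            return i + 2
        return None

    def match_alt1(i: int) -> Optional[int]:
        # First alternative: "<" tag ">" P "</" \%ptag ">"
        if not text.startswith("<", i):
            return None
        m = TAG.match(text, i + 1)
        if m is None:
            return None
        my_tag = m.group()  # parent mode: recorded only for THIS P node
        if not text.startswith(">", m.end()):
            return None
        j = match_p(m.end() + 1)  # child P nodes keep their own records
        if j is None or not text.startswith("</", j):
            return None
        if not text.startswith(my_tag + ">", j + 2):
            return None
        return j + 2 + len(my_tag) + 1

    # The whole input must be matched.
    return match_p(0) == len(text)
```

With this sketch, `parse_parent("<TagA><TagB>--</TagB></TagA>")` returns `True` and `parse_parent("<TagA><TagB>--</TagB></TagB>")` returns `False`. Keeping the record in a per-call local variable means no undo step is needed when an alternative fails, because the record disappears with the call.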

APG Version 7.0 is licensed under the 2-Clause BSD License,
an Open Source Initiative Approved License.