APG
… an ABNF Parser Generator
A universal mode back reference, \%uA, matches the last occurrence of A regardless of where it occurs in the input source string or on the parse tree. Consider a grammar for HTML-like tags using universal mode.
U   = (%d60 tag %d62) U (%d60.47 \%utag %d62) / %d45.45
tag = 1*(%d97-122 / %d65-90)
The input string
<TagA><TagB>--</TagB></TagB>
would have a parse tree figuratively like Figure 1. Notice that the last tag matched is "TagB", at the bottom of the left side of the parse tree. Since a universal mode back reference matches only the last occurrence of the rule "tag", both back references can only match "TagB". This is not what we want for HTML tags, where each end tag must repeat the name of its own start tag.
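To make the universal mode behavior concrete, here is a minimal Python sketch. It is not APG's implementation: it is a hand-written recursive-descent matcher for grammar U, and the names match_tag, lit and accepts_universal are hypothetical. It assumes ordered (first-success) alternation, greedy repetition, and a case-sensitive comparison of the back-referenced text; APG's own case handling is not modeled.

```python
# Hypothetical sketch (not APG's implementation) of universal mode for
#   U   = (%d60 tag %d62) U (%d60.47 \%utag %d62) / %d45.45
#   tag = 1*(%d97-122 / %d65-90)
# Assumptions: ordered alternation, greedy repetition, case-sensitive back reference.


def match_tag(s: str, i: int) -> int:
    """Greedily match 1*(a-z / A-Z) at index i; return the end index or -1."""
    j = i
    while j < len(s) and ("a" <= s[j] <= "z" or "A" <= s[j] <= "Z"):
        j += 1
    return j if j > i else -1


def lit(s: str, i: int, text: str) -> int:
    """Match the literal text at index i; return the end index or -1."""
    return i + len(text) if s.startswith(text, i) else -1


def accepts_universal(s: str) -> bool:
    last = None  # text of the most recent "tag" match anywhere in the parse

    def first_alt(i: int) -> int:
        # %d60 tag %d62 U %d60.47 \%utag %d62
        nonlocal last
        i = lit(s, i, "<")
        if i < 0:
            return -1
        j = match_tag(s, i)
        if j < 0:
            return -1
        last = s[i:j]  # this tag is now the latest occurrence
        i = lit(s, j, ">")
        if i >= 0:
            i = U(i)  # nested tags overwrite last
        if i >= 0:
            i = lit(s, i, "</")
        if i >= 0:
            i = lit(s, i, last)  # \%utag: whatever tag was matched last
        return lit(s, i, ">") if i >= 0 else -1

    def U(i: int) -> int:
        nonlocal last
        saved = last
        end = first_alt(i)
        if end >= 0:
            return end
        last = saved  # a failed branch must not leave its tag behind
        return lit(s, i, "--")  # second alternative: %d45.45

    return U(0) == len(s)


print(accepts_universal("<TagA><TagB>--</TagB></TagB>"))  # True
print(accepts_universal("<TagA><TagB>--</TagB></TagA>"))  # False
```

Because a single variable holds the latest tag for the whole parse, every nested start tag overwrites it, so both end tags are forced to repeat the innermost name "TagB".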
Let's try this again with parent mode back referencing. We will use the same grammar, but with parent mode back references.
P   = (%d60 tag %d62) P (%d60.47 \%ptag %d62) / %d45.45
tag = 1*(%d97-122 / %d65-90)
The input string
<TagA><TagB>--</TagB></TagA>
would have a parse tree figuratively like Figure 2. Since a parent mode back reference matches only the last occurrence of the rule "tag" that has the same parent node as the back reference, we get a symmetric matching across the left and right branches of the parse tree. This solves the problem of matching not only the correct pairing of HTML start and end tags but also their tag names.
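A minimal Python sketch of parent mode for grammar P, again not APG's implementation. The names match_tag, lit, accepts_parent and the function P are hypothetical, and the sketch assumes ordered alternation, greedy repetition and a case-sensitive comparison. Each call of P keeps the tag it matched in a local variable, which stands in for "the tag with the same parent as the back reference".

```python
# Hypothetical sketch (not APG's implementation) of parent mode for
#   P   = (%d60 tag %d62) P (%d60.47 \%ptag %d62) / %d45.45
#   tag = 1*(%d97-122 / %d65-90)
# Assumptions: ordered alternation, greedy repetition, case-sensitive back reference.


def match_tag(s: str, i: int) -> int:
    """Greedily match 1*(a-z / A-Z) at index i; return the end index or -1."""
    j = i
    while j < len(s) and ("a" <= s[j] <= "z" or "A" <= s[j] <= "Z"):
        j += 1
    return j if j > i else -1


def lit(s: str, i: int, text: str) -> int:
    """Match the literal text at index i; return the end index or -1."""
    return i + len(text) if s.startswith(text, i) else -1


def accepts_parent(s: str) -> bool:
    def first_alt(i: int) -> int:
        # %d60 tag %d62 P %d60.47 \%ptag %d62
        i = lit(s, i, "<")
        if i < 0:
            return -1
        j = match_tag(s, i)
        if j < 0:
            return -1
        tag = s[i:j]  # sibling of the back reference under the same P node
        i = lit(s, j, ">")
        if i >= 0:
            i = P(i)  # nested P nodes keep their own local tag
        if i >= 0:
            i = lit(s, i, "</")
        if i >= 0:
            i = lit(s, i, tag)  # \%ptag: the tag with the same parent
        return lit(s, i, ">") if i >= 0 else -1

    def P(i: int) -> int:
        end = first_alt(i)
        return end if end >= 0 else lit(s, i, "--")  # second alternative: %d45.45

    return P(0) == len(s)


print(accepts_parent("<TagA><TagB>--</TagB></TagA>"))  # True
print(accepts_parent("<TagA><TagB>--</TagB></TagB>"))  # False
```

Since no state is shared between nesting levels, nothing needs to be restored on backtracking, and each end tag is checked only against the start tag of its own level.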