Ranges

Thus far, the rules we have examined have one thing in common: the values they produce have a fixed size that is known at compile time. However, grammars can also specify the repetition of elements. For example, consider the following grammar (loosely adapted from RFC 7230):

chunk-ext      = *( ";" token )

In ABNF notation the star operator denotes repetition: in this case, zero or more occurrences of the expression in parentheses. This production can be expressed using the function range_rule, which returns a rule allowing for a prescribed number of repetitions of a specified rule. The following rule matches the grammar for chunk-ext defined above:

constexpr auto chunk_ext_rule = range_rule(
    tuple_rule( squelch( delim_rule( ';' ) ), token_rule( alnum_chars ) ) );

This rule produces a range, a ForwardRange whose value type is the same as the value type of the rule passed to the function. In this case, the type is string_view because the tuple has one unsquelched element, the token_rule. The range can be iterated to produce results, without allocating memory for each element. The following code:

result< range< string_view > > rv = parse( ";johndoe;janedoe;end", chunk_ext_rule );

for( auto s : rv.value() )
    std::cout << s << "\n";

produces this output:

johndoe
janedoe
end
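To illustrate how a range can be iterated without allocating memory for each element, here is a minimal standard-library sketch. It is not the library's implementation: the names alnum_prefix, chunk_ext_view and parse_chunk_ext are hypothetical, and the sketch assumes a token is one or more alphanumeric characters. The idea is that the view keeps only a string_view of the validated input, and each increment of the iterator parses the next element again.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: length of the leading run of alphanumeric characters.
inline std::size_t alnum_prefix(std::string_view s)
{
    std::size_t n = 0;
    while (n < s.size() && std::isalnum(static_cast<unsigned char>(s[n])))
        ++n;
    return n;
}

// Hypothetical lazy view over input already known to match *( ";" token ).
// It stores only the input; elements are re-parsed during iteration.
class chunk_ext_view
{
    std::string_view s_;

public:
    explicit chunk_ext_view(std::string_view s) : s_(s) {}

    class iterator
    {
        std::string_view rest_; // input following the current element
        std::string_view cur_;  // current token, pointing into the input
        bool end_ = true;

        void advance()
        {
            if (rest_.empty())
            {
                end_ = true;
                cur_ = {};
                return;
            }
            // rest_ starts with ';' because the input was validated
            std::size_t n = alnum_prefix(rest_.substr(1));
            cur_ = rest_.substr(1, n);
            rest_.remove_prefix(1 + n);
            end_ = false;
        }

    public:
        iterator() = default; // end iterator
        explicit iterator(std::string_view s) : rest_(s) { advance(); }

        std::string_view operator*() const
        {
            assert(!end_);
            return cur_;
        }

        iterator& operator++()
        {
            advance();
            return *this;
        }

        bool operator!=(iterator const& other) const
        {
            return end_ != other.end_ ||
                (!end_ && cur_.data() != other.cur_.data());
        }
    };

    iterator begin() const { return iterator(s_); }
    iterator end() const { return iterator(); }
};

// Validate the whole input once, then return the lazy view.
inline std::optional<chunk_ext_view> parse_chunk_ext(std::string_view s)
{
    std::size_t pos = 0;
    while (pos < s.size())
    {
        if (s[pos] != ';')
            return std::nullopt;
        std::size_t n = alnum_prefix(s.substr(pos + 1));
        if (n == 0)
            return std::nullopt; // a token may not be empty
        pos += 1 + n;
    }
    return chunk_ext_view(s);
}
```

Under these assumptions, iterating the view returned by parse_chunk_ext( ";johndoe;janedoe;end" ) yields the same three elements as the library example above. The trade-off in this sketch is that each element is parsed twice, once for validation and once during iteration, in exchange for needing no storage beyond the input itself.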

Sometimes a repetition is not so easily expressed using a single rule. Take, for example, the following grammar for a comma-delimited list of tokens, which must contain at least one element:

token-list    = token *( "," token )

We can express this using the overload of range_rule which accepts two parameters: the rule to use when performing the first match, and the rule to use for performing every subsequent match. Both overloads of the function have additional, optional parameters for specifying the minimum number of repetitions, or both the minimum and maximum number of repetitions. Since our list must contain at least one element, we pass 1 as the minimum, and the following rule captures the token-list grammar:

constexpr auto token_list_rule = range_rule(
    token_rule( alnum_chars ),
    tuple_rule( squelch( delim_rule( ',' ) ), token_rule( alnum_chars ) ),
    1 );

The following code:

result< range< string_view > > rv = parse( "johndoe,janedoe,end", token_list_rule );

for( auto s : rv.value() )
    std::cout << s << "\n";

produces this output:

johndoe
janedoe
end
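The interplay of the first rule, the rule for subsequent matches, and the minimum and maximum counts can be sketched with the standard library alone. The functions eat_token and match_token_list below are hypothetical and only approximate the behavior described above; they assume a token is one or more alphanumeric characters and that the whole input must be consumed.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: consume one token (one or more alphanumerics)
// from the front of s. Returns false and leaves s unchanged if none.
inline bool eat_token(std::string_view& s)
{
    std::size_t n = 0;
    while (n < s.size() && std::isalnum(static_cast<unsigned char>(s[n])))
        ++n;
    if (n == 0)
        return false;
    s.remove_prefix(n);
    return true;
}

// Returns the number of elements if s matches  token *( "," token )
// in full and the count lies within [min_count, max_count].
inline std::optional<std::size_t> match_token_list(
    std::string_view s,
    std::size_t min_count,
    std::size_t max_count)
{
    std::size_t count = 0;
    if (eat_token(s)) // the "first" rule: a bare token
    {
        count = 1;
        while (!s.empty()) // the "next" rule: "," followed by a token
        {
            if (s.front() != ',')
                return std::nullopt;
            s.remove_prefix(1);
            if (!eat_token(s))
                return std::nullopt;
            ++count;
        }
    }
    if (!s.empty())
        return std::nullopt; // leftover input that no rule matched
    if (count < min_count || count > max_count)
        return std::nullopt; // repetition count out of bounds
    return count;
}
```

In this sketch a minimum of 1 rejects the empty string, while a minimum of 0 accepts it as a list with no elements. A maximum of 2 rejects "a,b,c" even though every element is well formed.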

In the next section we discuss the available rules that are specific to RFC 3986.

These are the rules and compound rules provided by the library. For more details please see the corresponding reference sections.

Table 1.7. Grammar Symbols

Name	Description
`dec_octet_rule`	Match an integer from 0 to 255.
`delim_rule`	Match a character literal.
`literal_rule`	Match a character string exactly.
`not_empty_rule`	Turn a successful match of an empty string into an error.
`optional_rule`	Ignore a rule if parsing fails, leaving the input pointer unchanged.
`range_rule`	Match a repeated sequence of elements.
`token_rule`	Match a string of characters from a character set.
`tuple_rule`	Match a sequence of specified rules, in order.
`unsigned_rule`	Match an unsigned integer in decimal form.
`variant_rule`	Match one of a set of alternatives specified by rules.
