

Parse Rules

A Rule is an object which tries to match the beginning of an input character buffer against a particular syntax. It returns a result containing a value if the match was successful, or an error_code if the match failed. Rules are not invoked directly. Instead, they are passed as values to a parse function, along with the input character buffer to process. The first overload requires that the entire input string match; otherwise an error occurs. The second overload advances the input buffer pointer to the first unconsumed character upon success, allowing a stream of data to be parsed sequentially:

template< class Rule >
auto parse( string_view s, Rule const& r) -> result< typename Rule::value_type >;

template< class Rule >
auto parse( char const *& it, char const* end, Rule const& r) -> result< typename Rule::value_type >;
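
The difference between the two overloads can be illustrated with a small standalone sketch. It does not use the library: sketch_result, sketch_parse and digit_rule_t are hypothetical names introduced only for this illustration, and std::optional stands in for result (so the error_code is lost). Building the whole-string overload on top of the pointer overload is one plausible arrangement, not necessarily how the library implements it:

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the library's result type: an engaged optional
// means success, an empty one means failure (the real result carries an error_code)
template< class T >
using sketch_result = std::optional< T >;

// Hypothetical rule which matches a single decimal digit
struct digit_rule_t
{
    using value_type = char;

    sketch_result< value_type >
    parse( char const*& it, char const* end ) const
    {
        if( it != end && *it >= '0' && *it <= '9' )
            return *it++;
        return std::nullopt;
    }
};

// Pointer overload: on success `it` is advanced past the match
template< class Rule >
sketch_result< typename Rule::value_type >
sketch_parse( char const*& it, char const* end, Rule const& r )
{
    return r.parse( it, end );
}

// Whole-string overload: the match must consume all of `s`
template< class Rule >
sketch_result< typename Rule::value_type >
sketch_parse( std::string_view s, Rule const& r )
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto rv = r.parse( it, end );
    if( rv && it != end )
        return std::nullopt; // matched a prefix, but input is left over
    return rv;
}
```

With these definitions the whole-string overload accepts "7" but rejects "7x", while the pointer overload can be called twice on "42" to obtain '4' and then '2'.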

To satisfy the Rule concept, a class or struct must declare the nested type value_type indicating the type of value returned upon success, and a const member function parse with a prescribed signature. In the following code we define a rule that matches a single comma:

struct comma_rule_t
{
    // The type of value returned upon success
    using value_type = string_view;

    // The algorithm which checks for a match
    result< value_type >
    parse( char const*& it, char const* end ) const
    {
        if( it != end && *it == ',')
            return string_view( it++, 1 );

        return error::mismatch;
    }
};

Since rules are passed by value, we declare a constexpr variable of the type for syntactical convenience. Variable names for rules are usually suffixed with _rule:

constexpr comma_rule_t comma_rule{};

Now we can call parse with the input string and the rule variable:

result< string_view > rv = parse( ",", comma_rule );

assert( rv.has_value() && rv.value() == "," );

Rule expressions can come in several styles. The rule defined above is a compile-time constant. The unsigned_rule matches an unsigned decimal integer. Here we construct the rule at run time and specify the type of unsigned integer used to hold the result with a template parameter:

result< unsigned short > rv = parse( "16384", unsigned_rule< unsigned short >{} );

The function delim_rule returns a rule which matches the passed character literal. This is a more general version of the comma rule which we defined earlier. There is also an overload which matches exactly one character from a character set.

result< string_view > rv = parse( ",", delim_rule(',') );

Error Handling

When a rule fails to match, or if the rule detects an unrecoverable problem with the input, it returns a result assigned from an error_code indicating the failure. When using overloads of parse which have a character pointer as both an in and out parameter, it is up to the rule to define which character will be pointed to upon error. When the rule matches successfully, the pointer will always be changed to point to the first unconsumed character in the input, or to the end pointer if all input was consumed.
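
The success guarantee is what makes sequential parsing work: each call resumes where the previous one stopped. The following standalone sketch shows this with a comma-separated list of numbers. It does not use the library; sketch_result, digits_rule_t and parse_list are hypothetical names, and std::optional stands in for result:

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the library's result type
template< class T >
using sketch_result = std::optional< T >;

// Hypothetical rule: one or more decimal digits, returned as an unsigned
// value (overflow is not checked in this sketch)
struct digits_rule_t
{
    using value_type = unsigned;

    sketch_result< value_type >
    parse( char const*& it, char const* end ) const
    {
        if( it == end || *it < '0' || *it > '9' )
            return std::nullopt; // no digit: this rule leaves `it` unchanged
        unsigned v = 0;
        while( it != end && *it >= '0' && *it <= '9' )
            v = v * 10 + static_cast< unsigned >( *it++ - '0' );
        return v; // `it` now points at the first unconsumed character
    }
};

// Parse "n,n,n" by applying the rule repeatedly to one advancing pointer.
// Returns an empty optional if the input is malformed
inline sketch_result< std::vector< unsigned > >
parse_list( std::string_view s )
{
    char const* it = s.data();
    char const* const end = it + s.size();
    std::vector< unsigned > out;
    for(;;)
    {
        auto rv = digits_rule_t{}.parse( it, end );
        if( ! rv )
            return std::nullopt;
        out.push_back( *rv );
        if( it == end )
            return out;          // all input consumed
        if( *it != ',' )
            return std::nullopt; // unexpected separator
        ++it;                    // skip the comma
    }
}
```

Here parse_list( "12,34,5" ) yields the three values 12, 34 and 5, while "12,,5" and "12," are rejected because a number is missing after a comma.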

It is the responsibility of library and user-defined implementations of compound rules (explained later) to rewind their internal pointer if a parsing operation was unsuccessful and they wish to attempt parsing the same input using a different rule. Users who extend the library's grammar by defining their own custom rules should follow the behaviors described above regarding the handling of errors and the modification of the caller's input pointer.
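
The rewinding requirement can be sketched with a compound rule that tries two alternatives in order. This is a standalone illustration, not the library's implementation: sketch_result, literal_rule_t and either_rule_t are hypothetical names, and std::optional stands in for result. The inner rule deliberately leaves the pointer at the point of mismatch on failure, so the compound rule must save and restore the starting position itself:

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the library's result type
template< class T >
using sketch_result = std::optional< T >;

// Hypothetical rule: matches the literal text held in `lit`.
// On failure `it` is left where the mismatch occurred
struct literal_rule_t
{
    using value_type = std::string_view;
    std::string_view lit;

    sketch_result< value_type >
    parse( char const*& it, char const* end ) const
    {
        char const* const start = it;
        for( char c : lit )
        {
            if( it == end || *it != c )
                return std::nullopt;
            ++it;
        }
        return std::string_view( start, lit.size() );
    }
};

// Hypothetical compound rule: try r0, rewind on failure, then try r1
template< class R0, class R1 >
struct either_rule_t
{
    using value_type = typename R0::value_type;
    R0 r0;
    R1 r1;

    sketch_result< value_type >
    parse( char const*& it, char const* end ) const
    {
        char const* const it0 = it; // remember the starting position
        if( auto rv = r0.parse( it, end ) )
            return rv;
        it = it0;                   // rewind before trying the alternative
        return r1.parse( it, end );
    }
};
```

Given the alternatives "https" then "http" and the input "http:", the first rule advances four characters before failing on ':'. Without the rewind, the second rule would start at ':' and fail as well; with it, the second rule matches "http" and leaves the pointer on ':'.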

