Uniform Resource Locators (URL - rfc1738), also informally called "web addresses", describe the name and location of a resource. A URL scheme, such as http, identifies the method used to access the resource. A URL host, such as www.boost.org, identifies where the resource is located. The interpretation of a URL might depend on scheme-specific requirements.
Table 1.1. Example: URLs
URL |
Scheme |
Host |
Resource |
---|---|---|---|
|
|
|
|
|
|
|
|
URLs are often compared to Uniform Resource Names (URN - rfc2141), a scheme whose primary purpose is labeling resources with location-independent identifiers. Like other schemes, URNs have their own syntax. The scheme urn: is reserved for URNs, which do not specify how to locate a resource:
Table 1.2. Example: URN
URN |
Resource |
Namespace |
Identifier |
---|---|---|---|
|
|
|
|
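To make the namespace/identifier split concrete, here is a minimal sketch in standard C++ that does not use this library. The names urn_parts and split_urn are hypothetical, introduced only for illustration, and the sample string urn:isbn:0451450523 is an assumed example rather than one taken from the table above. The sketch only cuts the string at the first two colons; it does not validate either piece against the URN grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type: the pieces of a URN shaped like
// urn:<namespace>:<identifier>.
struct urn_parts
{
    bool ok;                  // false if the input does not have the expected shape
    std::string name_space;   // e.g. "isbn"
    std::string identifier;   // e.g. "0451450523"
};

// Splits `s` at the first two colons. Illustration only: the characters
// of the namespace and identifier are not validated.
urn_parts split_urn(const std::string& s)
{
    urn_parts out{ false, "", "" };

    // The scheme must be "urn" (compared case-insensitively) followed by ':'.
    static const char prefix[] = "urn:";
    if (s.size() < 4)
        return out;
    for (std::size_t i = 0; i < 4; ++i)
        if (std::tolower(static_cast<unsigned char>(s[i])) != prefix[i])
            return out;

    // The namespace runs up to the next ':'; the identifier is the rest.
    const std::size_t colon = s.find(':', 4);
    if (colon == std::string::npos || colon == 4 || colon + 1 == s.size())
        return out;

    out.ok = true;
    out.name_space = s.substr(4, colon - 4);
    out.identifier = s.substr(colon + 1);
    return out;
}
```

Under these assumptions, split_urn("urn:isbn:0451450523") reports the namespace isbn and the identifier 0451450523, neither of which says where the resource can be retrieved.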
Uniform Resource Identifiers (URI - rfc3986) define a general scheme-independent syntax for references to abstract or physical resources. The initial URI specification (rfc2396) described them as either URLs or URNs (rfc2396 section 1.2). The current specifications (rfc3986) refer to this hierarchy as the Classical View (rfc3305, Section 2.1) of URI partitioning:
Table 1.3. URIs: Classical View
URI |
Category |
---|---|
|
URL |
|
URL |
|
URL |
|
URN |
The following are examples of invalid URIs:
Table 1.4. Invalid URIs
Component |
Example |
Note |
---|---|---|
Protocol-Relative Link (PRL) |
|
Missing scheme. |
|
|
Missing scheme. Missing |
The Classical View of URI partitioning, where a URI is either a URL or a URN, caused enough confusion to justify a specification about URI partitioning (rfc3305).
Common sources of confusion in the Classical View were:
Thus, the URL/URN hierarchy became less relevant, and the Contemporary View of URI partitioning (rfc3305, Section 2.2) is now that:
The urn: scheme is one of many possible URI schemes.
urn: namespaces are URN subspaces.
In this view, URLs and URIs have the same grammar, and the two terms are used interchangeably in that regard.
Table 1.5. URLs (or URIs): Contemporary View
Example |
Scheme |
Host (Locator Component) |
Path (Name Component) |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The Contemporary View has been endorsed by rfc3305 (Section 5), and has been in use in all other specifications since then, including the current URI grammar (rfc3986, Section 1.1.3).
Although URIs and URLs have the same grammar, it's often useful to standardize on one of these terms. Recent RFC documents standardize on the term URI rather than the more restrictive term URL. However, the term URL is nearly ubiquitous in other contexts because it is more specific, which makes communication clearer.
This library also adheres to this Contemporary View of URI partitioning and standardizes on the term "URL".
Following the syntax in rfc3986, a single algorithm is used for URLs, URIs and IRIs. When discussing particular grammars, their rules are presented exactly as they appear in the literature.
A URL string can be parsed using one of the parsing functions.
Table 1.6. Parsing Functions
Function |
Grammar |
Example |
Notes |
---|---|---|---|
|
Supports fragment |
||
|
Does not support fragment |
||
|
Does not require scheme |
||
|
Any |
The library uses the convention that each function parse_<component> operates according to the particular grammar rule <component> specified in rfc3986.
The document inherits from rfc2396, where there are no URL, absolute-URL, or URL-reference rules. Thus, for consistency, the main parsing functions also make reference to uris rather than urls.
The collective grammars parsed by these algorithms are specified below.
    absolute-URI  = scheme ":" hier-part [ "?" query ]

    relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

    URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

    URI-reference = URI / relative-ref

    hier-part     = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

    relative-part = "//" authority path-abempty
                  / path-absolute
                  / path-noscheme
                  / path-empty
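The difference between these top-level rules comes down to two questions: does the string start with a scheme, and does it carry a fragment? The following sketch in standard C++ (not this library's parser) answers only those two questions. The names top_rule, starts_with_scheme and outer_shape are hypothetical, and the check looks at the outer shape only, so a string it classifies may still fail the full grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical labels for the three top-level rules listed above.
// Every one of them is also a URI-reference.
enum class top_rule { uri, absolute_uri, relative_ref };

// True if `s` begins with  ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
// which is the shape of the `scheme ":"` prefix in rfc3986.
// Assumes the default "C" locale, where isalpha/isalnum are ASCII-only.
bool starts_with_scheme(const std::string& s)
{
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return false;
    for (std::size_t i = 1; i < s.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == ':')
            return true;
        if (!(std::isalnum(c) || c == '+' || c == '-' || c == '.'))
            return false;
    }
    return false; // reached the end without finding ':'
}

// Picks the most specific rule whose outer shape fits `s`:
//   no scheme                -> relative-ref
//   scheme and a fragment    -> URI
//   scheme and no fragment   -> absolute-URI (which also fits URI)
top_rule outer_shape(const std::string& s)
{
    if (!starts_with_scheme(s))
        return top_rule::relative_ref;
    return s.find('#') != std::string::npos ? top_rule::uri
                                            : top_rule::absolute_uri;
}
```

A string such as //www.boost.org/index.html has no scheme, so under this sketch it fits only relative-ref (and therefore URI-reference), while a string with a scheme and a fragment fits URI but not absolute-URI.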
The following is an example URI and its main parts:
       foo://example.com:8042/over/there?name=ferret#nose
       \_/   \______________/\_________/ \_________/ \__/
        |           |            |            |        |
    scheme     authority       path        query   fragment
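The same five parts can be pulled out of a string with the regular expression given in rfc3986, Appendix B. The sketch below uses std::regex from the standard library rather than this library's parsing functions; the names uri_parts and split_components are hypothetical. The expression only separates components. It accepts almost any input and performs no validation, which is one reason a grammar-based parser is preferable in practice.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the five main parts of a URI reference.
struct uri_parts
{
    std::string scheme, authority, path, query, fragment;
};

// Splits `s` with the regular expression from rfc3986, Appendix B.
// Capture groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
uri_parts split_components(const std::string& s)
{
    static const std::regex re(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");

    uri_parts out;
    std::smatch m;
    if (std::regex_match(s, m, re))
    {
        out.scheme    = m[2].str();
        out.authority = m[4].str();
        out.path      = m[5].str();
        out.query     = m[7].str();
        out.fragment  = m[9].str();
    }
    return out;
}
```

Applied to foo://example.com:8042/over/there?name=ferret#nose, the sketch yields the scheme foo, the authority example.com:8042, the path /over/there, the query name=ferret and the fragment nose. Note that this representation cannot tell an absent component from an empty one, since both come back as empty strings.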
For the complete specification, please refer to rfc3986.
Note | |
---|---|
This documentation refers to the Augmented Backus-Naur Form (ABNF) notation of rfc2234 to specify particular grammars used by algorithms and containers. While a complete understanding of the notation is not a requirement for using the library, it may help for understanding how valid components of URLs are defined. In particular, this will be of interest to users who wish to compose parsing algorithms using the combinators provided by the library. |
All parsing functions accept a string_view and return a result<url_view>.
The following example parses a string literal containing a URI:
    urls::result< urls::url_view > r = urls::parse_uri( "https://www.example.com/path/to/file.txt" );

    if( r.has_value() )                         // parsing was successful
    {
        urls::url_view u = r.value();           // extract the urls::url_view

        std::cout << u;                         // format the URL to cout
    }
    else
    {
        std::cout << r.error().message();       // parsing failure; print error
    }
While the parsing function refers to the URI grammar rule, the result refers to a url_view. The convention parse_<component> produces parse_uri for the URI grammar rule defined in rfc3986. However, as the library adheres to the Contemporary View of URI partitioning and standardizes on the term "URL", it makes reference to the term "URL" elsewhere.
When the input does not match the URL grammar, the error is reported through a result<url_view>. The result is a variant-like object which holds a url_view, or an error_code in the case where the parsing failed. Note that like a string view, the URL view does not own the underlying character buffer. Instead, it references the string passed to the parsing function. The caller is required to ensure that the lifetime of the string extends until the view is destroyed.
The function url_view::collect may be used to create a copy of the underlying character buffer and attach ownership of the buffer to a newly returned view, which is wrapped in a shared pointer. The following code calls collect to create a read-only copy:
    // This will hold our copy
    std::shared_ptr<urls::url_view const> sp;
    {
        std::string s = "/path/to/file.txt";

        // result::value() will throw an exception if an error occurs
        urls::url_view u = urls::parse_relative_ref( s ).value();

        // create a copy with ownership and string lifetime extension
        sp = u.collect();

        // At this point the string goes out of scope
    }

    // but `*sp` remains valid since it has its own copy
    std::cout << *sp << "\n";
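For readers curious how a shared pointer to a view can keep a private buffer alive, the following sketch shows one possible technique using only the standard library: the aliasing constructor of std::shared_ptr. It is not claimed to be how this library implements the operation. The names text_view and make_owning_copy are hypothetical stand-ins introduced for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical non-owning view over characters owned by someone else.
struct text_view
{
    const char* data;
    std::size_t size;

    std::string str() const { return std::string(data, size); }
};

// Copies the characters referenced by `v` and returns a view into that copy.
// The returned pointer shares ownership of the copy, so the view stays valid
// for as long as any shared_ptr to it exists.
std::shared_ptr<const text_view> make_owning_copy(text_view v)
{
    struct holder
    {
        std::string buffer; // owns the copied characters
        text_view view;     // refers into `buffer`
    };

    auto h = std::make_shared<holder>();
    h->buffer.assign(v.data, v.size);
    h->view = text_view{ h->buffer.data(), h->buffer.size() };

    // Aliasing constructor: shares ownership of `*h` but points at `h->view`.
    return std::shared_ptr<const text_view>(h, &h->view);
}
```

The design choice here is that the buffer and the view live in one allocation, so the caller receives a single handle and never has to manage the buffer separately.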
The interface of url_view decomposes the URL into its individual parts and allows for inspection of the various parts as well as returning metadata about the URL itself. These non-modifying observer operations are described in the sections that follow.
To create a mutable copy of the url_view, one can just create a url:
    // This will hold our mutable copy
    urls::url v;
    {
        std::string s = "/path/to/file.txt";

        // result::value() will throw an exception if an error occurs
        v = urls::parse_relative_ref(s).value();

        // At this point the string goes out of scope
    }

    // but `v` remains valid since it has its own copy
    std::cout << v << "\n";

    // and it's mutable
    v.set_encoded_fragment("anchor");
    std::cout << v << "\n";
In many places, functions in the library have a return type which uses the result alias template. This class allows the parsing algorithms to report errors without resorting to exceptions.
The functions result::has_value and result::has_error can be used to check if the result contains an error.
    urls::result< urls::url_view > r = urls::parse_uri( "https://www.example.com/path/to/file.txt" );

    if( r.has_value() )                         // parsing was successful
    {
        urls::url_view u = r.value();           // extract the urls::url_view

        std::cout << u;                         // format the URL to cout
    }
    else
    {
        std::cout << r.error().message();       // parsing failure; print error
    }
This ensures result::value will not throw an exception. In contexts where it is acceptable to throw exceptions, result::value can be used directly.
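The two usage styles, checking first or letting value throw, can be illustrated with a small stand-in written in standard C++. This is a deliberately simplified sketch and not the result type from Boost.System: simple_result and parse_port are hypothetical names, and the stand-in stores both a value and an error_code instead of a true variant, which requires the value type to be default-constructible.

```cpp
#include <string>
#include <system_error>
#include <utility>

// Hypothetical, simplified stand-in for a result type.
// Holds a value when `ec_` is clear, otherwise holds an error.
template <class T>
class simple_result
{
    T value_{};
    std::error_code ec_{};

public:
    simple_result(T v) : value_(std::move(v)) {}
    simple_result(std::error_code ec) : ec_(ec) {}

    bool has_value() const noexcept { return !ec_; }
    bool has_error() const noexcept { return static_cast<bool>(ec_); }

    // Throws std::system_error if an error is stored.
    const T& value() const
    {
        if (ec_)
            throw std::system_error(ec_);
        return value_;
    }

    std::error_code error() const noexcept { return ec_; }
};

// Hypothetical parser used only to exercise the type:
// accepts 1 to 5 decimal digits whose value is at most 65535.
simple_result<unsigned> parse_port(const std::string& s)
{
    if (s.empty() || s.size() > 5)
        return std::make_error_code(std::errc::invalid_argument);

    unsigned n = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
            return std::make_error_code(std::errc::invalid_argument);
        n = n * 10 + static_cast<unsigned>(c - '0');
    }
    if (n > 65535)
        return std::make_error_code(std::errc::result_out_of_range);
    return n;
}
```

With this sketch, a caller that cannot tolerate exceptions tests has_value before calling value, while a caller that can tolerate them calls parse_port(s).value() directly and handles std::system_error higher up.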
Check the reference for result for a synopsis of the type.
For complete information please consult the full result documentation in Boost.System.