The path contains data, usually organized hierarchically, which is combined with the query to identify a resource within the scope of the scheme and authority.
Most schemes interpret the path as a sequence of slash-delimited segments. These segments can map to file system paths, which is useful for file servers, but they need not imply this relationship.
In addition to interacting with the path as a single string, the library provides container adaptors modeling ranges of individual path segments.
The URL below contains the path /path/to/file.txt with the three segments path, to, and file.txt:
http://www.example.com/path/to/file.txt
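The split into segments can be sketched without the library. The helper below, split_segments, is a hypothetical name and not part of the library's API. It assumes the counting convention used in this section: a single leading slash marks an absolute path and is not itself a segment, so both an empty path and / yield zero segments.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of the library: split a raw (still
// percent-encoded) path into its segments.
std::vector<std::string> split_segments(std::string_view path)
{
    std::vector<std::string> out;
    // A single leading slash marks the path as absolute; it is not a segment.
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1);
    // "" and "/" both have zero segments.
    if (path.empty())
        return out;
    std::size_t start = 0;
    for (;;)
    {
        std::size_t const pos = path.find('/', start);
        if (pos == std::string_view::npos)
        {
            // Last segment; it is empty when the path ends with a slash.
            out.emplace_back(path.substr(start));
            break;
        }
        out.emplace_back(path.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}
```

Under these assumptions, /path/to/file.txt splits into path, to, and file.txt, and a trailing slash as in /doc/libs/ produces a final empty segment.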
Depending on the type of URL, different syntactic rules govern how the path may be written. The BNF for these forms is:
Table 1.16. Path BNF

    path          = path-abempty    ; begins with "/" or is empty
                  / path-absolute   ; begins with "/" but not "//"
                  / path-noscheme   ; begins with a non-colon segment
                  / path-rootless   ; begins with a segment
                  / path-empty      ; zero characters

    path-abempty  = *( "/" segment )
    path-absolute = "/" [ segment-nz *( "/" segment ) ]
    path-noscheme = segment-nz-nc *( "/" segment )
    path-rootless = segment-nz *( "/" segment )
    path-empty    = 0<pchar>

The functions for interacting with the path in a url_view are as follows:
Table 1.17. Path Observers
Function | Description |
---|---|
encoded_path | Return the path as a percent-encoded string. |
encoded_segments | Return the path segments as a read-only container of percent-encoded strings. |
segments | Return the path segments as a read-only container of strings with percent-decoding applied. |

A URL path is usually interpreted as segments. The library provides two read-only containers for interacting with the segments in a URL's path:
Table 1.18. Segment View Types
Type | Description |
---|---|
segments_encoded_view | A read-only forward range of path segments returned as percent-encoded strings. |
segments_view | A read-only forward range of path segments returned as strings with percent-decoding applied. |

These views can be directly created by the parsing functions below. This provides the guarantee that all constructed views contain valid path segments:
The function encoded_path can be used to obtain the path from a url_view:
Code:

    urls::string_view s = "https://www.boost.org/doc/libs/";
    urls::url_view u = urls::parse_uri(s).value();
    std::cout << u << "\n"
        << "path: " << u.encoded_path() << "\n"
        << "encoded segments: " << u.encoded_segments() << "\n"
        << "segments: " << u.segments() << "\n";

Output:

    https://www.boost.org/doc/libs/
    path: /doc/libs/
    encoded segments: /doc/libs/
    segments: /doc/libs/

These functions do not throw. There is no has_path function because every URL has a valid path, even when that path is empty.
Code:

    urls::string_view s = "https://www.boost.org";
    urls::url_view u = urls::parse_uri(s).value();
    std::cout << u << "\n"
        << "path: " << u.encoded_path() << "\n"
        << "encoded segments: " << u.encoded_segments() << "\n"
        << "segments: " << u.segments() << "\n";

Output:

    https://www.boost.org
    path:
    encoded segments:
    segments:

Note that encoded_path has no decoded counterpart. The reason is that a percent-encoded / (%2F), once decoded, would be indistinguishable from a segment delimiter, which makes the segment boundaries ambiguous.
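To see why a fully decoded path would be ambiguous, consider a segment that contains an encoded slash. The following sketch uses three hypothetical helpers, hex_value, pct_decode, and count_slashes, which are written for illustration only and are not library functions. The example path /a%2Fb/c is likewise an assumption.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: value of one hexadecimal digit, or -1 if it is not one.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode every well-formed %XX escape in the string.
// Malformed escapes are copied through unchanged.
std::string pct_decode(std::string_view s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] == '%' && i + 2 < s.size())
        {
            int const hi = hex_value(s[i + 1]);
            int const lo = hex_value(s[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(s[i]);
    }
    return out;
}

// Hypothetical helper: count the '/' characters in a string.
std::size_t count_slashes(std::string_view s)
{
    std::size_t n = 0;
    for (char c : s)
        if (c == '/')
            ++n;
    return n;
}
```

Here /a%2Fb/c has two segments, a%2Fb and c, but decoding the whole string gives /a/b/c, which reads as three. Decoding each segment separately, as the decoded segment container does, avoids the problem.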
These containers are lightweight references to the underlying path string. Ownership of the string is not transferred; the caller is responsible for ensuring that the lifetime of the string extends until the container is destroyed.
Code:

    urls::string_view s = "https://www.boost.org/doc/libs";
    urls::url_view u = urls::parse_uri(s).value();
    std::cout << u.encoded_segments().size() << " segments\n";
    for (auto seg: u.encoded_segments())
    {
        std::cout << "segment: " << seg << "\n";
    }

Output:

    2 segments
    segment: doc
    segment: libs

In contexts where a path can appear by itself, such as HTTP requests, segment views cannot be constructed directly from strings. Instead, use the analogous function parse_path to obtain a segments_encoded_view or segments_view.
Code:

    urls::string_view s = "/doc/libs";
    urls::segments_encoded_view p = urls::parse_path(s).value();
    std::cout << "path: " << p << "\n";
    std::cout << p.size() << " segments\n";
    for (auto seg: p)
    {
        std::cout << "segment: " << seg << "\n";
    }

Output:

    path: /doc/libs
    2 segments
    segment: doc
    segment: libs

A path can be absolute or relative. An absolute path begins with /:

URL | Path Type |
---|---|
urls::url_view u = urls::parse_uri("https://www.boost.org").value(); | Relative path |
urls::url_view u = urls::parse_uri("https://www.boost.org/").value(); | Absolute path |

The complete path segments "." and ".." are intended only for use within relative references (rfc3986 sec. 4.1) and are removed as part of the reference resolution process (rfc3986 sec. 5.2). Normalizing a URI resolves these dot-segments (rfc3986 sec. 5.2.4).
URL | Normalized URL | Path |
---|---|---|
urls::url u = urls::parse_uri("https://www.boost.org/./a/../b").value(); u.normalize(); | | Absolute path |

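Dot-segment removal can be illustrated with a direct transcription of the algorithm in RFC 3986 section 5.2.4. The function remove_dot_segments below is a hypothetical standalone helper that operates on the path string alone. It is not the library's normalize(), which may apply further normalization steps.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper following RFC 3986 section 5.2.4; not a library function.
std::string remove_dot_segments(std::string_view in)
{
    auto starts_with = [](std::string_view s, std::string_view prefix) {
        return s.substr(0, prefix.size()) == prefix;
    };
    std::string out;
    while (!in.empty())
    {
        if (starts_with(in, "../"))
            in.remove_prefix(3);            // rule A
        else if (starts_with(in, "./"))
            in.remove_prefix(2);            // rule A
        else if (starts_with(in, "/./"))
            in.remove_prefix(2);            // rule B: "/./x" becomes "/x"
        else if (in == "/.")
            in = "/";                       // rule B
        else if (starts_with(in, "/../") || in == "/..")
        {
            // Rule C: replace the prefix with "/" and drop the last
            // segment already written to the output.
            in = (in == "/..") ? std::string_view("/") : in.substr(3);
            std::size_t const pos = out.rfind('/');
            out.erase(pos == std::string::npos ? 0 : pos);
        }
        else if (in == "." || in == "..")
            in = std::string_view();        // rule D
        else
        {
            // Rule E: move the first segment, with its leading slash if
            // any, from the input to the output.
            std::size_t const pos = in.find('/', 1);
            out.append(in.substr(0, pos));
            in.remove_prefix(pos == std::string_view::npos ? in.size() : pos);
        }
    }
    return out;
}
```

Applied to the path of the URL in the table above, /./a/../b, this sketch yields /b.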
These rules imply that a path with the prefix ":" or "/" could conflict with the scheme and authority components of the URL, since those components end with these characters. For instance, a path with the prefix //, i.e. a path whose first segment is empty, could be interpreted as an empty authority:

URL | Authority | Path |
---|---|---|
urls::url_view u = urls::parse_uri("https:path/to/file.txt").value(); // scheme and a relative path | (no authority) | Relative path |
urls::url_view u = urls::parse_uri("https:/path/to/file.txt").value(); // scheme and an absolute path | (no authority) | Absolute path |
urls::url_view u = urls::parse_uri("https://path/to/file.txt").value(); // "//path" will be considered the authority component | | Absolute path |

Likewise, attempting to create a relative path whose first segment contains a ":" could be interpreted as another scheme and a path:

URL | Scheme | Path |
---|---|---|
urls::url_view u = urls::parse_uri_reference("path-to/file.txt").value(); // only a relative path | (no scheme) | Relative path |
urls::url_view u = urls::parse_uri_reference("path:to/file.txt").value(); // "path:" will be considered the scheme component instead of a substring of the first segment | | Relative path |

Modifying functions adjust such paths by adding or removing a semantically neutral prefix, so that the path keeps its meaning without conflicting with the scheme or authority components:

Code | URL | Path |
---|---|---|
urls::url u = urls::parse_uri("https:path/to/file.txt").value(); u.set_encoded_path("//path/to/file.txt"); // "path" should not become the authority component | | Absolute path |
urls::url u = urls::parse_uri_reference("path-to/file.txt").value(); u.set_encoded_path("path:to/file.txt"); // "path:to" should not make the scheme become "path:" | | Relative path |

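One way such an adjustment could work is sketched below. The function guard_path and its parameters are hypothetical, and the chosen prefixes are assumptions: "./" is the dot-segment suggested by RFC 3986 section 4.2 for a first segment containing a colon, and "/." is a comparable neutral prefix for a path that would otherwise begin with two slashes. The exact text the library produces may differ; for instance, it could percent-encode the colon instead.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not a library function: return a path that can be
// placed after the given components without being misread as part of them.
std::string guard_path(std::string_view path, bool has_scheme, bool has_authority)
{
    std::string out;
    if (!has_authority && path.substr(0, 2) == "//")
    {
        // Without an authority, a leading "//" would begin one.
        // Assumed fix: prepend "/." so the path starts with "/.//".
        out = "/.";
    }
    else if (!has_scheme && !path.empty() && path.front() != '/')
    {
        // In a relative reference, a colon in the first segment would be
        // read as the end of a scheme. Assumed fix: prepend "./".
        std::string_view const first = path.substr(0, path.find('/'));
        if (first.find(':') != std::string_view::npos)
            out = "./";
    }
    out.append(path);
    return out;
}
```

Both prefixes are dot-segments, so removing dot-segments afterwards would restore the original path text. A path that needs no protection is returned unchanged.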
Whether a path is relative or absolute, all algorithms preserve the path semantics in lossless round-trip conversions between the URL path and its segment container representations. Modifying functions also adjust the path when a delimiter between new and existing segments would otherwise be missing:

Code | URL | Path |
---|---|---|
urls::url u = urls::parse_uri_reference("to/file.txt").value(); urls::segments segs = u.segments(); segs.insert(segs.begin(), "path"); // should not insert as "pathto/file.txt" | | Relative path |

The path follows the URL authority and includes the initial slash /:

Component | Value |
---|---|
URL | |
Path | |

In this example, the path has three segments:

Component | Value |
---|---|
URL | |
Segment 1 | |
Segment 2 | |
Segment 3 | (empty segment) |

Note that the final slash in /doc/libs/ implies an extra empty segment that would not exist in the path /doc/libs:

Component | Value |
---|---|
URL | |
Segment 1 | |
Segment 2 | |

A URL always contains a path, even if it is empty:

Component | Value |
---|---|
URL | |
Path | |

Empty segments are also possible, resulting in consecutive slashes.

Component | Value |
---|---|
URL | |
Path | |
Segment 1 | (empty) |
Segment 2 | |
Segment 3 | (empty) |
Segment 4 | (empty) |
Segment 5 | |

If the authority is present, the path needs to be empty or start with a slash /.

Component | Value |
---|---|
URL | |
Host | |
Path | |
Segments | 0 |

Component | Value |
---|---|
URL | |
Host | |
Path | / |
Segments | 0 |

Component | Value |
---|---|
URL | |
Host | |
Path | // |
Segments | 2 |

A path might begin with two slashes to indicate its first segment is empty.

Component | Value |
---|---|
URL | |
Authority | |
Path | |
Segment 1 | (empty) |
Segment 2 | |
Segment 3 | |
Segment 4 | (empty) |

However, beginning the path with two slashes is not possible when the authority is absent, because the text following the two slashes would be interpreted as the authority.

Component | Value |
---|---|
URL | |
Authority | |
Path | |
Segment 1 | |
Segment 2 | (empty) |

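The two constraints above correspond to RFC 3986 section 3.3 and can be checked with a few comparisons. The function name path_fits and its boolean parameter are assumptions made for this sketch; the library performs its own validation when parsing.

```cpp
#include <string_view>

// Hypothetical check, not a library function: can this path follow the
// given authority state without being misparsed?
bool path_fits(std::string_view path, bool has_authority)
{
    // With an authority, the path must be empty or begin with "/".
    if (has_authority)
        return path.empty() || path.front() == '/';
    // Without an authority, the path must not begin with "//".
    return path.substr(0, 2) != "//";
}
```

With an authority present, the paths "", / and // all pass, matching the three tables above, while a rootless path such as doc/libs does not.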
For this reason, paths beginning with two slashes are typically avoided altogether.
Of the reserved character set for URLs, ":" and "@" may appear unencoded within paths.

Component | Value |
---|---|
URL | |
Authority | |
Path | |
Segment 1 | |
Segment 2 | |

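The set of characters that may appear literally in a path segment follows from the pchar rule of RFC 3986: unreserved characters, sub-delimiters, ":" and "@". The predicate below, is_segment_char, is a hypothetical helper for illustration and not a library function. It deliberately excludes "%", which is only valid as the start of a percent-encoded triplet.

```cpp
#include <string_view>

// Hypothetical helper: true if c may appear unencoded in a path segment
// according to the pchar rule of RFC 3986 (excluding pct-encoded triplets).
bool is_segment_char(char c)
{
    // ALPHA and DIGIT, written as explicit ranges to stay locale-independent.
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        return true;
    // Remaining unreserved characters, the sub-delims, then ":" and "@".
    constexpr std::string_view others = "-._~!$&'()*+,;=:@";
    return others.find(c) != std::string_view::npos;
}
```

Under this rule ":" and "@" pass, while "/", "?" and "#" do not, since they delimit segments, the query, and the fragment respectively.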