Path

Notation

The path contains data, usually organized hierarchically, which is combined with the query to identify a resource within the scope of the scheme and authority.

Most schemes interpret the path as a sequence of slash-delimited segments. These segments can map to file system paths, which is useful for file servers, but they need not imply this relationship.

In addition to interacting with the path as a single string, the library provides container adaptors modeling ranges of individual path segments.

The URL below contains a path /path/to/file.txt with the three segments path, to, and file.txt:

http://www.example.com/path/to/file.txt

Depending on the type of URL, different syntactic rules govern how the path may be formulated. The BNF for these formulations is defined as follows:

Table 1.16. Path BNF

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>
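
As a rough illustration of how these productions differ, the sketch below tests a path string against each rule by looking only at its prefix. The function names are hypothetical and are not part of the library; the code assumes C++17 and that every character in the string is already a valid pchar or "/".

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helpers, not library API. They check only the structural
// prefix rules and do not validate individual characters.

// path-abempty: empty, or begins with "/"
bool is_path_abempty(std::string_view s)
{
    return s.empty() || s.front() == '/';
}

// path-absolute: begins with "/" but not "//"
bool is_path_absolute(std::string_view s)
{
    return !s.empty() && s.front() == '/' &&
        (s.size() == 1 || s[1] != '/');
}

// path-rootless: begins with a non-empty segment
bool is_path_rootless(std::string_view s)
{
    return !s.empty() && s.front() != '/';
}

// path-noscheme: rootless, and the first segment contains no ":"
bool is_path_noscheme(std::string_view s)
{
    if (!is_path_rootless(s))
        return false;
    std::string_view first = s.substr(0, s.find('/'));
    return first.find(':') == std::string_view::npos;
}

// path-empty: zero characters
bool is_path_empty(std::string_view s)
{
    return s.empty();
}
```

A single string can satisfy more than one rule: "/a" is both path-abempty and path-absolute, while "//a" is path-abempty only. Which rule applies depends on whether a scheme or authority precedes the path.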

Member Functions

The functions for interacting with the path in a url_view are as follows:

Table 1.17. Path Observers

Function	Description
`encoded_path`	Return the path as a percent-encoded string
`encoded_segments`	Return the path segments as a read-only container of percent-encoded strings.
`segments`	Return the path segments as a read-only container of strings with percent-decoding applied.

A URL path is usually interpreted as segments. The library provides two read-only containers for interacting with the segments in a URL's path:

Table 1.18. Segment View Types

Type	Description
`segments_encoded_view`	A read-only forward range of path segments returned as percent-encoded strings.
`segments_view`	A read-only forward range of path segments returned as strings with percent-decoding applied.

These views can be directly created by the parsing functions below. This provides the guarantee that all constructed views contain valid path segments:

Table 1.19. Path Parsing Functions

Function	Grammar
`parse_path`	any path
`parse_path_abempty`	path-abempty
`parse_path_absolute`	path-absolute
`parse_path_noscheme`	path-noscheme
`parse_path_rootless`	path-rootless

Observers

The function encoded_path can be used to obtain the path from a url_view:

Code

urls::string_view s = "https://www.boost.org/doc/libs/";
urls::url_view    u = urls::parse_uri(s).value();
std::cout << u << "\n"
    << "path:             " << u.encoded_path()     << "\n"
    << "encoded segments: " << u.encoded_segments() << "\n"
    << "segments:         " << u.segments()         << "\n";

Output

https://www.boost.org/doc/libs/
path:             /doc/libs/
encoded segments: /doc/libs/
segments:         /doc/libs/

These functions do not throw. There is no function analogous to has_path because all URLs have valid paths, even when the path is empty.

Code

urls::string_view s = "https://www.boost.org";
urls::url_view    u = urls::parse_uri(s).value();
std::cout << u << "\n"
    << "path:             " << u.encoded_path()     << "\n"
    << "encoded segments: " << u.encoded_segments() << "\n"
    << "segments:         " << u.segments()         << "\n";

Output

https://www.boost.org
path:
encoded segments:
segments:

Notice that there is also no decoded counterpart for encoded_path. The reason is that a percent-encoded slash (%2F) inside a segment would decode to "/", so the decoded path could no longer be split unambiguously into its original segments.
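
The ambiguity can be seen with a small standalone sketch. The helpers `hex_value` and `pct_decode` below are hypothetical and are not the library's decoding routines; they simply replace each well-formed %XX escape with the byte it names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not library API: value of a hex digit, or -1.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode every well-formed %XX escape; anything else is copied unchanged.
std::string pct_decode(std::string_view s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] == '%' && i + 2 < s.size())
        {
            int hi = hex_value(s[i + 1]);
            int lo = hex_value(s[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits
                continue;
            }
        }
        out.push_back(s[i]);
    }
    return out;
}
```

With this sketch, the encoded paths "/a%2Fb" (one segment) and "/a/b" (two segments) both decode to the string "/a/b", which is why decoding is applied per segment rather than to the path as a whole.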

Segments View

These containers are lightweight references to the underlying path string. Ownership of the string is not transferred; the caller is responsible for ensuring that the lifetime of the string extends until the container is destroyed.

Code

urls::string_view s = "https://www.boost.org/doc/libs";
urls::url_view    u = urls::parse_uri(s).value();
std::cout << u.encoded_segments().size() << " segments\n";
for (auto seg: u.encoded_segments())
{
    std::cout << "segment: " << seg << "\n";
}

Output

2 segments
segment: doc
segment: libs

In contexts where a path can appear by itself, such as HTTP requests, segment views cannot be constructed directly from strings. Instead, we can use the analogous function parse_path to obtain a segments_encoded_view or segments_view.

Code

urls::string_view s = "/doc/libs";
urls::segments_encoded_view p = urls::parse_path(s).value();
std::cout << "path: " << p << "\n";
std::cout << p.size() << " segments\n";
for (auto seg: p)
{
    std::cout << "segment: " << seg << "\n";
}

Output

path: /doc/libs
2 segments
segment: doc
segment: libs

Path Semantics

A path can be absolute or relative. An absolute path begins with /:

URL	Path Type
urls::url_view u = urls::parse_uri("https://www.boost.org").value();	Relative path `""` with 0 segments
urls::url_view u = urls::parse_uri("https://www.boost.org/").value();	Absolute path `"/"` with 0 segments

The complete path segments "." and ".." are intended only for use within relative references (RFC 3986, section 4.1) and are removed as part of the reference resolution process (RFC 3986, section 5.2). Normalizing a URI resolves these dot-segments (RFC 3986, section 5.2.4).

URL	Normalized URL	Path
urls::url u = urls::parse_uri("https://www.boost.org/./a/../b").value(); u.normalize();	`"https://www.boost.org/b"`	Absolute path `"/b"` with segments `{"b"}`
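
For reference, the dot-segment removal steps of RFC 3986, section 5.2.4 can be sketched as a standalone function. This is an illustration of the RFC algorithm, not the library's normalize() implementation, which may do more; `remove_dot_segments` and `starts_with` are names chosen here, and the code assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

namespace {
// Hypothetical helper: true if s begins with prefix.
bool starts_with(std::string_view s, std::string_view prefix)
{
    return s.substr(0, prefix.size()) == prefix;
}
}

// Sketch of RFC 3986 section 5.2.4; the letters refer to the RFC's steps.
std::string remove_dot_segments(std::string_view in)
{
    std::string out;
    // Drop the last segment of out together with its preceding "/".
    auto pop_segment = [&out] {
        std::size_t pos = out.rfind('/');
        out.erase(pos == std::string::npos ? 0 : pos);
    };
    while (!in.empty())
    {
        if (starts_with(in, "../"))       in.remove_prefix(3);     // A
        else if (starts_with(in, "./"))   in.remove_prefix(2);     // A
        else if (starts_with(in, "/./"))  in.remove_prefix(2);     // B
        else if (in == "/.")              in = "/";                // B
        else if (starts_with(in, "/../"))                          // C
        {
            in.remove_prefix(3);
            pop_segment();
        }
        else if (in == "/..")                                      // C
        {
            in = "/";
            pop_segment();
        }
        else if (in == "." || in == "..") in = {};                 // D
        else
        {
            // E: move the first segment, with any leading "/", to out
            std::size_t end = in.find('/', 1);
            if (end == std::string_view::npos)
                end = in.size();
            out.append(in.substr(0, end));
            in.remove_prefix(end);
        }
    }
    return out;
}
```

Applied to the path from the table above, `remove_dot_segments("/./a/../b")` yields "/b".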

These rules imply that a path beginning with "//", or a relative path whose first segment contains ":", could conflict with the authority and scheme components of the URL, since "//" introduces the authority and ":" terminates the scheme. For instance, attempting to create a path with the prefix //, i.e. a path whose first segment is empty, could be interpreted as the start of an authority:

// scheme and a relative path
urls::url_view u = urls::parse_uri("https:path/to/file.txt").value();

Authority: (no authority)
Path: Relative path "path/to/file.txt" with segments {"path", "to", "file.txt"}

// scheme and an absolute path
urls::url_view u = urls::parse_uri("https:/path/to/file.txt").value();

Authority: (no authority)
Path: Absolute path "/path/to/file.txt" with segments {"path", "to", "file.txt"}

// "//path" will be considered the authority component
urls::url_view u = urls::parse_uri("https://path/to/file.txt").value();

Authority: "path"
Path: Absolute path "/to/file.txt" with segments {"to", "file.txt"}

Likewise, attempting to create a relative path whose first segment contains a ":" could be interpreted as another scheme and a path:

// only a relative path
urls::url_view u = urls::parse_uri_reference("path-to/file.txt").value();

Scheme: (no scheme)
Path: Relative path "path-to/file.txt" with segments {"path-to", "file.txt"}

// "path:" will be considered the scheme component
// instead of a substring of the first segment
urls::url_view u = urls::parse_uri_reference("path:to/file.txt").value();

Scheme: "path"
Path: Relative path "to/file.txt" with segments {"to", "file.txt"}
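
Both conflicts follow from the order in which a reference is split into components. The sketch below is a simplified, standalone splitter, not the library's parser: `reference_parts` and `split_reference` are hypothetical names, the input is assumed to have no query or fragment, and no character validation is performed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type, not library API.
struct reference_parts
{
    bool has_scheme = false;
    bool has_authority = false;
    std::string scheme;
    std::string authority;
    std::string path;
};

// Split "scheme:" and "//authority" off the front; the rest is the path.
reference_parts split_reference(std::string_view s)
{
    constexpr std::size_t npos = std::string_view::npos;
    reference_parts r;

    // A ":" that appears before the first "/" terminates a scheme.
    std::size_t colon = s.find(':');
    std::size_t slash = s.find('/');
    if (colon != npos && colon != 0 && (slash == npos || colon < slash))
    {
        r.has_scheme = true;
        r.scheme = std::string(s.substr(0, colon));
        s.remove_prefix(colon + 1);
    }

    // A leading "//" introduces an authority, which runs to the next "/".
    if (s.substr(0, 2) == "//")
    {
        s.remove_prefix(2);
        std::size_t end = s.find('/');
        if (end == npos)
            end = s.size();
        r.has_authority = true;
        r.authority = std::string(s.substr(0, end));
        s.remove_prefix(end);
    }

    r.path = std::string(s);
    return r;
}
```

With this sketch, "https://path/to/file.txt" yields the authority "path" and the path "/to/file.txt", and "path:to/file.txt" yields the scheme "path" and the path "to/file.txt", matching the cases listed above.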

Modifying functions adjust such paths by inserting a prefix that does not change their meaning ("/." before a leading "//", or "./" before a first segment containing ":"), so that paths maintain their semantics without conflicting with the scheme or authority components:

// "path" should not become the authority component
urls::url u = urls::parse_uri("https:path/to/file.txt").value();
u.set_encoded_path("//path/to/file.txt");

URL: "https:/.//path/to/file.txt"
Path: Absolute path "/.//path/to/file.txt" with segments {"", "path", "to", "file.txt"}

// "path:to" should not make the scheme become "path:"
urls::url u = urls::parse_uri_reference("path-to/file.txt").value();
u.set_encoded_path("path:to/file.txt");

URL: "./path:to/file.txt"
Path: Relative path "./path:to/file.txt" with segments {"path:to", "file.txt"}
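
The adjustment shown in these two cases can be approximated with a short standalone function. This is a sketch inferred from the examples above, not the library's implementation; `guard_path` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper, not library API: return the path as it would need
// to be written so that it cannot be mistaken for an authority or scheme.
std::string guard_path(
    std::string_view path, bool has_scheme, bool has_authority)
{
    std::string out;
    if (!has_authority && path.substr(0, 2) == "//")
    {
        // Without an authority, a leading "//" would introduce one.
        out = "/.";
    }
    else if (!has_scheme && !has_authority &&
        !path.empty() && path.front() != '/')
    {
        // Without a scheme, a ":" in the first segment would end one.
        std::string_view first = path.substr(0, path.find('/'));
        if (first.find(':') != std::string_view::npos)
            out = "./";
    }
    out.append(path);
    return out;
}
```

For the two cases above, `guard_path("//path/to/file.txt", true, false)` gives "/.//path/to/file.txt" and `guard_path("path:to/file.txt", false, false)` gives "./path:to/file.txt". When a scheme is present, a colon in the first segment needs no prefix.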

Whether a path is relative or absolute, all algorithms preserve the path semantics in lossless round-trip conversions between the URL path and its segment container representation. Modifying functions will also insert a delimiter between a new segment and the existing path segments where one would otherwise be missing:

// should not insert as "pathto/file.txt"
urls::url u = urls::parse_uri_reference("to/file.txt").value();
urls::segments segs = u.segments();
segs.insert(segs.begin(), "path");

URL: "path/to/file.txt"
Path: Relative path "path/to/file.txt" with segments {"path", "to", "file.txt"}
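
The delimiter handling can be pictured with a standalone sketch that serializes a list of segments back into a path string. `join_segments` is a hypothetical helper, not library API; it takes the absolute or relative property as an explicit flag because that property is not stored in the segments themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not library API: write segments as a path,
// placing exactly one "/" between neighbouring segments.
std::string join_segments(
    bool absolute, const std::vector<std::string>& segs)
{
    std::string path = absolute ? "/" : "";
    for (std::size_t i = 0; i < segs.size(); ++i)
    {
        if (i != 0)
            path += '/'; // delimiter between segments
        path += segs[i];
    }
    return path;
}
```

Inserting "path" at the front of {"to", "file.txt"} and joining with `absolute == false` gives "path/to/file.txt" rather than "pathto/file.txt".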

Use Cases

The path comes after the URL authority, including the initial slash /:

Component	Value
URL	`https://www.boost.org/doc/libs/`
Path	`/doc/libs/`

In this example, the path has three segments:

Component	Value
URL	`https://www.boost.org/doc/libs/`
Segment 1	`doc`
Segment 2	`libs`
Segment 3	(empty segment)

Note that the final slash in /doc/libs/ implies an extra empty segment that would not exist in the path /doc/libs:

Component	Value
URL	`https://www.boost.org/doc/libs`
Segment 1	`doc`
Segment 2	`libs`

A URL always contains a path, even if it is empty:

Component	Value
URL	`https://www.boost.org`
Path	(empty)

Empty segments are also possible, resulting in consecutive slashes.

Component	Value
URL	`https://www.boost.org//doc///libs`
Path	`//doc///libs`
Segment 1	(empty)
Segment 2	`doc`
Segment 3	(empty)
Segment 4	(empty)
Segment 5	`libs`
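
The segment counts in this section follow a simple rule: a leading "/" marks the path as absolute and is not itself a delimiter, and every remaining "/" separates two segments. The sketch below applies that rule to a plain string. `split_segments` is a hypothetical helper, not library API, and it does not give any special treatment to the "./" and "/." prefixes discussed under Path Semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not library API: split an encoded path into
// its encoded segments.
std::vector<std::string> split_segments(std::string_view path)
{
    std::vector<std::string> segs;
    // The leading "/" only marks the path as absolute.
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1);
    if (path.empty())
        return segs; // "" and "/" both have zero segments
    for (;;)
    {
        std::size_t end = path.find('/');
        if (end == std::string_view::npos)
        {
            segs.emplace_back(path); // last segment, possibly empty
            return segs;
        }
        segs.emplace_back(path.substr(0, end));
        path.remove_prefix(end + 1);
    }
}
```

Under this rule "/doc/libs/" has three segments, the last one empty, and "//doc///libs" has five.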

If the authority is present, the path needs to be empty or start with a slash /.

Component	Value
URL	`https://www.boost.org`
Host	`www.boost.org`
Path	(empty)
Segments	0

Component	Value
URL	`https://www.boost.org/`
Host	`www.boost.org`
Path	`/`
Segments	0

Component	Value
URL	`https://www.boost.org//`
Host	`www.boost.org`
Path	`//`
Segments	2

A path might begin with two slashes to indicate its first segment is empty.

Component	Value
URL	`https://www.boost.org//doc/libs/`
Authority	`www.boost.org`
Path	`//doc/libs/`
Segment 1	(empty)
Segment 2	`doc`
Segment 3	`libs`
Segment 4	(empty)

However, beginning the path with double slashes is not possible when the authority is absent, as the text following the double slash would be interpreted as the authority.

Component	Value
URL	`https://doc/libs/`
Authority	`doc`
Path	`/libs/`
Segment 1	`libs`
Segment 2	(empty)

For this reason, paths beginning with two slashes are typically avoided altogether.

Of the general delimiters in the reserved character set for URLs, ":" and "@" may appear unencoded within path segments.

Component	Value
URL	`https://www.boost.org/doc@folder/libs:boost`
Authority	`www.boost.org`
Path	`/doc@folder/libs:boost`
Segment 1	`doc@folder`
Segment 2	`libs:boost`
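
As a sketch of which characters a segment may contain without percent-encoding, the function below follows the pchar rule of RFC 3986 (unreserved characters, sub-delims, ":" and "@"). The names `is_segment_char` and `is_plain_segment` are hypothetical and are not library API; "%" is rejected here because it is only valid as the start of a percent-encoded triplet.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper, not library API: true if c may appear literally
// in a path segment according to the pchar rule of RFC 3986.
bool is_segment_char(char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9'))
        return true;
    // unreserved punctuation, sub-delims, then ":" and "@"
    constexpr std::string_view others = "-._~" "!$&'()*+,;=" ":@";
    return others.find(c) != std::string_view::npos;
}

// True if every character of seg may appear without percent-encoding.
bool is_plain_segment(std::string_view seg)
{
    for (char c : seg)
        if (!is_segment_char(c))
            return false;
    return true;
}
```

Both segments in the table above, "doc@folder" and "libs:boost", pass this check, while a segment containing "/", "?", "#" or a space does not.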