
Path

Notation

The path contains data, usually organized hierarchically, which is combined with the query to identify a resource within the scope of the scheme and authority.

Most schemes interpret the path as a sequence of slash-delimited segments. These segments can map to file system paths, which is useful for file servers, but they do not need to imply this relationship.

In addition to interacting with the path as a single string, the library provides container adaptors modeling ranges of individual path segments.

The URL below contains a path /path/to/file.txt with the three segments path, to, and file.txt:

http://www.example.com/path/to/file.txt

Depending on the type of URL, different syntactic rules govern how the path may be formulated. The BNF for these formulations, as defined in RFC 3986 sec. 3.3, is:

Table 1.20. Path BNF

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>
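The prefix constraints in this grammar can be checked with a few string comparisons. The following is a minimal sketch, assuming a hypothetical helper path_fits that is not part of Boost.URL. It checks only where a path may start for a given combination of scheme and authority, and does not validate the characters inside each segment.

```cpp
#include <string_view>

// Hypothetical helper (not a Boost.URL function): checks only the prefix
// constraints of the RFC 3986 path grammar, given which of the scheme and
// authority components are present. Segment characters are not validated.
bool path_fits(std::string_view path, bool has_scheme, bool has_authority)
{
    if (path.empty())
        return true;                      // path-abempty or path-empty
    if (has_authority)
        return path.front() == '/';       // path-abempty: *( "/" segment )
    if (path.substr(0, 2) == "//")
        return false;                     // would be parsed as an authority
    if (path.front() == '/')
        return true;                      // path-absolute
    if (has_scheme)
        return true;                      // path-rootless
    // path-noscheme: the first segment must not contain ':'
    std::string_view first = path.substr(0, path.find('/'));
    return first.find(':') == std::string_view::npos;
}
```

For example, "path:to/file.txt" fits after a scheme such as "https:" but not as a relative reference on its own, where "./path:to/file.txt" would be needed instead.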

Observers

The functions path, encoded_path, segments, and encoded_segments can be used to obtain the path from a url_view:

Code

string_view s = "https://www.boost.org/doc/libs/";
url_view    u = parse_uri(s).value();
std::cout << u << "\n"
    << "path:             " << u.path()             << "\n"
    << "encoded_path:     " << u.encoded_path()     << "\n"
    << "segments:         " << u.segments()         << "\n"
    << "encoded_segments: " << u.encoded_segments() << "\n";

Output

https://www.boost.org/doc/libs/
path:             /doc/libs/
encoded_path:     /doc/libs/
segments:         /doc/libs/
encoded_segments: /doc/libs/

These functions do not throw. There is no has_path function, because every URL has a valid path, even when that path is empty.

Code

string_view s = "https://www.boost.org";
url_view    u = parse_uri(s).value();
std::cout << u << "\n"
    << "path:             " << u.encoded_path()     << "\n";

Output

https://www.boost.org
path:

Note that path, the decoded counterpart of encoded_path, should be used with care when the path segments represent a hierarchy: an encoded slash (%2F) inside a segment decodes to a / character that cannot be distinguished from a segment delimiter. In these cases, segment views are the most appropriate way to access individual decoded path segments.
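The ambiguity can be reproduced with a minimal sketch. The helper pct_decode below is hypothetical (it is not a Boost.URL function) and copies malformed escapes unchanged. It shows that decoding a whole path turns the encoded segment a%2Fb into a/b, adding a slash that was not a delimiter in the original path.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not a Boost.URL function): replaces each %XX
// escape with the byte it encodes; malformed escapes are copied as-is.
std::string pct_decode(std::string_view s)
{
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] == '%' && i + 2 < s.size() &&
            hex(s[i + 1]) >= 0 && hex(s[i + 2]) >= 0)
        {
            out += static_cast<char>(hex(s[i + 1]) * 16 + hex(s[i + 2]));
            i += 2;  // skip the two hex digits
        }
        else
        {
            out += s[i];
        }
    }
    return out;
}
```

The encoded path "/a%2Fb/c" has the two segments a%2Fb and c, but pct_decode returns "/a/b/c", which reads as three segments. Decoding each segment separately yields a/b and c instead.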

Segments View

These segment view containers are lightweight references to the underlying path string. Ownership of the string is not transferred; the caller is responsible for ensuring that the lifetime of the string extends until the container is destroyed.

Code

string_view s = "https://www.boost.org/doc/libs";
url_view u = parse_uri(s).value();
std::cout << u.segments().size() << " segments\n";
for (auto seg: u.segments())
{
    std::cout << "segment: " << seg << "\n";
}

Output

2 segments
segment: doc
segment: libs

In contexts where a path appears by itself, such as the request target of an HTTP request, segment views are not constructed directly from strings. Instead, the function parse_path, analogous to parse_uri, can be used to obtain a segments_view or a segments_encoded_view.

Code

string_view s = "/doc/libs";
segments_view p = parse_path(s).value();
std::cout << "path: " << p << "\n";
std::cout << p.size() << " segments\n";
for (auto seg: p)
{
    std::cout << "segment: " << seg << "\n";
}

Output

path: /doc/libs
2 segments
segment: doc
segment: libs

Path Semantics

A path can be absolute or relative. An absolute path begins with /:

URL:       url_view u = parse_uri("https://www.boost.org").value();
Path type: Relative path "" with 0 segments

URL:       url_view u = parse_uri("https://www.boost.org/").value();
Path type: Absolute path "/" with 0 segments
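The segment counts in these examples follow a simple rule. The following sketch uses hypothetical helpers is_absolute and segment_count (not Boost.URL functions): the leading "/" of an absolute path is not a delimiter, an empty remainder has no segments, and every other "/" separates two possibly empty segments. It does not account for the "/." and "./" prefixes that modifying functions may add.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper: a path is absolute when it begins with '/'.
bool is_absolute(std::string_view path)
{
    return !path.empty() && path.front() == '/';
}

// Hypothetical helper: counts segments, treating every '/' after the
// optional leading one as a separator between two (possibly empty) segments.
std::size_t segment_count(std::string_view path)
{
    if (is_absolute(path))
        path.remove_prefix(1);
    if (path.empty())
        return 0;
    return static_cast<std::size_t>(
        std::count(path.begin(), path.end(), '/')) + 1;
}
```

These rules reproduce the counts given in the use cases below, for example 3 segments for /doc/libs/, 2 for //, and 5 for //doc///libs.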

Path segments consisting entirely of "." or "..", known as dot-segments, are intended only for use within relative references (RFC 3986 sec. 4.1) and are removed as part of the reference resolution process (RFC 3986 sec. 5.2). Normalizing a URI also removes these dot-segments (RFC 3986 sec. 5.2.4):

URL:            url u = parse_uri("https://www.boost.org/./a/../b").value();
                u.normalize();
Normalized URL: "https://www.boost.org/b"
Path:           Absolute path "/b" with segments {"b"}
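Dot-segment removal follows the remove_dot_segments algorithm of RFC 3986 sec. 5.2.4. The sketch below is a direct transcription of that algorithm for illustration; it is not Boost.URL's implementation, and the letters in the comments refer to the steps of the RFC.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Transcription of remove_dot_segments (RFC 3986 sec. 5.2.4) for
// illustration; not the Boost.URL implementation.
std::string remove_dot_segments(std::string_view path)
{
    std::string in(path);
    std::string out;
    auto starts_with = [&in](std::string_view p) {
        return std::string_view(in).substr(0, p.size()) == p;
    };
    while (!in.empty())
    {
        if (starts_with("../"))
            in.erase(0, 3);                       // A
        else if (starts_with("./"))
            in.erase(0, 2);                       // A
        else if (starts_with("/./"))
            in.erase(0, 2);                       // B: "/./x" -> "/x"
        else if (in == "/.")
            in = "/";                             // B
        else if (starts_with("/../") || in == "/..")
        {
            // C: "/../x" -> "/x", then drop the last output segment
            in = (in == "/..") ? std::string("/") : in.substr(3);
            std::size_t last = out.rfind('/');
            out.erase(last == std::string::npos ? 0 : last);
        }
        else if (in == "." || in == "..")
            in.clear();                           // D
        else
        {
            // E: move the first segment, with its leading '/' if any
            std::size_t next = in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

It produces "/b" for the path in the example above, and "/a/g" for the RFC's own example "/a/b/c/./../../g".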

These rules imply that a path beginning with "//", or a relative path whose first segment contains ":", could conflict with the authority and scheme components of the URL, since the authority is introduced by "//" and the scheme is terminated by ":". For instance, a path with the prefix //, i.e. an absolute path whose first segment is empty, would be interpreted as the start of an authority component:

URL:       // scheme and a relative path
           url_view u = parse_uri("https:path/to/file.txt").value();
Authority: (no authority)
Path:      Relative path "path/to/file.txt" with segments {"path", "to", "file.txt"}

URL:       // scheme and an absolute path
           url_view u = parse_uri("https:/path/to/file.txt").value();
Authority: (no authority)
Path:      Absolute path "/path/to/file.txt" with segments {"path", "to", "file.txt"}

URL:       // "//path" will be considered the authority component
           url_view u = parse_uri("https://path/to/file.txt").value();
Authority: "path"
Path:      Absolute path "/to/file.txt" with segments {"to", "file.txt"}

Likewise, a relative path whose first segment contains a ":" would be interpreted as a scheme followed by a path:

URL:    // only a relative path
        url_view u = parse_uri_reference("path-to/file.txt").value();
Scheme: (no scheme)
Path:   Relative path "path-to/file.txt" with segments {"path-to", "file.txt"}

URL:    // "path:" will be considered the scheme component
        // instead of a substring of the first segment
        url_view u = parse_uri_reference("path:to/file.txt").value();
Scheme: "path"
Path:   Relative path "to/file.txt" with segments {"to", "file.txt"}
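Both ambiguities follow from how a URI reference is split into components. RFC 3986 Appendix B gives a regular expression for that split; the sketch below applies it with std::regex to the examples above. The uri_parts and split_uri names are hypothetical, and Boost.URL does not necessarily use this expression internally.

```cpp
#include <regex>
#include <string>

// Hypothetical result type holding the components relevant here.
struct uri_parts
{
    bool has_scheme = false;
    bool has_authority = false;
    std::string scheme, authority, path;
};

// Splits a URI reference with the regular expression from RFC 3986
// Appendix B: group 2 = scheme, group 4 = authority, group 5 = path.
uri_parts split_uri(const std::string& s)
{
    static const std::regex re(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");
    std::smatch m;
    uri_parts p;
    if (std::regex_match(s, m, re))
    {
        p.has_scheme = m[1].matched;
        p.has_authority = m[3].matched;
        p.scheme = m[2].str();
        p.authority = m[4].str();
        p.path = m[5].str();
    }
    return p;
}
```

Splitting "https://path/to/file.txt" yields the authority "path" and the path "/to/file.txt", while "path:to/file.txt" yields the scheme "path" and the path "to/file.txt", matching the tables above.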

Modifying functions adjust such paths by adding a prefix that does not change their meaning ("/." before a leading "//", or "./" before a first segment containing ":"), so that the paths keep their semantics without conflicting with the scheme or authority components:

Code: // "path" should not become the authority component
      url u = parse_uri("https:path/to/file.txt").value();
      u.set_encoded_path("//path/to/file.txt");
URL:  "https:/.//path/to/file.txt"
Path: Absolute path "/.//path/to/file.txt" with segments {"", "path", "to", "file.txt"}

Code: // "path:to" should not make the scheme become "path:"
      url u = parse_uri_reference("path-to/file.txt").value();
      u.set_encoded_path("path:to/file.txt");
URL:  "./path:to/file.txt"
Path: Relative path "./path:to/file.txt" with segments {"path:to", "file.txt"}
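The adjustment in these two cases amounts to choosing a harmless prefix. The following is a minimal sketch of that decision for a URL without an authority, using a hypothetical protect_path helper; it mirrors the results in the table but is not the library's implementation.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not the Boost.URL implementation). Assumes the URL
// has no authority and returns the path with a prefix that prevents it
// from being re-parsed as an authority or a scheme.
std::string protect_path(std::string_view path, bool has_scheme)
{
    std::string out(path);
    if (path.substr(0, 2) == "//")
        return "/." + out;     // "/.//x" keeps an empty first segment
    if (!has_scheme)
    {
        std::string_view first = path.substr(0, path.find('/'));
        if (first.find(':') != std::string_view::npos)
            return "./" + out; // "./a:b" is not parsed as scheme "a"
    }
    return out;                // no conflict: leave the path unchanged
}
```

When a scheme is present, a ":" in the first segment needs no prefix, because the scheme already ends at the first ":" of the URL.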

Whether a path is relative or absolute, all algorithms preserve its semantics in lossless round-trip conversions between the URL path and its segment container representations. Modifying functions also insert a "/" delimiter wherever one would otherwise be missing between existing path segments:

Code: // should not insert as "pathto/file.txt"
      url u = parse_uri_reference("to/file.txt").value();
      segments segs = u.segments();
      segs.insert(segs.begin(), "path");
URL:  "path/to/file.txt"
Path: Relative path "path/to/file.txt" with segments {"path", "to", "file.txt"}
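Inserting the delimiter automatically is straightforward when the path is rebuilt from its segments. The sketch below uses a hypothetical join_segments helper, not a Boost.URL function, which places exactly one "/" between adjacent segments, so an inserted segment cannot merge with its neighbour. It does not handle the prefix adjustments described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: rebuilds a path from its segments, with a leading
// '/' for absolute paths and exactly one '/' between adjacent segments.
std::string join_segments(const std::vector<std::string>& segs, bool absolute)
{
    std::string out = absolute ? "/" : "";
    for (std::size_t i = 0; i < segs.size(); ++i)
    {
        if (i != 0)
            out += '/';        // delimiter between segments i-1 and i
        out += segs[i];
    }
    return out;
}
```

Inserting "path" at the front of {"to", "file.txt"} and joining gives "path/to/file.txt", the result shown in the table.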

Use Cases

The path comes after the URL authority, including the initial slash /:

Component  Value
URL        https://www.boost.org/doc/libs/
Path       /doc/libs/

In this example, the path has three segments:

Component  Value
URL        https://www.boost.org/doc/libs/
Segment 1  doc
Segment 2  libs
Segment 3  (empty segment)

Note that the final slash in /doc/libs/ implies an extra empty segment that would not exist in the path /doc/libs:

Component  Value
URL        https://www.boost.org/doc/libs
Segment 1  doc
Segment 2  libs

A URL always contains a path, even if it is empty:

Component  Value
URL        https://www.boost.org
Path       (empty)

Empty segments are also possible, resulting in consecutive slashes.

Component  Value
URL        https://www.boost.org//doc///libs
Path       //doc///libs
Segment 1  (empty)
Segment 2  doc
Segment 3  (empty)
Segment 4  (empty)
Segment 5  libs

If the authority is present, the path needs to be empty or start with a slash /.

Component  Value
URL        https://www.boost.org
Host       www.boost.org
Path       (empty)
Segments   0

Component  Value
URL        https://www.boost.org/
Host       www.boost.org
Path       /
Segments   0

Component  Value
URL        https://www.boost.org//
Host       www.boost.org
Path       //
Segments   2

A path might begin with two slashes to indicate its first segment is empty.

Component  Value
URL        https://www.boost.org//doc/libs/
Authority  www.boost.org
Path       //doc/libs/
Segment 1  (empty)
Segment 2  doc
Segment 3  libs
Segment 4  (empty)

However, a path cannot begin with two slashes when the authority is absent, because its first segment would then be interpreted as the authority.

Component  Value
URL        https://doc/libs/
Authority  doc
Path       /libs/
Segment 1  libs
Segment 2  (empty)

For this reason, paths beginning with two slashes are typically avoided altogether.

Of the general delimiters in the reserved character set for URLs, only ":" and "@" may appear unencoded within path segments.

Component  Value
URL        https://www.boost.org/doc@folder/libs:boost
Authority  www.boost.org
Path       /doc@folder/libs:boost
Segment 1  doc@folder
Segment 2  libs:boost
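The characters allowed in a segment come from the RFC 3986 rule segment = *pchar, where pchar covers the unreserved characters, percent-encoded octets, the sub-delimiters, ":" and "@". The following is a minimal validator sketch; is_valid_segment and its helpers are hypothetical names, not Boost.URL functions.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true for an ASCII hexadecimal digit.
bool is_hex(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

// Hypothetical helper: true for a single pchar that is not part of a
// percent-encoded octet: unreserved, sub-delims, ':' or '@'.
bool is_plain_pchar(char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return true;
    // unreserved punctuation, then sub-delims, then ':' and '@'
    return std::string_view("-._~" "!$&'()*+,;=" ":@").find(c) !=
           std::string_view::npos;
}

// Hypothetical validator for segment = *pchar.
bool is_valid_segment(std::string_view seg)
{
    for (std::size_t i = 0; i < seg.size(); ++i)
    {
        if (seg[i] == '%')
        {
            // pct-encoded = "%" HEXDIG HEXDIG
            if (i + 2 >= seg.size() || !is_hex(seg[i + 1]) ||
                !is_hex(seg[i + 2]))
                return false;
            i += 2;
        }
        else if (!is_plain_pchar(seg[i]))
        {
            return false;
        }
    }
    return true;
}
```

Both segments from the table above pass this check, while a segment containing "?" or an unencoded space does not.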

Member Functions

The functions for interacting with the path in a url_view are as follows:

Table 1.21. Path Observers

Function          Description
path              Return the path as a percent-decoded string
encoded_path      Return the path as a percent-encoded string
encoded_segments  Return the path segments as a read-only container of percent-encoded strings.


A URL path is usually interpreted as segments. The library provides two read-only containers for interacting with the segments in a URL's path:

Table 1.22. Segment Observers

Type                   Description
segments_view          A read-only forward range of path segments returned as strings with percent-decoding applied.
segments_encoded_view  A read-only forward range of path segments returned as percent-encoded strings.


Segment views can be created directly with the parsing function below, which guarantees that every constructed view contains valid path segments:

Table 1.23. Path Parsing Functions

Function    Grammar
parse_path  any path


The functions for modifying paths in a url are as follows:

Table 1.24. Path Modifiers

Function           Description
set_path           Set the path
set_encoded_path   Set the encoded path
set_path_absolute  Set whether the path is absolute
normalize_path     Normalize the path


A URL path is usually interpreted as segments. A url provides two modifiable containers for interacting with the segments in a URL's path:

Table 1.25. Segment View Types

Type              Description
segments          A forward range of path segments returned as strings with percent-decoding applied.
segments_encoded  A forward range of path segments returned as percent-encoded strings.


