Path

Notation

The path contains data, usually organized hierarchically, which is combined with the query to identify a resource within the scope of the scheme and authority.

Most schemes interpret the path as a sequence of slash-delimited segments. These segments often map to file system paths, which is useful for file servers, but a scheme is not required to interpret them that way.

In addition to interacting with the path as a single string, the library provides container adaptors modeling ranges of individual path segments.

The URL below contains a path /path/to/file.txt with the three segments path, to, and file.txt:

http://www.example.com/path/to/file.txt
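To make the segment model concrete, the following is a minimal sketch that uses only the standard library. The function split_segments is a hypothetical helper, not part of Boost.URL, and it assumes the input is already a valid percent-encoded path. It treats a single leading slash as the absolute-path marker rather than as a segment.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not a Boost.URL function): split a valid,
// percent-encoded path into its segments.
std::vector<std::string> split_segments(std::string_view path)
{
    std::vector<std::string> segs;
    // A single leading slash marks the path as absolute; it is not a segment.
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1);
    // Both "" and "/" have zero segments.
    if (path.empty())
        return segs;
    std::size_t start = 0;
    for (;;)
    {
        std::size_t const slash = path.find('/', start);
        if (slash == std::string_view::npos)
        {
            // Last segment: everything after the final slash (may be empty).
            segs.emplace_back(path.substr(start));
            break;
        }
        segs.emplace_back(path.substr(start, slash - start));
        start = slash + 1;
    }
    return segs;
}
```

Under these assumptions, split_segments("/path/to/file.txt") yields the three segments path, to, and file.txt, while both "" and "/" yield no segments.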

Depending on the type of URL, different syntactic rules govern how the path may be written. The BNF for these forms, as given in rfc3986, is:

Table 1.16. Path BNF

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>

Member Functions

The functions for interacting with the path in a url_view are as follows:

Table 1.17. Path Observers

Function          Description
encoded_path      Return the path as a percent-encoded string.
encoded_segments  Return the path segments as a read-only container of percent-encoded strings.
segments          Return the path segments as a read-only container of strings with percent-decoding applied.


A URL path is usually interpreted as segments. The library provides two read-only containers for interacting with the segments in a URL's path:

Table 1.18. Segment View Types

Type                   Description
segments_encoded_view  A read-only forward range of path segments returned as percent-encoded strings.
segments_view          A read-only forward range of path segments returned as strings with percent-decoding applied.


These views can be directly created by the parsing functions below. This provides the guarantee that all constructed views contain valid path segments:

Table 1.19. Path Parsing Functions


Observers

The function encoded_path can be used to obtain the path from a url_view:

Code

urls::string_view s = "https://www.boost.org/doc/libs/";
urls::url_view    u = urls::parse_uri(s).value();
std::cout << u << "\n"
    << "path:             " << u.encoded_path()     << "\n"
    << "encoded segments: " << u.encoded_segments() << "\n"
    << "segments:         " << u.segments()         << "\n";

Output

https://www.boost.org/doc/libs/
path:             /doc/libs/
encoded segments: /doc/libs/
segments:         /doc/libs/

These functions do not throw. There is no function analogous to has_path because all URLs have valid paths, even when the path is empty.

Code

urls::string_view s = "https://www.boost.org";
urls::url_view    u = urls::parse_uri(s).value();
std::cout << u << "\n"
    << "path:             " << u.encoded_path()     << "\n"
    << "encoded segments: " << u.encoded_segments() << "\n"
    << "segments:         " << u.segments()         << "\n";

Output

https://www.boost.org
path:
encoded segments:
segments:

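Notice that there is also no decoded counterpart for encoded_path. The reason is that decoding the whole path would turn an escaped slash (%2F) inside a segment into a literal /, which could no longer be distinguished from a segment delimiter.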
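The ambiguity can be illustrated with a small standard-library sketch. The function pct_decode is a hypothetical helper, not a Boost.URL function, and it assumes its input is validly encoded, meaning every % is followed by two hexadecimal digits.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not a Boost.URL function): decode %XX escapes.
// Assumes every '%' is followed by two hexadecimal digits.
std::string pct_decode(std::string_view s)
{
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return c - 'A' + 10; // by assumption, 'A'..'F'
    };
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] == '%')
        {
            assert(i + 2 < s.size()); // precondition: two digits follow
            out += static_cast<char>(hex(s[i + 1]) * 16 + hex(s[i + 2]));
            i += 2;
        }
        else
        {
            out += s[i];
        }
    }
    return out;
}
```

The encoded path /a%2Fb/c has two segments, a%2Fb and c. Decoding the path as a whole with pct_decode gives /a/b/c, which reads as three segments, whereas decoding each segment separately gives a/b and c. This is why decoding is only meaningful per segment.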

Segments View

These containers are lightweight references to the underlying path string. Ownership of the string is not transferred; the caller is responsible for ensuring that the lifetime of the string extends until the container is destroyed.

Code

urls::string_view s = "https://www.boost.org/doc/libs";
urls::url_view    u = urls::parse_uri(s).value();
std::cout << u.encoded_segments().size() << " segments\n";
for (auto seg: u.encoded_segments())
{
    std::cout << "segment: " << seg << "\n";
}

Output

2 segments
segment: doc
segment: libs

In contexts where a path can appear by itself, such as HTTP requests, segment views may not be constructed directly from strings. Instead, we can use the analogous function parse_path to obtain a segments_encoded_view or segments_view.

Code

urls::string_view s = "/doc/libs";
urls::segments_encoded_view p = urls::parse_path(s).value();
std::cout << "path: " << p << "\n";
std::cout << p.size() << " segments\n";
for (auto seg: p)
{
    std::cout << "segment: " << seg << "\n";
}

Output

path: /doc/libs
2 segments
segment: doc
segment: libs

Path Semantics

A path can be absolute or relative. An absolute path begins with /:

URL:       urls::url_view u = urls::parse_uri("https://www.boost.org").value();
Path Type: Relative path "" with 0 segments

URL:       urls::url_view u = urls::parse_uri("https://www.boost.org/").value();
Path Type: Absolute path "/" with 0 segments

The complete path segments "." and ".." are intended only for use within relative references (rfc3986 sec. 4.1) and are removed as part of the reference resolution process (rfc3986 sec. 5.2). Normalizing a URI resolves these dot-segments (rfc3986 sec. 5.2.4).

URL:
urls::url u = urls::parse_uri("https://www.boost.org/./a/../b").value();
u.normalize();
Normalized URL: "https://www.boost.org/b"
Path:           Absolute path "/b" with segments {"b"}
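The dot-segment removal step can be sketched with the standard library alone. The function remove_dot_segments below is a hypothetical helper that follows steps A to E of rfc3986 sec. 5.2.4; it is not the code behind url::normalize, which may also perform other normalizations.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper following rfc3986 sec. 5.2.4 (steps A to E).
// It is not the implementation used by Boost.URL.
std::string remove_dot_segments(std::string_view in)
{
    auto starts_with = [&in](std::string_view p) {
        return in.substr(0, p.size()) == p;
    };
    std::string out;
    while (!in.empty())
    {
        if (starts_with("../"))
            in.remove_prefix(3);          // A: drop a leading "../"
        else if (starts_with("./"))
            in.remove_prefix(2);          // A: drop a leading "./"
        else if (starts_with("/./"))
            in.remove_prefix(2);          // B: "/./x" becomes "/x"
        else if (in == "/.")
            in = "/";                     // B: a trailing "/." becomes "/"
        else if (starts_with("/../") || in == "/..")
        {
            // C: replace the prefix with "/" and drop the last output segment
            in = (in == "/..") ? std::string_view("/") : in.substr(3);
            std::size_t const last = out.rfind('/');
            out.erase(last == std::string::npos ? 0 : last);
        }
        else if (in == "." || in == "..")
            in = std::string_view();      // D: a lone dot-segment vanishes
        else
        {
            // E: move the first segment, with its leading "/", to the output
            std::size_t const next = in.find('/', 1);
            out.append(in.substr(0, next));
            in.remove_prefix(next == std::string_view::npos ? in.size() : next);
        }
    }
    return out;
}
```

Applied to the path of the URL above, remove_dot_segments("/./a/../b") returns "/b".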

These rules imply that a path beginning with "//", or a relative path whose first segment contains ":", could conflict with the authority and scheme components of the URL, since "//" introduces the authority and ":" terminates the scheme. For instance, attempting to create a path with the prefix //, i.e. a path whose first segment is empty, could be interpreted as an empty authority:

URL:
// scheme and a relative path
urls::url_view u = urls::parse_uri("https:path/to/file.txt").value();
Authority: (no authority)
Path:      Relative path "path/to/file.txt" with segments {"path", "to", "file.txt"}

URL:
// scheme and an absolute path
urls::url_view u = urls::parse_uri("https:/path/to/file.txt").value();
Authority: (no authority)
Path:      Absolute path "/path/to/file.txt" with segments {"path", "to", "file.txt"}

URL:
// "//path" will be considered the authority component
urls::url_view u = urls::parse_uri("https://path/to/file.txt").value();
Authority: "path"
Path:      Absolute path "/to/file.txt" with segments {"to", "file.txt"}

Likewise, attempting to create a relative path whose first segment contains a ":" could be interpreted as another scheme and a path:

URL:
// only a relative path
urls::url_view u = urls::parse_uri_reference("path-to/file.txt").value();
Scheme: (no scheme)
Path:   Relative path "path-to/file.txt" with segments {"path-to", "file.txt"}

URL:
// "path:" will be considered the scheme component
// instead of a substring of the first segment
urls::url_view u = urls::parse_uri_reference("path:to/file.txt").value();
Scheme: "path"
Path:   Relative path "to/file.txt" with segments {"to", "file.txt"}

Modifying functions adjust such paths by inserting a prefix that does not change the path's meaning ("/." or "./"), so that the path keeps its semantics without conflicting with the scheme or authority components:

Code:
// "path" should not become the authority component
urls::url u = urls::parse_uri("https:path/to/file.txt").value();
u.set_encoded_path("//path/to/file.txt");
URL:  "https:/.//path/to/file.txt"
Path: Absolute path "/.//path/to/file.txt" with segments {"", "path", "to", "file.txt"}

Code:
// "path:to" should not make the scheme become "path:"
urls::url u = urls::parse_uri_reference("path-to/file.txt").value();
u.set_encoded_path("path:to/file.txt");
URL:  "./path:to/file.txt"
Path: Relative path "./path:to/file.txt" with segments {"path:to", "file.txt"}
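The two adjustments shown in this table can be approximated by the sketch below. The function guard_path and its parameters are hypothetical and only model the two cases above; Boost.URL's actual implementation is not shown here and may handle further cases.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not Boost.URL's implementation): return the text
// that could safely be stored as the path for the given components.
std::string guard_path(std::string_view path, bool has_scheme, bool has_authority)
{
    std::string out;
    if (!has_authority && path.substr(0, 2) == "//")
    {
        // Without an authority, a leading "//" would introduce one.
        out = "/.";
    }
    else if (!has_scheme && !has_authority
             && !path.empty() && path.front() != '/')
    {
        // Without a scheme, a ':' in the first segment would terminate one.
        std::string_view const first = path.substr(0, path.find('/'));
        if (first.find(':') != std::string_view::npos)
            out = "./";
    }
    out.append(path);
    return out;
}
```

With a scheme but no authority, guard_path("//path/to/file.txt", true, false) gives "/.//path/to/file.txt". With neither, guard_path("path:to/file.txt", false, false) gives "./path:to/file.txt". Paths that cannot be misread are returned unchanged.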

Whether the path is relative or absolute, all algorithms preserve the path semantics, so conversions between the URL path and its segment container representation round-trip losslessly. Modifying functions also insert a delimiter where one would otherwise be missing between the new and the existing path segments:

Code:
// should not insert as "pathto/file.txt"
urls::url u = urls::parse_uri_reference("to/file.txt").value();
urls::segments segs = u.segments();
segs.insert(segs.begin(), "path");
URL:  "path/to/file.txt"
Path: Relative path "path/to/file.txt" with segments {"path", "to", "file.txt"}
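The need for the delimiter becomes apparent when a path is rebuilt from its segments. The sketch below uses a hypothetical join_segments helper on a plain std::vector. It is not the library's segments container, and it assumes the simple case in which no prefix adjustment is required.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost.URL function): rebuild a path from
// its segments, placing exactly one '/' between neighbouring segments.
std::string join_segments(std::vector<std::string> const& segs, bool absolute)
{
    std::string path = absolute ? "/" : "";
    for (std::size_t i = 0; i < segs.size(); ++i)
    {
        if (i != 0)
            path += '/'; // delimiter between segment i-1 and segment i
        path += segs[i];
    }
    return path;
}
```

Inserting "path" in front of the segments {"to", "file.txt"} and joining the result as a relative path gives "path/to/file.txt" rather than "pathto/file.txt".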

Use Cases

The path follows the URL authority and includes the initial slash /:

Component  Value
URL        https://www.boost.org/doc/libs/
Path       /doc/libs/

In this example, the path has three segments:

Component  Value
URL        https://www.boost.org/doc/libs/
Segment 1  doc
Segment 2  libs
Segment 3  (empty segment)

Note that the final slash in /doc/libs/ implies an extra empty segment that would not exist in the path /doc/libs:

Component  Value
URL        https://www.boost.org/doc/libs
Segment 1  doc
Segment 2  libs

A URL always contains a path, even if it is empty:

Component  Value
URL        https://www.boost.org
Path       (empty)

Empty segments are also possible, resulting in consecutive slashes.

Component  Value
URL        https://www.boost.org//doc///libs
Path       //doc///libs
Segment 1  (empty)
Segment 2  doc
Segment 3  (empty)
Segment 4  (empty)
Segment 5  libs

If the authority is present, the path needs to be empty or start with a slash /.

Component  Value
URL        https://www.boost.org
Host       www.boost.org
Path       (empty)
Segments   0

Component  Value
URL        https://www.boost.org/
Host       www.boost.org
Path       /
Segments   0

Component  Value
URL        https://www.boost.org//
Host       www.boost.org
Path       //
Segments   2

A path might begin with two slashes to indicate its first segment is empty.

Component  Value
URL        https://www.boost.org//doc/libs/
Authority  www.boost.org
Path       //doc/libs/
Segment 1  (empty)
Segment 2  doc
Segment 3  libs
Segment 4  (empty)

However, a path cannot begin with two slashes when the authority is absent, because the text after the double slash would then be interpreted as the authority.

Component  Value
URL        https://doc/libs/
Authority  doc
Path       /libs/
Segment 1  libs
Segment 2  (empty)

For this reason, paths beginning with two slashes are typically avoided altogether.
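The way a leading double slash is claimed by the authority can be shown with a short sketch. The type hier_part and the function split_hier_part are hypothetical and are not part of Boost.URL. The function takes the text that follows "scheme:" and performs no validation.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch.
struct hier_part
{
    bool has_authority;
    std::string authority;
    std::string path;
};

// Hypothetical helper: split the text after "scheme:" into authority and
// path, e.g. "//www.boost.org/doc/libs/". No validation is performed.
hier_part split_hier_part(std::string_view s)
{
    // The path ends where the query or the fragment begins, if present.
    s = s.substr(0, s.find_first_of("?#"));
    hier_part r{false, "", ""};
    if (s.substr(0, 2) == "//")
    {
        // "//" introduces the authority, which runs up to the next '/'.
        r.has_authority = true;
        s.remove_prefix(2);
        std::size_t const slash = s.find('/');
        r.authority = std::string(s.substr(0, slash));
        s.remove_prefix(slash == std::string_view::npos ? s.size() : slash);
    }
    // Whatever remains is the path.
    r.path = std::string(s);
    return r;
}
```

For the input "//doc/libs/" the sketch reports the authority doc and the path /libs/, matching the table above, while "//www.boost.org//doc/libs/" keeps the path //doc/libs/ because the authority has already consumed the first double slash.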

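Of the general delimiters in the reserved character set for URLs (gen-delims in rfc3986), only ":" and "@" may appear unencoded within a path segment. The sub-delims may appear unencoded as well.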

Component  Value
URL        https://www.boost.org/doc@folder/libs:boost
Authority  www.boost.org
Path       /doc@folder/libs:boost
Segment 1  doc@folder
Segment 2  libs:boost
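The set of characters allowed unencoded in a segment follows from the pchar rule of rfc3986 (unreserved, sub-delims, ":" and "@"). The predicate below is a hypothetical standard-library sketch of that rule and is not a Boost.URL function.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper (not a Boost.URL function): true if c may appear
// in a path segment without percent-encoding, per the rfc3986 pchar rule.
bool is_unencoded_pchar(char c)
{
    constexpr std::string_view unreserved_marks = "-._~";
    constexpr std::string_view sub_delims = "!$&'()*+,;=";
    bool const alnum = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9');
    return alnum
        || unreserved_marks.find(c) != std::string_view::npos
        || sub_delims.find(c) != std::string_view::npos
        || c == ':' || c == '@';
}
```

Under this rule ':' and '@' are accepted, while '/', '?', '#', '[' and ']' are not: the slash delimits segments, and the others must be percent-encoded to appear in a path.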

