Rust's url crate provides a type-safe, standards-compliant URL parser that follows the WHATWG URL Standard. It's the de facto choice for URL handling in Rust applications.
Key Takeaways
- The url crate follows the WHATWG URL Standard
- Parsing returns a Result for safe error handling
- URLs are modified in place through setter methods on a mutable binding
- Query pairs can be iterated and modified type-safely
- Relative URLs can be resolved against base URLs
> "The url crate provides a URL parser that implements the URL Standard. URLs are parsed into a Url struct. The Url struct provides methods to read the URL components and to modify the URL."
Setup
The url crate isn't part of Rust's standard library, so you'll need to add it as a dependency. This is standard practice in Rust, where the standard library is deliberately minimal.
```toml
# Cargo.toml
[dependencies]
url = "2"
```

Version 2 of the crate is the current stable release. The crate follows semantic versioning, so any 2.x version will be compatible.
Parsing URLs
Rust's approach to URL parsing emphasizes safety through the type system. The parse() method returns a Result, forcing you to handle potential parsing errors explicitly. This differs from JavaScript, where a failing new URL() throws and must be caught with try/catch.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let url = Url::parse("https://user:pass@example.com:8080/path?query=value#fragment")?;

    println!("Scheme: {}", url.scheme());       // "https"
    println!("Host: {:?}", url.host_str());     // Some("example.com")
    println!("Port: {:?}", url.port());         // Some(8080)
    println!("Path: {}", url.path());           // "/path"
    println!("Query: {:?}", url.query());       // Some("query=value")
    println!("Fragment: {:?}", url.fragment()); // Some("fragment")
    println!("Username: {}", url.username());   // "user"
    println!("Password: {:?}", url.password()); // Some("pass")
    Ok(())
}
```

The code uses Rust's ? operator to propagate parsing errors. Notice how methods like host_str() and query() return Option types because these components might not be present. This forces you to handle the "missing component" case explicitly.
URL Components
The Url struct provides methods for accessing all URL components. Methods returning Option indicate components that might be absent.
| Method | Return Type | Description |
|---|---|---|
| scheme() | &str | Protocol (http, https) |
| host_str() | Option<&str> | Domain or IP |
| host() | Option<Host<&str>> | Parsed host (domain, IPv4, IPv6) |
| port() | Option<u16> | Port number |
| port_or_known_default() | Option<u16> | Port or default for scheme |
| path() | &str | URL path |
| path_segments() | Option<Split> | Path as iterator |
| query() | Option<&str> | Raw query string |
| query_pairs() | Parse | Query as key-value pairs |
| fragment() | Option<&str> | Fragment/hash |
The distinction between port() and port_or_known_default() is useful: the former returns None if no port is specified, while the latter returns 80 for HTTP or 443 for HTTPS when the port is omitted.
Working with Query Parameters
Query parameters are accessed through the query_pairs() method, which returns an iterator over key-value pairs. This lazy approach is efficient for large query strings.
```rust
use std::collections::HashMap;
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let url = Url::parse("https://api.com/search?q=rust+programming&page=1&limit=20")?;

    // Iterate over query parameters
    for (key, value) in url.query_pairs() {
        println!("{} = {}", key, value);
    }
    // Output:
    // q = rust programming
    // page = 1
    // limit = 20

    // Collect into a HashMap
    let params: HashMap<_, _> = url.query_pairs().into_owned().collect();
    if let Some(query) = params.get("q") {
        println!("Search query: {}", query); // "rust programming"
    }
    Ok(())
}
```

The iterator yields Cow<str> (copy-on-write strings) for efficiency. When collecting into a HashMap, use into_owned() to convert to owned strings. Note that with duplicate keys, later values overwrite earlier ones in a HashMap.
With parsing covered, let's look at how to construct and modify URLs.
Building URLs
URLs in Rust are mutable through setter methods. You can modify any component after parsing, and the URL struct ensures the result remains valid.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    // Start with a base URL and modify it
    let mut url = Url::parse("https://api.example.com")?;
    url.set_path("/v2/users");
    url.set_query(Some("page=1&limit=10"));
    url.set_fragment(Some("results"));
    println!("{}", url);
    // https://api.example.com/v2/users?page=1&limit=10#results

    // Build a query string programmatically
    let mut url = Url::parse("https://api.example.com/search")?;
    url.query_pairs_mut()
        .append_pair("q", "rust tutorial")
        .append_pair("page", "1")
        .append_pair("sort", "relevance");
    println!("{}", url);
    // https://api.example.com/search?q=rust+tutorial&page=1&sort=relevance
    Ok(())
}
```

The query_pairs_mut() method returns a serializer that lets you build query strings using a fluent API. The append_pair() method handles encoding automatically: special characters in keys and values are percent-encoded.
Modifying Query Parameters
Modifying existing query parameters requires clearing and rebuilding, since there's no direct "update" method. The fluent API makes this concise.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let mut url = Url::parse("https://shop.com/products?category=books&page=1")?;

    // Clear the existing query and set new parameters
    url.query_pairs_mut()
        .clear()
        .append_pair("category", "electronics")
        .append_pair("page", "2")
        .append_pair("sort", "price");
    println!("{}", url);
    // https://shop.com/products?category=electronics&page=2&sort=price

    // Add parameters while keeping existing ones
    let mut url = Url::parse("https://api.com/data?existing=value")?;
    url.query_pairs_mut().append_pair("new", "param");
    println!("{}", url);
    // https://api.com/data?existing=value&new=param
    Ok(())
}
```

Use clear() when you want to replace all parameters, or omit it to append to the existing ones. The serializer is dropped when it goes out of scope, at which point it writes the encoded query string back to the URL.
Resolving Relative URLs
Resolving relative URLs is common when crawling websites or processing HTML. The join() method implements the URL Standard's resolution rules, which closely follow RFC 3986.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let base = Url::parse("https://example.com/docs/guide/")?;

    // Resolve relative URLs against the base
    let relative = base.join("../api/reference")?;
    println!("{}", relative); // https://example.com/docs/api/reference

    let absolute = base.join("/about")?;
    println!("{}", absolute); // https://example.com/about

    let with_query = base.join("page?id=123")?;
    println!("{}", with_query); // https://example.com/docs/guide/page?id=123

    // Parse relative URLs directly against a base
    let options = Url::options().base_url(Some(&base));
    let resolved = options.parse("../images/logo.png")?;
    println!("{}", resolved); // https://example.com/docs/images/logo.png
    Ok(())
}
```

The join() method handles all relative URL patterns: parent directories (..), same-directory references, root-relative paths (/path), and protocol-relative URLs. When parsing many relative URLs against one base, use Url::options() with a base URL.
URL Encoding
The url crate handles most encoding automatically, but you can also work with encoding directly using the form_urlencoded module.
```rust
use url::form_urlencoded;
use url::Url;

fn main() -> Result<(), url::ParseError> {
    // The url crate handles encoding automatically
    let mut url = Url::parse("https://api.com/search")?;
    url.query_pairs_mut()
        .append_pair("q", "hello world & friends")
        .append_pair("special", "100% safe <script>");
    println!("{}", url);
    // https://api.com/search?q=hello+world+%26+friends&special=100%25+safe+%3Cscript%3E

    // Manual encoding
    let encoded: String = form_urlencoded::Serializer::new(String::new())
        .append_pair("name", "John Doe")
        .append_pair("email", "john@example.com")
        .finish();
    println!("{}", encoded); // name=John+Doe&email=john%40example.com

    // Decoding
    let decoded: Vec<(String, String)> = form_urlencoded::parse(encoded.as_bytes())
        .into_owned()
        .collect();
    for (key, value) in decoded {
        println!("{}: {}", key, value);
    }
    Ok(())
}
```

The Serializer type builds encoded query strings, while parse() decodes them. Notice how special characters like & and @ are percent-encoded automatically. The crate follows the application/x-www-form-urlencoded format, where spaces become +.
URL Validation
URL validation in Rust benefits from the type system. You can encode security requirements in function signatures and use pattern matching for clean validation logic.
```rust
use url::Url;

fn is_valid_http_url(input: &str) -> bool {
    match Url::parse(input) {
        Ok(url) => {
            // Must be http or https
            matches!(url.scheme(), "http" | "https")
                // Must have a host
                && url.host_str().is_some()
                // Host must not be localhost for external URLs
                && url.host_str() != Some("localhost")
        }
        Err(_) => false,
    }
}

fn is_safe_redirect(url: &str, allowed_hosts: &[&str]) -> bool {
    match Url::parse(url) {
        Ok(parsed) => {
            if let Some(host) = parsed.host_str() {
                allowed_hosts.contains(&host)
            } else {
                false
            }
        }
        Err(_) => false,
    }
}

fn main() {
    println!("{}", is_valid_http_url("https://example.com")); // true
    println!("{}", is_valid_http_url("javascript:alert(1)")); // false
    println!("{}", is_valid_http_url("file:///etc/passwd")); // false

    let allowed = vec!["myapp.com", "api.myapp.com"];
    println!("{}", is_safe_redirect("https://myapp.com/callback", &allowed)); // true
    println!("{}", is_safe_redirect("https://evil.com/steal", &allowed)); // false
}
```

The matches! macro provides a clean way to check whether a value fits a pattern. Both validation functions return false on any failure, following Rust's preference for explicit error handling.
Working with Host Types
The url crate distinguishes between domain names, IPv4 addresses, and IPv6 addresses at the type level. This lets you handle each case differently and use IP-specific validation methods.
```rust
use url::{Host, Url};

fn main() -> Result<(), url::ParseError> {
    let urls = vec![
        "https://example.com/",
        "https://192.168.1.1/",
        "https://[::1]/",
    ];

    for url_str in urls {
        let url = Url::parse(url_str)?;
        match url.host() {
            Some(Host::Domain(domain)) => {
                println!("Domain: {}", domain);
            }
            Some(Host::Ipv4(ip)) => {
                println!("IPv4: {}", ip);
                // Check for private IPs
                if ip.is_private() {
                    println!("  (private address)");
                }
            }
            Some(Host::Ipv6(ip)) => {
                println!("IPv6: {}", ip);
                if ip.is_loopback() {
                    println!("  (loopback address)");
                }
            }
            None => println!("No host"),
        }
    }
    Ok(())
}
```

Pattern matching on the Host enum lets you handle each case appropriately. For IP addresses, the standard library's methods like is_private() and is_loopback() enable security checks. This is particularly useful for SSRF prevention.
Practical Examples
Let's put these concepts together with some practical utility functions you can use in your Rust projects.
```rust
use std::collections::HashMap;
use url::Url;

// Pagination helper
fn paginate_url(base: &str, page: u32, per_page: u32) -> Result<String, url::ParseError> {
    let mut url = Url::parse(base)?;
    url.query_pairs_mut()
        .append_pair("page", &page.to_string())
        .append_pair("per_page", &per_page.to_string());
    Ok(url.to_string())
}

// Add UTM parameters
fn add_utm_params(
    base: &str,
    source: &str,
    medium: &str,
    campaign: &str,
) -> Result<String, url::ParseError> {
    let mut url = Url::parse(base)?;
    url.query_pairs_mut()
        .append_pair("utm_source", source)
        .append_pair("utm_medium", medium)
        .append_pair("utm_campaign", campaign);
    Ok(url.to_string())
}

// Extract and validate OAuth callback parameters
fn validate_oauth_callback(callback_url: &str) -> Option<HashMap<String, String>> {
    let url = Url::parse(callback_url).ok()?;
    let params: HashMap<String, String> = url.query_pairs().into_owned().collect();
    // Must have either code or error
    if params.contains_key("code") || params.contains_key("error") {
        Some(params)
    } else {
        None
    }
}

fn main() -> Result<(), url::ParseError> {
    println!("{}", paginate_url("https://api.com/users", 2, 20)?);
    // https://api.com/users?page=2&per_page=20

    println!("{}", add_utm_params(
        "https://mysite.com/landing",
        "twitter",
        "social",
        "launch2026",
    )?);
    // https://mysite.com/landing?utm_source=twitter&utm_medium=social&utm_campaign=launch2026

    assert!(validate_oauth_callback("https://app.com/cb?code=abc123").is_some());
    Ok(())
}
```

These examples show idiomatic Rust patterns: returning Result types, using the ? operator for error propagation, and leveraging the type system for safety. The fluent API makes URL manipulation concise while maintaining clarity.