Rust's url crate provides a type-safe, standards-compliant URL parser that follows the WHATWG URL Standard. It's the de facto choice for URL handling in Rust applications.
Key Takeaways
- The url crate follows the WHATWG URL Standard
- Parsing returns a Result for safe error handling
- URLs are modified in place through setter methods on a mutable binding
- Query pairs can be iterated and modified type-safely
- Relative URLs can be resolved against base URLs
> "The url crate provides a URL parser that implements the URL Standard. URLs are parsed into a Url struct. The Url struct provides methods to read the URL components and to modify the URL."
Setup
The url crate isn't part of Rust's standard library, so you'll need to add it as a dependency. This is standard practice in Rust, where the standard library is deliberately minimal.
```toml
# Cargo.toml
[dependencies]
url = "2"
```

Version 2 of the crate is the current stable release. The crate follows semantic versioning, so any 2.x version will be compatible.
Parsing URLs
Rust's approach to URL parsing emphasizes safety through the type system. The parse() method returns a Result, forcing you to handle potential parsing errors explicitly. This differs from JavaScript, where a failing new URL() throws and must be caught with try/catch.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let url = Url::parse("https://user:pass@example.com:8080/path?query=value#fragment")?;

    println!("Scheme: {}", url.scheme());       // "https"
    println!("Host: {:?}", url.host_str());     // Some("example.com")
    println!("Port: {:?}", url.port());         // Some(8080)
    println!("Path: {}", url.path());           // "/path"
    println!("Query: {:?}", url.query());       // Some("query=value")
    println!("Fragment: {:?}", url.fragment()); // Some("fragment")
    println!("Username: {}", url.username());   // "user"
    println!("Password: {:?}", url.password()); // Some("pass")
    Ok(())
}
```

The code uses Rust's ? operator to propagate parsing errors. Notice how methods like host_str() and query() return Option types because these components might not be present. This forces you to handle the "missing component" case explicitly.
URL Components
The Url struct provides methods for accessing all URL components. Methods returning Option indicate components that might be absent.
| Method | Return Type | Description |
|---|---|---|
| scheme() | &str | Protocol (http, https) |
| host_str() | Option<&str> | Domain or IP |
| host() | Option<Host<&str>> | Parsed host (domain, IPv4, IPv6) |
| port() | Option<u16> | Port number |
| port_or_known_default() | Option<u16> | Port or default for scheme |
| path() | &str | URL path |
| path_segments() | Option<Split> | Path as iterator |
| query() | Option<&str> | Raw query string |
| query_pairs() | Parse | Query as key-value pairs |
| fragment() | Option<&str> | Fragment/hash |
The distinction between port() and port_or_known_default() is useful: the former returns None if no port is specified, while the latter returns 80 for HTTP or 443 for HTTPS when the port is omitted.
Working with Query Parameters
Query parameters are accessed through the query_pairs() method, which returns an iterator over key-value pairs. This lazy approach is efficient for large query strings.
```rust
use std::collections::HashMap;
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let url = Url::parse("https://api.com/search?q=rust+programming&page=1&limit=20")?;

    // Iterate over query parameters
    for (key, value) in url.query_pairs() {
        println!("{} = {}", key, value);
    }
    // Output:
    // q = rust programming
    // page = 1
    // limit = 20

    // Collect into a HashMap
    let params: HashMap<_, _> = url.query_pairs().into_owned().collect();
    if let Some(query) = params.get("q") {
        println!("Search query: {}", query); // "rust programming"
    }
    Ok(())
}
```

The iterator yields Cow<str> (copy-on-write strings) for efficiency. When collecting into a HashMap, use into_owned() to convert to owned strings. Note that with duplicate keys, later values overwrite earlier ones in a HashMap.
With parsing covered, let's look at how to construct and modify URLs.
Building URLs
URLs in Rust are mutable through setter methods. You can modify any component after parsing, and the URL struct ensures the result remains valid.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    // Start with a base URL and modify it
    let mut url = Url::parse("https://api.example.com")?;
    url.set_path("/v2/users");
    url.set_query(Some("page=1&limit=10"));
    url.set_fragment(Some("results"));
    println!("{}", url);
    // https://api.example.com/v2/users?page=1&limit=10#results

    // Build a query string programmatically
    let mut url = Url::parse("https://api.example.com/search")?;
    url.query_pairs_mut()
        .append_pair("q", "rust tutorial")
        .append_pair("page", "1")
        .append_pair("sort", "relevance");
    println!("{}", url);
    // https://api.example.com/search?q=rust+tutorial&page=1&sort=relevance
    Ok(())
}
```

The query_pairs_mut() method returns a serializer that lets you build query strings using a fluent API. The append_pair() method handles encoding automatically: special characters in keys and values are percent-encoded.
Modifying Query Parameters
Modifying existing query parameters requires clearing and rebuilding, since there's no direct "update" method. The fluent API makes this concise.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let mut url = Url::parse("https://shop.com/products?category=books&page=1")?;

    // Clear the existing query and set new parameters
    url.query_pairs_mut()
        .clear()
        .append_pair("category", "electronics")
        .append_pair("page", "2")
        .append_pair("sort", "price");
    println!("{}", url);
    // https://shop.com/products?category=electronics&page=2&sort=price

    // Add parameters while keeping existing ones
    let mut url = Url::parse("https://api.com/data?existing=value")?;
    url.query_pairs_mut().append_pair("new", "param");
    println!("{}", url);
    // https://api.com/data?existing=value&new=param
    Ok(())
}
```

Use clear() when you want to replace all parameters, or omit it to append to the existing ones. The serializer is dropped when it goes out of scope, at which point it writes the encoded query string back to the URL.
Resolving Relative URLs
Resolving relative URLs is common when crawling websites or processing HTML. The join() method implements the URL Standard's resolution rules, which closely follow RFC 3986.
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let base = Url::parse("https://example.com/docs/guide/")?;

    // Resolve relative URLs against the base
    let relative = base.join("../api/reference")?;
    println!("{}", relative); // https://example.com/docs/api/reference

    let absolute = base.join("/about")?;
    println!("{}", absolute); // https://example.com/about

    let with_query = base.join("page?id=123")?;
    println!("{}", with_query); // https://example.com/docs/guide/page?id=123

    // Parse relative URLs directly against a base
    let options = Url::options().base_url(Some(&base));
    let resolved = options.parse("../images/logo.png")?;
    println!("{}", resolved); // https://example.com/docs/images/logo.png
    Ok(())
}
```

The join() method handles all relative URL patterns: parent directories (..), same-directory references, root-relative paths (/path), and protocol-relative URLs. When parsing many relative URLs against one base, use Url::options() with a base URL.
URL Encoding
The url crate handles most encoding automatically, but you can also work with encoding directly using the form_urlencoded module.
```rust
use url::form_urlencoded;
use url::Url;

fn main() -> Result<(), url::ParseError> {
    // The url crate handles encoding automatically
    let mut url = Url::parse("https://api.com/search")?;
    url.query_pairs_mut()
        .append_pair("q", "hello world & friends")
        .append_pair("special", "100% safe <script>");
    println!("{}", url);
    // https://api.com/search?q=hello+world+%26+friends&special=100%25+safe+%3Cscript%3E

    // Manual encoding
    let encoded: String = form_urlencoded::Serializer::new(String::new())
        .append_pair("name", "John Doe")
        .append_pair("email", "john@example.com")
        .finish();
    println!("{}", encoded); // name=John+Doe&email=john%40example.com

    // Decoding
    let decoded: Vec<(String, String)> = form_urlencoded::parse(encoded.as_bytes())
        .into_owned()
        .collect();
    for (key, value) in decoded {
        println!("{}: {}", key, value);
    }
    Ok(())
}
```

The Serializer type builds encoded query strings, while parse() decodes them. Notice how special characters like & and @ are percent-encoded automatically. The crate follows the application/x-www-form-urlencoded format, where spaces become +.
URL Validation
URL validation in Rust benefits from the type system. You can encode security requirements in function signatures and use pattern matching for clean validation logic.
```rust
use url::Url;

fn is_valid_http_url(input: &str) -> bool {
    match Url::parse(input) {
        Ok(url) => {
            // Must be http or https
            matches!(url.scheme(), "http" | "https")
                // Must have a host
                && url.host_str().is_some()
                // Host must not be localhost for external URLs
                && url.host_str() != Some("localhost")
        }
        Err(_) => false,
    }
}

fn is_safe_redirect(url: &str, allowed_hosts: &[&str]) -> bool {
    match Url::parse(url) {
        Ok(parsed) => {
            if let Some(host) = parsed.host_str() {
                allowed_hosts.contains(&host)
            } else {
                false
            }
        }
        Err(_) => false,
    }
}

fn main() {
    println!("{}", is_valid_http_url("https://example.com")); // true
    println!("{}", is_valid_http_url("javascript:alert(1)")); // false
    println!("{}", is_valid_http_url("file:///etc/passwd")); // false

    let allowed = vec!["myapp.com", "api.myapp.com"];
    println!("{}", is_safe_redirect("https://myapp.com/callback", &allowed)); // true
    println!("{}", is_safe_redirect("https://evil.com/steal", &allowed)); // false
}
```

The matches! macro provides a clean way to check whether a value fits a pattern. Both validation functions return false on any failure, following Rust's preference for explicit error handling.
Working with Host Types
The url crate distinguishes between domain names, IPv4 addresses, and IPv6 addresses at the type level. This lets you handle each case differently and use IP-specific validation methods.
```rust
use url::{Host, Url};

fn main() -> Result<(), url::ParseError> {
    let urls = vec![
        "https://example.com/",
        "https://192.168.1.1/",
        "https://[::1]/",
    ];

    for url_str in urls {
        let url = Url::parse(url_str)?;
        match url.host() {
            Some(Host::Domain(domain)) => {
                println!("Domain: {}", domain);
            }
            Some(Host::Ipv4(ip)) => {
                println!("IPv4: {}", ip);
                // Check for private IPs
                if ip.is_private() {
                    println!("  (private address)");
                }
            }
            Some(Host::Ipv6(ip)) => {
                println!("IPv6: {}", ip);
                if ip.is_loopback() {
                    println!("  (loopback address)");
                }
            }
            None => println!("No host"),
        }
    }
    Ok(())
}
```

Pattern matching on the Host enum lets you handle each case appropriately. For IP addresses, the standard library's methods like is_private() and is_loopback() enable security checks. This is particularly useful for SSRF prevention.
Practical Examples
Let's put these concepts together with some practical utility functions you can use in your Rust projects.
```rust
use std::collections::HashMap;
use url::Url;

// Pagination helper
fn paginate_url(base: &str, page: u32, per_page: u32) -> Result<String, url::ParseError> {
    let mut url = Url::parse(base)?;
    url.query_pairs_mut()
        .append_pair("page", &page.to_string())
        .append_pair("per_page", &per_page.to_string());
    Ok(url.to_string())
}

// Add UTM parameters
fn add_utm_params(
    base: &str,
    source: &str,
    medium: &str,
    campaign: &str,
) -> Result<String, url::ParseError> {
    let mut url = Url::parse(base)?;
    url.query_pairs_mut()
        .append_pair("utm_source", source)
        .append_pair("utm_medium", medium)
        .append_pair("utm_campaign", campaign);
    Ok(url.to_string())
}

// Extract and validate OAuth callback parameters
fn validate_oauth_callback(callback_url: &str) -> Option<HashMap<String, String>> {
    let url = Url::parse(callback_url).ok()?;
    let params: HashMap<String, String> = url.query_pairs().into_owned().collect();
    // Must have either code or error
    if params.contains_key("code") || params.contains_key("error") {
        Some(params)
    } else {
        None
    }
}

fn main() -> Result<(), url::ParseError> {
    println!("{}", paginate_url("https://api.com/users", 2, 20)?);
    // https://api.com/users?page=2&per_page=20

    println!("{}", add_utm_params(
        "https://mysite.com/landing",
        "twitter",
        "social",
        "launch2026",
    )?);
    // https://mysite.com/landing?utm_source=twitter&utm_medium=social&utm_campaign=launch2026

    assert!(validate_oauth_callback("https://app.com/cb?code=abc123").is_some());
    Ok(())
}
```

These examples show idiomatic Rust patterns: returning Result types, using the ? operator for error propagation, and leveraging the type system for safety. The fluent API makes URL manipulation concise while maintaining clarity.