When should I use URL instead of URI?

Use URI for parsing, validation, and URL manipulation. Use URL only when you need to open a connection or stream, though java.net.http.HttpClient (Java 11+) works directly with URI and is preferred for HTTP operations.

How do I handle international domain names (IDN)?

Use java.net.IDN.toASCII() to convert Unicode domain names to Punycode (ASCII-compatible encoding). For example, IDN.toASCII("münchen.de") returns "xn--mnchen-3ya.de".
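As a minimal sketch of that conversion, the snippet below round-trips a host name through java.net.IDN. The class and method names (IdnExample, toAsciiHost, toUnicodeHost) are made up for illustration. Note that IDN works on host names only, so extract the host before converting rather than passing a full URL.

```java
import java.net.IDN;

class IdnExample {
    // Convert a Unicode host name to its ASCII-compatible (Punycode) form
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    // Convert an ASCII-compatible host name back to Unicode for display
    static String toUnicodeHost(String host) {
        return IDN.toUnicode(host);
    }

    public static void main(String[] args) {
        // "\u00fc" is the letter u with umlaut, written as an escape so the
        // source file compiles regardless of its encoding
        String ascii = toAsciiHost("m\u00fcnchen.de");
        System.out.println(ascii);                 // xn--mnchen-3ya.de
        System.out.println(toUnicodeHost(ascii));  // the original Unicode form
    }
}
```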

Why does URLEncoder encode spaces as + instead of %20?

URLEncoder follows the application/x-www-form-urlencoded format used in HTML forms, where spaces become +. For URL path segments, replace + with %20 after encoding, or pass the unencoded path to a multi-argument URI constructor, which quotes spaces as %20.

Parse URLs in Java - URI and URL Classes Guide

Java provides two main classes for URL handling: java.net.URL and java.net.URI. While URL is for network operations, URI is preferred for parsing and manipulation due to better RFC compliance.

Key Takeaways

1. Use URI for parsing and manipulation, URL for network operations
2. URI implements RFC 2396 (as amended by RFC 2732), providing stricter validation than URL
3. URLEncoder/URLDecoder handle query string encoding
4. Java 11+ HttpClient works with URI directly
5. Consider Apache HttpClient or OkHttp for complex URL building

Definition

Java URL Parsing

Java URL parsing uses java.net.URI for RFC-compliant URL parsing and manipulation, and java.net.URL for network operations. The URI class provides methods to access scheme, host, port, path, query, and fragment components with proper encoding support.

Source: Java SE Documentation - URI

“A URI is a uniform resource identifier while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs).”
— Java SE Documentation

URI vs URL

Java's naming can be confusing: both URL and URI classes exist, but they serve different purposes. Understanding when to use each will save you from common pitfalls.

Class	Purpose	When to Use
java.net.URI	Parsing, validation, manipulation	General URL handling
java.net.URL	Network operations, connections	Fetching content
java.net.URLEncoder	Encode query parameters	Building query strings
java.net.URLDecoder	Decode URL components	Reading query params

For most URL manipulation tasks, URI is the better choice. The URL class was designed for network operations and has quirks: its equals() and hashCode() methods may resolve host names through DNS, which makes comparisons slow and dependent on the network. Modern Java code should prefer URI for parsing and HttpClient for network operations.
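To illustrate the difference, here is a small sketch of how URI compares values without touching the network. The class and method names (UriEqualityExample, sameUri) are hypothetical. URI.equals() compares scheme and host without regard to case and the path case-sensitively, so URI values are safe to use as keys in hash-based collections.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

class UriEqualityExample {
    // Purely syntactic comparison; no DNS lookup is involved
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Scheme and host are compared case-insensitively
        System.out.println(sameUri("HTTP://EXAMPLE.com/docs", "http://example.com/docs")); // true

        // The path is case-sensitive
        System.out.println(sameUri("http://example.com/Docs", "http://example.com/docs")); // false

        // hashCode() is consistent with equals(), so sets and maps work as expected
        Set<URI> seen = new HashSet<>();
        seen.add(URI.create("https://example.com/a"));
        System.out.println(seen.contains(URI.create("https://EXAMPLE.com/a"))); // true
    }
}
```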

Parsing with URI

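The URI constructor parses a URL string and throws the checked URISyntaxException for invalid input. As with JavaScript's URL constructor, you'll typically need try-catch when parsing user input. URI.create() is a shortcut for known-good strings; it throws an unchecked IllegalArgumentException instead.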

java

import java.net.URI;
import java.net.URISyntaxException;

public class UrlParser {
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("https://user:pass@example.com:8080/path/page?query=value#fragment");

        System.out.println("Scheme: " + uri.getScheme());       // https
        System.out.println("Host: " + uri.getHost());           // example.com
        System.out.println("Port: " + uri.getPort());           // 8080
        System.out.println("Path: " + uri.getPath());           // /path/page
        System.out.println("Query: " + uri.getQuery());         // query=value
        System.out.println("Fragment: " + uri.getFragment());   // fragment
        System.out.println("User Info: " + uri.getUserInfo());  // user:pass
        System.out.println("Authority: " + uri.getAuthority()); // user:pass@example.com:8080

        // Raw vs decoded values
        URI encoded = new URI("https://example.com/path?q=hello%20world");
        System.out.println("Raw Query: " + encoded.getRawQuery());    // q=hello%20world
        System.out.println("Query: " + encoded.getQuery());           // q=hello world
    }
}

Notice the distinction between getRawQuery() and getQuery(). The "raw" version preserves percent-encoding, while the regular version decodes it. This pattern appears throughout the URI class: getRawPath() vs getPath(), etc.
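The difference matters most when an escape hides a delimiter. The sketch below (class and method names are illustrative, and the URL is a placeholder) counts path segments in both forms: an encoded slash (%2F) stays inside its segment in the raw path but becomes a real separator once decoded.

```java
import java.net.URI;

class RawVsDecoded {
    // Count the non-empty segments of a path string
    static int segmentCount(String path) {
        int count = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://example.com/files/a%2Fb/report%20final.pdf");

        System.out.println(uri.getRawPath()); // /files/a%2Fb/report%20final.pdf
        System.out.println(uri.getPath());    // /files/a/b/report final.pdf

        // Split the raw path first, then decode each segment if needed
        System.out.println(segmentCount(uri.getRawPath())); // 3
        System.out.println(segmentCount(uri.getPath()));    // 4
    }
}
```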

Parsing Query Parameters

Unlike JavaScript's URLSearchParams, Java doesn't have a built-in query string parser. You'll need to create one. This is a common utility function in Java web applications.

java

import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class QueryParser {

    public static Map<String, List<String>> parseQuery(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();

        if (query == null || query.isEmpty()) {
            return params;
        }

        for (String pair : query.split("&")) {
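            int idx = pair.indexOf('=');
            // idx >= 0 so that a pair such as "=value" yields an empty key
            // instead of treating the whole pair as the key
            String key = idx >= 0 ? pair.substring(0, idx) : pair;
            String value = idx >= 0 ? pair.substring(idx + 1) : "";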

            key = URLDecoder.decode(key, StandardCharsets.UTF_8);
            value = URLDecoder.decode(value, StandardCharsets.UTF_8);

            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        return params;
    }

    public static void main(String[] args) throws Exception {
        URI uri = new URI("https://shop.com/search?category=books&tag=java&tag=programming");

        // Pass the raw query: getQuery() would decode percent-escapes before
        // the string is split, and URLDecoder would then decode a second time
        Map<String, List<String>> params = parseQuery(uri.getRawQuery());

        System.out.println(params.get("category")); // [books]
        System.out.println(params.get("tag"));      // [java, programming]

        // Get single value
        String category = params.getOrDefault("category", List.of("")).get(0);
        System.out.println("Category: " + category); // books
    }
}

The parser uses computeIfAbsent() to handle duplicate keys, storing all values in a list. This matches how other languages handle repeated query parameters. Decode both keys and values, since either may contain percent-encoded characters, and feed the parser getRawQuery() rather than getQuery() so that each escape is decoded exactly once, after the string has been split on & and =.
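To see why a query parser should receive the raw query, consider a value that contains an encoded & or +. The following sketch uses a placeholder URL and hypothetical names (RawQueryExample, countPairs) and compares the two accessors.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class RawQueryExample {
    // Number of key=value pairs found when splitting on '&'
    static int countPairs(String query) {
        return query.split("&").length;
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://shop.com/search?q=a%26b&note=1%2B1");

        System.out.println(uri.getRawQuery()); // q=a%26b&note=1%2B1
        System.out.println(uri.getQuery());    // q=a&b&note=1+1

        // The decoded form has gained a spurious separator
        System.out.println(countPairs(uri.getRawQuery())); // 2
        System.out.println(countPairs(uri.getQuery()));    // 3

        // Decoding twice also turns a literal '+' into a space
        System.out.println(URLDecoder.decode("1%2B1", StandardCharsets.UTF_8)); // 1+1
        System.out.println(URLDecoder.decode("1+1", StandardCharsets.UTF_8));   // 1 1
    }
}
```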

Building URLs

Building URLs in Java requires careful attention to encoding. The URI class has a constructor that accepts individual components and handles encoding for you.

java

import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlBuilder {

    public static String buildQuery(Map<String, String> params) {
        StringBuilder query = new StringBuilder();

        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append("&");
            }
            query.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8));
            query.append("=");
            query.append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }

        return query.toString();
    }

    public static void main(String[] args) throws Exception {
        // Using URI constructor
        URI uri = new URI(
            "https",                    // scheme
            "user:pass",                // userInfo
            "example.com",              // host
            8080,                       // port
            "/api/users",               // path
            "page=1&limit=10",          // query
            "results"                   // fragment
        );

        System.out.println(uri);
        // https://user:pass@example.com:8080/api/users?page=1&limit=10#results

        // Building query string
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "java tutorial");
        params.put("page", "1");
        params.put("sort", "relevance");

        String query = buildQuery(params);
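        // The query is already encoded, so append it to the URL string and parse that.
        // The multi-argument constructors always quote '%', which would
        // double-encode any escape produced by URLEncoder.
        URI searchUri = new URI("https://api.example.com/search?" + query);

        System.out.println(searchUri);
        // https://api.example.com/search?q=java+tutorial&page=1&sort=relevance
    }
}

The multi-argument URI constructors quote illegal characters automatically, but they also always quote the % character, so they expect unencoded components. A query string built with URLEncoder is already encoded; append it to the URL string and parse the result with the single-argument constructor or URI.create(). The LinkedHashMap preserves insertion order, which can be useful for debugging and testing.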
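One pitfall is worth seeing in isolation: the multi-argument URI constructors always quote the % character, so handing them a query that URLEncoder has already encoded produces double encoding. This sketch uses a placeholder host and hypothetical helper names (viaConstructor, viaString) to compare the two approaches.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class QueryEncodingPitfall {
    // "a&b" must be encoded so the '&' is not read as a separator
    static String encodedQuery() {
        return "q=" + URLEncoder.encode("a&b", StandardCharsets.UTF_8); // q=a%26b
    }

    // Multi-argument constructor: '%' is quoted again, giving %2526
    static URI viaConstructor(String query) {
        try {
            return new URI("https", "api.example.com", "/search", query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Single string: the encoded query is parsed as-is
    static URI viaString(String query) {
        return URI.create("https://api.example.com/search?" + query);
    }

    public static void main(String[] args) {
        String query = encodedQuery();
        System.out.println(viaConstructor(query)); // https://api.example.com/search?q=a%2526b
        System.out.println(viaString(query));      // https://api.example.com/search?q=a%26b
    }
}
```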

Modifying URLs

Since URI is immutable, modifying a URL means creating a new one with the changed components. This pattern is verbose but safe.

java

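import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class UrlModifier {

    public static URI setQueryParam(URI uri, String key, String value) throws URISyntaxException {
        // parseQuery is the helper from the QueryParser class shown earlier;
        // it receives the raw query so escapes are decoded exactly once
        Map<String, List<String>> params = QueryParser.parseQuery(uri.getRawQuery());
        params.put(key, List.of(value));

        return withRawQuery(uri, buildQueryFromMap(params));
    }

    public static URI removeQueryParam(URI uri, String key) throws URISyntaxException {
        Map<String, List<String>> params = QueryParser.parseQuery(uri.getRawQuery());
        params.remove(key);

        return withRawQuery(uri, params.isEmpty() ? null : buildQueryFromMap(params));
    }

    // Reassemble from the raw components. The multi-argument URI constructors
    // always quote '%', which would double-encode the already-encoded query.
    // Assumes an absolute, hierarchical URI with an authority (e.g. http/https).
    private static URI withRawQuery(URI uri, String rawQuery) throws URISyntaxException {
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority());
        if (uri.getRawPath() != null) sb.append(uri.getRawPath());
        if (rawQuery != null) sb.append('?').append(rawQuery);
        if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
        return new URI(sb.toString());
    }

    private static String buildQueryFromMap(Map<String, List<String>> params) {
        StringBuilder query = new StringBuilder();

        for (Map.Entry<String, List<String>> entry : params.entrySet()) {
            for (String value : entry.getValue()) {
                if (query.length() > 0) query.append("&");
                query.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8));
                query.append("=");
                query.append(URLEncoder.encode(value, StandardCharsets.UTF_8));
            }
        }

        return query.toString();
    }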

    public static void main(String[] args) throws Exception {
        URI uri = new URI("https://api.com/search?q=java&page=1");

        // Add/update parameter
        URI updated = setQueryParam(uri, "page", "2");
        System.out.println(updated);
        // https://api.com/search?q=java&page=2

        // Remove parameter
        URI removed = removeQueryParam(uri, "page");
        System.out.println(removed);
        // https://api.com/search?q=java
    }
}

Each modification requires constructing a new URI with all components. The helper methods setQueryParam() and removeQueryParam() encapsulate this boilerplate. In practice, you'd put these in a utility class.

Resolving Relative URLs

The resolve() method handles relative URL resolution following RFC 2396, section 5.2. The relativize() method performs the reverse operation, but only when the base path is a prefix of the target path.

java

import java.net.URI;

public class RelativeUrls {
    public static void main(String[] args) throws Exception {
        URI base = new URI("https://example.com/docs/guide/");

        // Resolve relative paths
        URI relative1 = base.resolve("../api/reference");
        System.out.println(relative1);
        // https://example.com/docs/api/reference

        URI relative2 = base.resolve("page.html");
        System.out.println(relative2);
        // https://example.com/docs/guide/page.html

        URI absolute = base.resolve("/about");
        System.out.println(absolute);
        // https://example.com/about

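        // Relativize: works only when the base path is a prefix of the target path
        URI target = new URI("https://example.com/docs/guide/setup/install.html");
        URI relativized = base.relativize(target);
        System.out.println(relativized);
        // setup/install.html

        // Otherwise the target is returned unchanged; "../" is never generated
        URI sibling = new URI("https://example.com/docs/api/reference");
        System.out.println(base.relativize(sibling));
        // https://example.com/docs/api/reference
    }
}

The resolve() method works like JavaScript's second argument to the URL constructor. The relativize() method is useful for generating relative links below a known base, such as for site maps or navigation. It never produces ../ segments: if the target is not located under the base path, relativize() returns the target URI unchanged.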
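Two details of resolution are easy to trip over: whether the base ends in a slash, and leftover dot segments. The sketch below, with placeholder URLs and a hypothetical resolve helper, shows both. Without a trailing slash the last segment of the base is treated as a file name and replaced.

```java
import java.net.URI;

class ResolveEdgeCases {
    // Resolve a reference against a base given as strings
    static URI resolve(String base, String ref) {
        return URI.create(base).resolve(ref);
    }

    public static void main(String[] args) {
        // Base ends with '/': the reference is appended to the directory
        System.out.println(resolve("https://example.com/docs/guide/", "page.html"));
        // https://example.com/docs/guide/page.html

        // No trailing '/': the last segment ("guide") is replaced
        System.out.println(resolve("https://example.com/docs/guide", "page.html"));
        // https://example.com/docs/page.html

        // normalize() removes "." and resolves ".." segments in an existing URI
        System.out.println(URI.create("https://example.com/a/./b/../c").normalize());
        // https://example.com/a/c
    }
}
```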

URL Encoding

Java's URLEncoder follows the HTML form encoding specification, which encodes spaces as +. This is correct for query strings but not for path segments.

java

import java.net.URLEncoder;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlEncoding {
    public static void main(String[] args) {
        // Encoding
        String raw = "hello world & friends";
        String encoded = URLEncoder.encode(raw, StandardCharsets.UTF_8);
        System.out.println(encoded);  // hello+world+%26+friends

        // Decoding
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(decoded);  // hello world & friends

        // Note: URLEncoder encodes spaces as + (application/x-www-form-urlencoded)
        // For path segments, you might need to replace + with %20
        String pathEncoded = URLEncoder.encode("my file.txt", StandardCharsets.UTF_8)
            .replace("+", "%20");
        System.out.println(pathEncoded);  // my%20file.txt

        // Complete example
        String query = "q=" + URLEncoder.encode("java programming", StandardCharsets.UTF_8)
            + "&sort=" + URLEncoder.encode("date:desc", StandardCharsets.UTF_8);
        System.out.println(query);
        // q=java+programming&sort=date%3Adesc
    }
}

The key gotcha: URLEncoder encodes spaces as +, but path segments should use %20. For path encoding, either use the multi-argument URI constructor or replace + with %20 after encoding.
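As a sketch of the constructor-based option, the four-argument URI constructor (scheme, host, path, fragment) takes an unencoded path and quotes what is illegal there. The host is a placeholder and the names PathEncoding and fileUri are made up. Because the constructor receives the whole path, a / inside a file name cannot be expressed this way; for that case, encode each segment separately.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PathEncoding {
    // Build a URI from an unencoded path; the path must start with '/'
    static URI fileUri(String path) {
        try {
            return new URI("https", "example.com", path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces become %20, not '+'
        System.out.println(fileUri("/files/my file.txt"));
        // https://example.com/files/my%20file.txt

        // A literal '%' is always quoted as %25
        System.out.println(fileUri("/files/100% done.txt").getRawPath());
        // /files/100%25%20done.txt

        // '+' is legal in a path and is left alone
        System.out.println(fileUri("/files/a+b.txt").getRawPath());
        // /files/a+b.txt
    }
}
```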

URL Validation

Validating URLs is essential for security. Java's URI class throws exceptions for malformed URLs, which you can use for basic validation.

java

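import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class UrlValidator {

    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    // Schemes are case-insensitive, and Set.of(...).contains(null) throws
    // NullPointerException, so null-check and normalize before the lookup
    private static boolean hasAllowedScheme(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static boolean isValidHttpUrl(String url) {
        if (url == null) {
            return false;
        }
        try {
            URI uri = new URI(url);

            // Must have allowed scheme (relative URLs have none)
            if (!hasAllowedScheme(uri)) {
                return false;
            }

            // Must have host
            if (uri.getHost() == null || uri.getHost().isEmpty()) {
                return false;
            }

            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static boolean isSafeRedirect(String url, Set<String> allowedHosts) {
        if (url == null) {
            return false;
        }
        try {
            URI uri = new URI(url);

            if (!hasAllowedScheme(uri)) {
                return false;
            }

            String host = uri.getHost();
            if (host == null) {
                return false;
            }
            // Host names are case-insensitive; allowedHosts is expected in lower case
            host = host.toLowerCase(Locale.ROOT);

            // Check exact match or subdomain
            for (String allowed : allowedHosts) {
                if (host.equals(allowed) || host.endsWith("." + allowed)) {
                    return true;
                }
            }

            return false;
        } catch (URISyntaxException e) {
            return false;
        }
    }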

    public static void main(String[] args) {
        System.out.println(isValidHttpUrl("https://example.com")); // true
        System.out.println(isValidHttpUrl("javascript:alert(1)")); // false
        System.out.println(isValidHttpUrl("ftp://files.com"));     // false

        Set<String> allowed = Set.of("myapp.com", "api.myapp.com");
        System.out.println(isSafeRedirect("https://myapp.com/cb", allowed));  // true
        System.out.println(isSafeRedirect("https://evil.com/cb", allowed));   // false
    }
}

The isSafeRedirect() function demonstrates a common security pattern: validating that redirect URLs only go to allowed hosts. The endsWith() check handles subdomains like api.myapp.com matching the allowed host myapp.com.
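To show why the check is made on the parsed host rather than on the raw string, here is a short sketch with a few look-alike URLs. The names HostCheckExample, hostOf, and matches are hypothetical, and the domains are placeholders. A naive startsWith or contains test on the string would accept the first two.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostCheckExample {
    // Host of the URL, or null if it is malformed or has no host
    static String hostOf(String url) {
        try {
            return new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Exact match, or a subdomain separated by a dot
    static boolean matches(String host, String allowed) {
        return host != null && (host.equals(allowed) || host.endsWith("." + allowed));
    }

    public static void main(String[] args) {
        // Everything before '@' is user info, not the host
        System.out.println(hostOf("https://myapp.com@evil.com/cb"));        // evil.com

        // The allowed name used as a prefix of another domain
        System.out.println(matches(hostOf("https://myapp.com.evil.com/cb"), "myapp.com")); // false

        // The leading dot in the suffix check rejects look-alike domains
        System.out.println(matches(hostOf("https://evilmyapp.com/cb"), "myapp.com"));      // false

        System.out.println(matches(hostOf("https://api.myapp.com/cb"), "myapp.com"));      // true

        // Relative URLs parse successfully but have no host
        System.out.println(hostOf("/local/path"));                          // null
    }
}
```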

With Java HttpClient

Java 11 introduced a modern HttpClient that works directly with URI. Here's how to combine URL building with HTTP requests.

java

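import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class HttpClientExample {
    public static void main(String[] args) throws Exception {
        // The default client never follows redirects; opt in explicitly
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

        // Build URI with query parameters. The query is already encoded,
        // so append it to the string rather than using a multi-argument constructor.
        String query = "q=" + URLEncoder.encode("java 21", StandardCharsets.UTF_8);
        URI uri = URI.create("https://api.example.com/search?" + query);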

        HttpRequest request = HttpRequest.newBuilder()
            .uri(uri)
            .header("Accept", "application/json")
            .GET()
            .build();

        HttpResponse<String> response = client.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );

        System.out.println("Status: " + response.statusCode());
        System.out.println("Body: " + response.body());

        // Get final URI after redirects
        System.out.println("Final URI: " + response.uri());
    }
}

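The HttpClient API uses a builder pattern that's easy to chain with URI construction. response.uri() reports the URI the response was actually received from, which differs from the request URI only when the client is configured to follow redirects; a client created with HttpClient.newHttpClient() does not follow them. For async operations, use sendAsync() instead.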
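A minimal sketch of the asynchronous variant might look like the following. The helper names (newClient, buildRequest, fetchBody), the timeouts, and the api.example.com endpoint are assumptions for illustration, not part of any real service. sendAsync() returns a CompletableFuture, so the body can be transformed without blocking the calling thread.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

class AsyncRequestExample {
    // Client that follows redirects and gives up on slow connections
    static HttpClient newClient() {
        return HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    }

    // GET request for JSON with an overall response timeout
    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
            .timeout(Duration.ofSeconds(10))
            .header("Accept", "application/json")
            .GET()
            .build();
    }

    // Send without blocking; the future completes with the response body
    static CompletableFuture<String> fetchBody(HttpClient client, URI uri) {
        return client.sendAsync(buildRequest(uri), HttpResponse.BodyHandlers.ofString())
            .thenApply(HttpResponse::body);
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://api.example.com/search?q=java+21");
        fetchBody(newClient(), uri)
            .thenAccept(System.out::println)
            .join(); // wait here only so the demo does not exit early
    }
}
```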
