Java provides two main classes for URL handling: java.net.URL and java.net.URI. While URL is for network operations, URI is preferred for parsing and manipulation due to better RFC compliance.
Key Takeaways
- Use URI for parsing and manipulation, URL for network operations
- URI follows RFC 2396 (as amended by RFC 2732) and validates input more strictly than URL
- URLEncoder/URLDecoder handle query string encoding
- Java 11+ HttpClient works with URI directly
- Consider Apache HttpClient or OkHttp for complex URL building
“A URI is a uniform resource identifier while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs).”
URI vs URL
Java's naming can be confusing: both URL and URI classes exist, but they serve different purposes. Understanding when to use each will save you from common pitfalls.
| Class | Purpose | When to Use |
|---|---|---|
| java.net.URI | Parsing, validation, manipulation | General URL handling |
| java.net.URL | Network operations, connections | Fetching content |
| java.net.URLEncoder | Encode query parameters | Building query strings |
| java.net.URLDecoder | Decode URL components | Reading query params |
For most URL manipulation tasks, URI is the better choice. The URL class was designed for network operations and has quirks: its equals() and hashCode() methods can trigger blocking DNS lookups to resolve host names. Modern Java code should prefer URI for parsing and HttpClient for network operations.
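To make the equals() difference concrete, here is a minimal sketch. The class and method names are illustrative and not part of the original examples. It compares addresses as URI values, which is a purely textual comparison, and converts to URL only at the point where a connection would be needed.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UriVersusUrl {
    // Compare two addresses as URIs: no network access is involved.
    static boolean sameAddress(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // Convert to URL late, only when something actually needs a URL.
    static URL toUrl(String address) {
        try {
            return URI.create(address).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        // Scheme and host are compared without regard to case
        System.out.println(sameAddress("https://Example.com/docs", "https://example.com/docs")); // true
        // Paths are compared exactly, so a trailing slash matters
        System.out.println(sameAddress("https://example.com/docs", "https://example.com/docs/")); // false
        System.out.println(toUrl("https://example.com/docs").getProtocol()); // https
    }
}
```

Keeping values as URI in collections (for example as HashMap keys) avoids the host-resolution behavior of URL entirely.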
Parsing with URI
The URI constructor parses a URL string and throws URISyntaxException for invalid input. Like JavaScript, you'll typically need try-catch when parsing user input.
import java.net.URI;
import java.net.URISyntaxException;

public class UrlParser {
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("https://user:pass@example.com:8080/path/page?query=value#fragment");
        System.out.println("Scheme: " + uri.getScheme());       // https
        System.out.println("Host: " + uri.getHost());           // example.com
        System.out.println("Port: " + uri.getPort());           // 8080
        System.out.println("Path: " + uri.getPath());           // /path/page
        System.out.println("Query: " + uri.getQuery());         // query=value
        System.out.println("Fragment: " + uri.getFragment());   // fragment
        System.out.println("User Info: " + uri.getUserInfo());  // user:pass
        System.out.println("Authority: " + uri.getAuthority()); // user:pass@example.com:8080

        // Raw vs decoded values
        URI encoded = new URI("https://example.com/path?q=hello%20world");
        System.out.println("Raw Query: " + encoded.getRawQuery()); // q=hello%20world
        System.out.println("Query: " + encoded.getQuery());        // q=hello world
    }
}

Notice the distinction between getRawQuery() and getQuery(). The "raw" version preserves percent-encoding, while the regular version decodes it. This pattern appears throughout the URI class: getRawPath() vs getPath(), etc.
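The raw/decoded distinction matters most when an encoded character is also a delimiter. The sketch below (class and method names are hypothetical) uses a path containing an encoded slash and a query containing an encoded ampersand to show what each getter returns.

```java
import java.net.URI;

class RawVersusDecoded {
    // Returns {raw path, decoded path} for the given address.
    static String[] paths(String address) {
        URI uri = URI.create(address);
        return new String[] { uri.getRawPath(), uri.getPath() };
    }

    // Returns {raw query, decoded query} for the given address.
    static String[] queries(String address) {
        URI uri = URI.create(address);
        return new String[] { uri.getRawQuery(), uri.getQuery() };
    }

    public static void main(String[] args) {
        String address = "https://example.com/files/a%2Fb.txt?note=1%262";
        String[] p = paths(address);
        System.out.println(p[0]); // /files/a%2Fb.txt
        System.out.println(p[1]); // /files/a/b.txt  (the encoded slash now looks like a separator)
        String[] q = queries(address);
        System.out.println(q[0]); // note=1%262
        System.out.println(q[1]); // note=1&2        (the encoded ampersand now looks like a separator)
    }
}
```

As a rule of thumb under these assumptions: split on delimiters using the raw form, then decode the individual pieces.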
Parsing Query Parameters
Unlike JavaScript's URLSearchParams, Java doesn't have a built-in query string parser. You'll need to create one. This is a common utility function in Java web applications.
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class QueryParser {
    // Expects the raw (still percent-encoded) query string
    public static Map<String, List<String>> parseQuery(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int idx = pair.indexOf('=');
            // No '=' at all: treat the whole pair as a key with an empty value
            String key = idx >= 0 ? pair.substring(0, idx) : pair;
            String value = idx >= 0 ? pair.substring(idx + 1) : "";
            key = URLDecoder.decode(key, StandardCharsets.UTF_8);
            value = URLDecoder.decode(value, StandardCharsets.UTF_8);
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        URI uri = new URI("https://shop.com/search?category=books&tag=java&tag=programming");
        // Use getRawQuery(): splitting must happen before decoding
        Map<String, List<String>> params = parseQuery(uri.getRawQuery());
        System.out.println(params.get("category")); // [books]
        System.out.println(params.get("tag"));      // [java, programming]

        // Get single value
        String category = params.getOrDefault("category", List.of("")).get(0);
        System.out.println("Category: " + category); // books
    }
}

The parser uses computeIfAbsent() to collect duplicate keys, storing all values in a list. This matches how other languages handle repeated query parameters. Pass it the raw query from getRawQuery() rather than getQuery(): getQuery() decodes first, so an encoded & or = (%26, %3D) inside a value would be mistaken for a separator. Decode both keys and values after splitting, as either might contain percent-encoded characters. URLDecoder also turns + into a space, which matches form-encoded query strings.
Building URLs
Building URLs in Java requires careful attention to encoding. The URI class has multi-argument constructors that accept individual, unencoded components and percent-encode illegal characters for you.
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlBuilder {
    public static String buildQuery(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append("&");
            }
            query.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8));
            query.append("=");
            query.append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) throws Exception {
        // Using URI constructor (components are passed unencoded)
        URI uri = new URI(
            "https",           // scheme
            "user:pass",       // userInfo
            "example.com",     // host
            8080,              // port
            "/api/users",      // path
            "page=1&limit=10", // query
            "results"          // fragment
        );
        System.out.println(uri);
        // https://user:pass@example.com:8080/api/users?page=1&limit=10#results

        // Building query string
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "java tutorial");
        params.put("page", "1");
        params.put("sort", "relevance");
        String query = buildQuery(params);
        // The query is already encoded, so append it as text instead of
        // passing it to a multi-argument constructor
        URI searchUri = URI.create("https://api.example.com/search?" + query);
        System.out.println(searchUri);
        // https://api.example.com/search?q=java+tutorial&page=1&sort=relevance
    }
}

The multi-argument URI constructors take unencoded components and percent-encode illegal characters for you, but they always encode a literal % as %25. A query string that was already encoded with URLEncoder should therefore be appended to the URL text and parsed with URI.create() or the single-argument constructor; passing it to a multi-argument constructor would double-encode any value that contains a percent escape. The LinkedHashMap preserves insertion order, which can be useful for debugging and testing.
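One pitfall when mixing URLEncoder with the multi-argument constructors is worth seeing in isolation: those constructors always quote a literal %, so an already-encoded query gets encoded a second time. The following is a small sketch; the class name, method names, and the api.example.com host are placeholders for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingPitfall {
    // Passes a pre-encoded query to the 5-argument constructor (problematic).
    static String viaConstructor(String encodedQuery) {
        try {
            return new URI("https", "api.example.com", "/search", encodedQuery, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Appends the pre-encoded query to the URL text and parses the result.
    static String viaString(String encodedQuery) {
        return URI.create("https://api.example.com/search?" + encodedQuery).toString();
    }

    public static void main(String[] args) {
        String query = "sort=" + URLEncoder.encode("date:desc", StandardCharsets.UTF_8);
        System.out.println(query);                 // sort=date%3Adesc
        System.out.println(viaConstructor(query)); // https://api.example.com/search?sort=date%253Adesc
        System.out.println(viaString(query));      // https://api.example.com/search?sort=date%3Adesc
    }
}
```

The double-encoded form only shows up when a value contains a character that URLEncoder escapes, which is why simple test inputs such as "java tutorial" (encoded as java+tutorial) can hide the problem.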
Modifying URLs
Since URI is immutable, modifying a URL means creating a new one with the changed components. This pattern is verbose but safe.
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class UrlModifier {
    // parseQuery() is the method from the QueryParser example above
    public static URI setQueryParam(URI uri, String key, String value) throws URISyntaxException {
        Map<String, List<String>> params = QueryParser.parseQuery(uri.getRawQuery());
        params.put(key, List.of(value));
        return withRawQuery(uri, buildQueryFromMap(params));
    }

    public static URI removeQueryParam(URI uri, String key) throws URISyntaxException {
        Map<String, List<String>> params = QueryParser.parseQuery(uri.getRawQuery());
        params.remove(key);
        return withRawQuery(uri, params.isEmpty() ? null : buildQueryFromMap(params));
    }

    // Rebuild the URI from its raw (still encoded) components. The
    // multi-argument constructors would re-encode every '%' in the
    // already-encoded query string.
    private static URI withRawQuery(URI uri, String rawQuery) throws URISyntaxException {
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) sb.append(uri.getScheme()).append(':');
        if (uri.getRawAuthority() != null) sb.append("//").append(uri.getRawAuthority());
        if (uri.getRawPath() != null) sb.append(uri.getRawPath());
        if (rawQuery != null) sb.append('?').append(rawQuery);
        if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
        return new URI(sb.toString());
    }

    private static String buildQueryFromMap(Map<String, List<String>> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : params.entrySet()) {
            for (String value : entry.getValue()) {
                if (query.length() > 0) query.append("&");
                query.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8));
                query.append("=");
                query.append(URLEncoder.encode(value, StandardCharsets.UTF_8));
            }
        }
        return query.toString();
    }

    public static void main(String[] args) throws Exception {
        URI uri = new URI("https://api.com/search?q=java&page=1");

        // Add/update parameter
        URI updated = setQueryParam(uri, "page", "2");
        System.out.println(updated);
        // https://api.com/search?q=java&page=2

        // Remove parameter
        URI removed = removeQueryParam(uri, "page");
        System.out.println(removed);
        // https://api.com/search?q=java
    }
}

Each modification requires constructing a new URI from all of its components. The helper methods setQueryParam() and removeQueryParam() encapsulate this boilerplate, and withRawQuery() reassembles the result from the raw components so that nothing is encoded twice. In practice, you'd put these in a utility class.
Resolving Relative URLs
The resolve() method resolves a relative reference against a base URI, following the algorithm in RFC 2396 (section 5.2). The relativize() method goes the other way, but only when the base path is a prefix of the target path.

import java.net.URI;

public class RelativeUrls {
    public static void main(String[] args) throws Exception {
        URI base = new URI("https://example.com/docs/guide/");

        // Resolve relative paths
        URI relative1 = base.resolve("../api/reference");
        System.out.println(relative1);
        // https://example.com/docs/api/reference

        URI relative2 = base.resolve("page.html");
        System.out.println(relative2);
        // https://example.com/docs/guide/page.html

        URI absolute = base.resolve("/about");
        System.out.println(absolute);
        // https://example.com/about

        // Relativize (reverse operation, for targets below the base path)
        URI target = new URI("https://example.com/docs/guide/setup/install.html");
        URI relativized = base.relativize(target);
        System.out.println(relativized);
        // setup/install.html

        // A target outside the base path is returned unchanged
        URI sibling = new URI("https://example.com/docs/api/reference");
        System.out.println(base.relativize(sibling));
        // https://example.com/docs/api/reference
    }
}

The resolve() method works like JavaScript's second argument to the URL constructor. The relativize() method is useful when you need to generate relative links, such as for site maps or navigation, but it never produces "../" segments: if the base path is not a prefix of the target path, the target comes back unchanged.
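A detail that often surprises people with resolve() is the trailing slash on the base: without it, the last path segment is treated as a file name and replaced. The sketch below (class and helper names are illustrative) shows that case alongside normalize(), which removes "." and ".." segments from a path.

```java
import java.net.URI;

class ResolvePitfalls {
    // Resolves ref against base and returns the result as text.
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    // Removes "." and ".." path segments where possible.
    static String normalize(String address) {
        return URI.create(address).normalize().toString();
    }

    public static void main(String[] args) {
        // Base ends with a slash: "guide/" is kept as a directory
        System.out.println(resolve("https://example.com/docs/guide/", "page.html"));
        // https://example.com/docs/guide/page.html

        // Base has no trailing slash: "guide" is replaced like a file name
        System.out.println(resolve("https://example.com/docs/guide", "page.html"));
        // https://example.com/docs/page.html

        System.out.println(normalize("https://example.com/a/./b/../c"));
        // https://example.com/a/c
    }
}
```

When a base URL comes from configuration, it can help to check for the trailing slash once, up front, rather than debugging missing path segments later.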
URL Encoding
Java's URLEncoder follows the HTML form encoding specification, which encodes spaces as +. This is correct for query strings but not for path segments.
import java.net.URLEncoder;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlEncoding {
    public static void main(String[] args) {
        // Encoding
        String raw = "hello world & friends";
        String encoded = URLEncoder.encode(raw, StandardCharsets.UTF_8);
        System.out.println(encoded); // hello+world+%26+friends

        // Decoding
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(decoded); // hello world & friends

        // Note: URLEncoder encodes spaces as + (application/x-www-form-urlencoded)
        // For path segments, you might need to replace + with %20
        String pathEncoded = URLEncoder.encode("my file.txt", StandardCharsets.UTF_8)
            .replace("+", "%20");
        System.out.println(pathEncoded); // my%20file.txt

        // Complete example
        String query = "q=" + URLEncoder.encode("java programming", StandardCharsets.UTF_8)
            + "&sort=" + URLEncoder.encode("date:desc", StandardCharsets.UTF_8);
        System.out.println(query);
        // q=java+programming&sort=date%3Adesc
    }
}

The key gotcha: URLEncoder encodes spaces as +, but path segments should use %20. For path encoding, either use the multi-argument URI constructor or replace + with %20 after encoding.
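The constructor-based alternative for paths can be sketched as follows. The helper name fileUrl() and the example.com/files/ layout are assumptions made for illustration; the point is that the unencoded path goes into the four-argument URI constructor, which encodes spaces as %20 and a literal % as %25.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PathEncodingSketch {
    // Hypothetical helper: builds https://example.com/files/<fileName>,
    // letting the URI constructor percent-encode the path.
    static String fileUrl(String fileName) {
        try {
            return new URI("https", "example.com", "/files/" + fileName, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for: " + fileName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fileUrl("my file.txt"));
        // https://example.com/files/my%20file.txt
        System.out.println(fileUrl("100% done.txt"));
        // https://example.com/files/100%25%20done.txt
    }
}
```

Note that a / inside fileName is kept as a path separator by this approach, so it is not a substitute for validating untrusted file names.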
URL Validation
Validating URLs is essential for security. Java's URI class throws exceptions for malformed URLs, which you can use for basic validation.
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class UrlValidator {
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    // Schemes are case-insensitive, and Set.of(...).contains(null) throws
    // NullPointerException, so check for null and lower-case first
    private static boolean hasAllowedScheme(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static boolean isValidHttpUrl(String url) {
        try {
            URI uri = new URI(url);
            // Must have allowed scheme
            if (!hasAllowedScheme(uri)) {
                return false;
            }
            // Must have host
            if (uri.getHost() == null || uri.getHost().isEmpty()) {
                return false;
            }
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static boolean isSafeRedirect(String url, Set<String> allowedHosts) {
        try {
            URI uri = new URI(url);
            if (!hasAllowedScheme(uri)) {
                return false;
            }
            String host = uri.getHost();
            if (host == null) {
                return false;
            }
            // Host names are case-insensitive; allowedHosts is expected in lower case
            host = host.toLowerCase(Locale.ROOT);
            // Check exact match or subdomain
            for (String allowed : allowedHosts) {
                if (host.equals(allowed) || host.endsWith("." + allowed)) {
                    return true;
                }
            }
            return false;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidHttpUrl("https://example.com")); // true
        System.out.println(isValidHttpUrl("javascript:alert(1)")); // false
        System.out.println(isValidHttpUrl("ftp://files.com"));     // false

        Set<String> allowed = Set.of("myapp.com", "api.myapp.com");
        System.out.println(isSafeRedirect("https://myapp.com/cb", allowed)); // true
        System.out.println(isSafeRedirect("https://evil.com/cb", allowed));  // false
    }
}

The isSafeRedirect() function demonstrates a common security pattern: validating that redirect URLs only go to allowed hosts. The endsWith() check handles subdomains like api.myapp.com matching the allowed host myapp.com; the leading dot in "." + allowed is what keeps a look-alike such as evilmyapp.com from matching. Both methods treat a missing scheme (a relative URL) as invalid instead of letting the scheme lookup throw.
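Two details of the host check are easy to get wrong, so here is a small sketch that isolates them. The class and method names are hypothetical. It contrasts a suffix match without the leading dot against the dotted version, and shows which host URI reports when an attacker puts a trusted-looking name in the user-info position.

```java
import java.net.URI;
import java.util.Locale;

class RedirectHostChecks {
    // Too permissive: any host that merely ends with the allowed name passes.
    static boolean naiveMatch(String host, String allowed) {
        return host.endsWith(allowed);
    }

    // Exact match, or a true subdomain (note the leading dot).
    static boolean dotMatch(String host, String allowed) {
        String h = host.toLowerCase(Locale.ROOT);
        return h.equals(allowed) || h.endsWith("." + allowed);
    }

    // The host as parsed by java.net.URI.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        System.out.println(naiveMatch("evilmyapp.com", "myapp.com")); // true  (wrongly accepted)
        System.out.println(dotMatch("evilmyapp.com", "myapp.com"));   // false
        System.out.println(dotMatch("API.myapp.com", "myapp.com"));   // true

        // "myapp.com" here is user info, not the host
        System.out.println(hostOf("https://myapp.com@evil.com/cb"));  // evil.com
    }
}
```

Checking the parsed host rather than searching the URL text for an allowed name is what makes the user-info trick harmless.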
With Java HttpClient
Java 11 introduced a modern HttpClient that works directly with URI. Here's how to combine URL building with HTTP requests.
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class HttpClientExample {
    public static void main(String[] args) throws Exception {
        // The default client never follows redirects, so opt in explicitly
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

        // Build URI with query parameters; the query is already encoded,
        // so append it as text rather than using a multi-argument constructor
        String query = "q=" + URLEncoder.encode("java 21", StandardCharsets.UTF_8);
        URI uri = URI.create("https://api.example.com/search?" + query);

        HttpRequest request = HttpRequest.newBuilder()
            .uri(uri)
            .header("Accept", "application/json")
            .GET()
            .build();

        HttpResponse<String> response = client.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );
        System.out.println("Status: " + response.statusCode());
        System.out.println("Body: " + response.body());

        // Get final URI after redirects
        System.out.println("Final URI: " + response.uri());
    }
}

The HttpClient API uses a builder pattern that's easy to chain with URI construction. The response object includes the final URI, which is useful when following redirects; the client only follows them if configured with followRedirects(), because the default policy is Redirect.NEVER. For async operations, use sendAsync() instead.
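For the asynchronous variant, a rough sketch might look like the following. The class name, the buildSearchRequest() helper, the 10-second timeout, and the api.example.com endpoint are all assumptions for illustration; sendAsync() returns a CompletableFuture, so the response is handled in a chain of callbacks instead of blocking on send().

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class AsyncRequestSketch {
    // Hypothetical helper: builds a GET request for a search term.
    static HttpRequest buildSearchRequest(String term) {
        String query = "q=" + URLEncoder.encode(term, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create("https://api.example.com/search?" + query))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = buildSearchRequest("java 21");

        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> "Status: " + response.statusCode())
                // Connection errors and timeouts arrive here instead of being thrown
                .exceptionally(error -> "Request failed: " + error)
                .thenAccept(System.out::println)
                .join(); // block only so this demo does not exit before the callback runs
    }
}
```

In a real application the join() at the end would usually be replaced by returning the future to the caller, so that the calling thread stays free while the request is in flight.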