PHP has shipped built-in URL functions for a long time: parse_url() and parse_str() date back to PHP 4, and http_build_query() was added in PHP 5. Understanding parse_url() and http_build_query() is essential for any PHP developer working with web applications.
Key Takeaways
- parse_url() splits URLs into components without decoding them
- parse_str() converts query strings to arrays
- http_build_query() creates properly encoded query strings
- Always validate URLs before trusting their components in production
- Handle edge cases like missing components and relative URLs
The PHP manual describes parse_url() this way: “This function parses a URL and returns an associative array containing any of the various components of the URL that are present. This function is not meant to validate the given URL, it only breaks it up into the parts listed above.”
The parse_url() Function
The parse_url() function is PHP's primary tool for breaking URLs into components. Unlike JavaScript's URL class or Python's urlparse(), it returns an associative array rather than an object. This makes it easy to work with, though you'll need to handle missing components yourself.
<?php
$url = 'https://user:pass@example.com:8080/path/page?name=John&age=30#section';
$parts = parse_url($url);
print_r($parts);
/* Output:
Array
(
[scheme] => https
[host] => example.com
[port] => 8080
[user] => user
[pass] => pass
[path] => /path/page
[query] => name=John&age=30
[fragment] => section
)
*/
The function returns an associative array with keys for each URL component. Not all keys are present in every URL: if there's no port, the port key won't exist. Always check with isset() before accessing components.
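A minimal sketch of defensive access, assuming PHP 7.0+ for the null coalescing operator (??). The fallback values here (an empty string, "/", and a port inferred from the scheme) are illustrative assumptions; parse_url() itself never fills in defaults.

```php
<?php
// Parse once, then read each optional component with an explicit fallback.
$parts = parse_url('https://example.com/docs');

if ($parts === false) {
    // parse_url() returns false only for seriously malformed URLs.
    throw new InvalidArgumentException('Malformed URL');
}

// Absent components have no key at all, so ?? supplies a default.
$host  = $parts['host'] ?? '';
$path  = $parts['path'] ?? '/';
$query = $parts['query'] ?? '';

// Assumption: fall back to the scheme's conventional port, because
// parse_url() only reports a port that is written in the URL.
$defaultPort = ($parts['scheme'] ?? 'http') === 'https' ? 443 : 80;
$port = $parts['port'] ?? $defaultPort;

echo "$host:$port$path\n"; // example.com:443/docs
```

The ?? operator is shorthand for the isset() check, so it never raises an "undefined index" warning for a missing component.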
Extracting Specific Components
If you only need one component, you can pass a second argument to get just that value directly. PHP still parses the whole URL internally, so the gain is convenience rather than speed: you skip the array lookup and the isset() check when you only need the host or path.
<?php
$url = 'https://example.com/search?q=php+tutorial';
// Extract individual components
$scheme = parse_url($url, PHP_URL_SCHEME); // 'https'
$host = parse_url($url, PHP_URL_HOST); // 'example.com'
$path = parse_url($url, PHP_URL_PATH); // '/search'
$query = parse_url($url, PHP_URL_QUERY); // 'q=php+tutorial'
// Available constants:
// PHP_URL_SCHEME, PHP_URL_HOST, PHP_URL_PORT, PHP_URL_USER
// PHP_URL_PASS, PHP_URL_PATH, PHP_URL_QUERY, PHP_URL_FRAGMENT
Using a constant like PHP_URL_HOST returns just that component: a string (an int for PHP_URL_PORT), null if the component is missing, or false if the URL is seriously malformed. This is cleaner than extracting from the full array when you need a single value.
The table below shows all available constants for extracting specific components.
| Constant | Returns | Example Value |
|---|---|---|
| PHP_URL_SCHEME | Protocol | https |
| PHP_URL_HOST | Domain name | example.com |
| PHP_URL_PORT | Port number (int) | 8080 |
| PHP_URL_USER | Username | user |
| PHP_URL_PASS | Password | pass |
| PHP_URL_PATH | Path | /search |
| PHP_URL_QUERY | Query string | q=test |
| PHP_URL_FRAGMENT | Hash/anchor | section |
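The three possible kinds of return value are easy to confuse, so here is a short check. The URLs are made up for illustration; note that the port comes back as an int, which matters if you compare it with ===.

```php
<?php
$url = 'https://example.com:8080/search?q=test';

// PHP_URL_PORT returns an int, not a string.
var_dump(parse_url($url, PHP_URL_PORT));     // int(8080)

// A component that is absent from the URL comes back as null.
var_dump(parse_url($url, PHP_URL_FRAGMENT)); // NULL

// A seriously malformed URL yields false, even with a component constant.
var_dump(parse_url('http:///example.com', PHP_URL_HOST)); // bool(false)
```

Because both null and false are falsy, use === when you need to tell "missing component" apart from "unparseable URL".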
Once you have the query string from a parsed URL, you'll typically want to work with individual parameters. PHP's parse_str() function handles this conversion.
Parsing Query Strings
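Query strings are key-value pairs separated by ampersands. PHP's parse_str() function converts them into an array, automatically URL-decoding keys and values and even handling PHP's array syntax like color[]=red. Always pass the second argument: it is required since PHP 8.0, and older versions without it write the parameters into the current scope as variables.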
<?php
$url = 'https://shop.com/products?category=electronics&brand=apple&sort=price';
$query = parse_url($url, PHP_URL_QUERY);
// Parse query string into array
parse_str($query, $params);
print_r($params);
/* Output:
Array
(
[category] => electronics
[brand] => apple
[sort] => price
)
*/
// Access individual parameters
echo $params['category']; // 'electronics'
// Handle array parameters
$url2 = 'https://shop.com/filter?color[]=red&color[]=blue&size=large';
parse_str(parse_url($url2, PHP_URL_QUERY), $params2);
print_r($params2);
/* Output:
Array
(
[color] => Array
(
[0] => red
[1] => blue
)
[size] => large
)
*/
Notice how parse_str() handles PHP's bracket notation for arrays: color[]=red&color[]=blue produces a numeric array. This is a PHP-specific convention; JavaScript's URLSearchParams, for example, treats color[] as a literal key name and returns both values from getAll('color[]').
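parse_str() follows the same naming rules PHP uses when it populates $_GET, which has a side effect worth knowing: dots and spaces in top-level parameter names are converted to underscores. A short demonstration with made-up parameter names:

```php
<?php
// Values are URL-decoded; dots and spaces in names become underscores.
parse_str('user.name=Jane+Doe&first%20name=Jane&tags[]=a%26b', $params);

echo $params['user_name'], "\n";  // Jane Doe
echo $params['first_name'], "\n"; // Jane
echo $params['tags'][0], "\n";    // a&b

// The original dotted name is not present in the result.
var_dump(array_key_exists('user.name', $params)); // bool(false)
```

If an external API relies on dotted parameter names, you may need to split the query string yourself instead of relying on parse_str().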
Now let's look at the reverse operation: building query strings from PHP arrays.
Building Query Strings
The http_build_query() function is your go-to for creating properly encoded query strings from arrays. It handles special characters, nested arrays, and even lets you customize the separator.
<?php
// Simple array
$params = [
'search' => 'php tutorial',
'page' => 1,
'limit' => 20
];
$query = http_build_query($params);
echo $query; // search=php+tutorial&page=1&limit=20
// Nested arrays
$filters = [
'category' => 'books',
'price' => ['min' => 10, 'max' => 50],
'tags' => ['programming', 'web']
];
echo http_build_query($filters);
// category=books&price%5Bmin%5D=10&price%5Bmax%5D=50&tags%5B0%5D=programming&tags%5B1%5D=web
// Custom separator (HTML-escaped ampersand)
$params = ['a' => 1, 'b' => 2, 'c' => 3];
echo http_build_query($params, '', '&amp;'); // For HTML: a=1&amp;b=2&amp;c=3
// RFC 3986 encoding: spaces become %20 instead of +
$colors = ['colors' => ['dark red', 'green', 'blue']];
echo http_build_query($colors, '', '&', PHP_QUERY_RFC3986);
// colors%5B0%5D=dark%20red&colors%5B1%5D=green&colors%5B2%5D=blue
The function encodes special characters automatically and handles nested arrays by encoding the bracket notation. Numeric array indices are always written out (colors[0], colors[1]); http_build_query() has no option to emit colors[] instead. For HTML output, use &amp; as the separator to avoid validation issues. The PHP_QUERY_RFC3986 flag encodes spaces as %20 per RFC 3986, instead of the default PHP_QUERY_RFC1738 behavior of +.
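Two behaviors are easy to miss: null values are dropped from the output entirely, and the encoding type only changes how spaces are written. A small sketch with illustrative parameter names:

```php
<?php
$params = ['q' => 'red shoes', 'page' => 2, 'debug' => null];

// Default (PHP_QUERY_RFC1738): spaces become '+'; null values are omitted.
echo http_build_query($params), "\n";
// q=red+shoes&page=2

// PHP_QUERY_RFC3986: spaces become '%20'.
echo http_build_query($params, '', '&', PHP_QUERY_RFC3986), "\n";
// q=red%20shoes&page=2
```

Dropping nulls is convenient for optional filters, but it means a key set to null cannot be used to send an empty parameter; use an empty string for that.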
Building Complete URLs
Unlike JavaScript's URL class, PHP doesn't have a built-in URL builder. You'll need to create a helper function that assembles URLs from their components.
<?php
// PHP doesn't have a built-in URL builder, so create a helper function
function buildUrl(array $parts): string {
$url = '';
if (isset($parts['scheme'])) {
$url .= $parts['scheme'] . ':';
}
// The "//" authority prefix belongs only before a host; this keeps
// host-less URLs such as mailto:user@example.com intact.
if (isset($parts['host'])) {
$url .= '//';
if (isset($parts['user'])) {
$url .= $parts['user'];
if (isset($parts['pass'])) {
$url .= ':' . $parts['pass'];
}
$url .= '@';
}
$url .= $parts['host'];
if (isset($parts['port'])) {
$url .= ':' . $parts['port'];
}
}
if (isset($parts['path'])) {
$url .= $parts['path'];
}
if (isset($parts['query'])) {
$url .= '?' . $parts['query'];
}
if (isset($parts['fragment'])) {
$url .= '#' . $parts['fragment'];
}
return $url;
}
// Usage
$parts = [
'scheme' => 'https',
'host' => 'api.example.com',
'path' => '/v2/users',
'query' => http_build_query(['page' => 1, 'limit' => 10])
];
echo buildUrl($parts);
// https://api.example.com/v2/users?page=1&limit=10
The buildUrl() function reconstructs a URL from a parts array (like what parse_url() returns). It checks for each component's existence and adds it with the appropriate delimiter. This pattern is useful for modifying URLs: parse, change, rebuild.
URL Encoding
PHP provides two encoding functions that differ in how they encode spaces. As in Python (quote_plus() versus quote()) and JavaScript, the distinction matters depending on where the encoded string will be used.
<?php
// urlencode() - for query string values (spaces become +)
$search = 'hello world & friends';
echo urlencode($search); // hello+world+%26+friends
// rawurlencode() - RFC 3986 compliant (spaces become %20)
echo rawurlencode($search); // hello%20world%20%26%20friends
// urldecode() and rawurldecode() for decoding
echo urldecode('hello+world'); // hello world
echo rawurldecode('hello%20world'); // hello world
// Complete example
$baseUrl = 'https://api.example.com/search';
$params = [
'q' => 'php & javascript',
'filter' => 'category:books'
];
// Method 1: http_build_query (recommended)
$url = $baseUrl . '?' . http_build_query($params);
// https://api.example.com/search?q=php+%26+javascript&filter=category%3Abooks
// Method 2: Manual encoding
$url = $baseUrl . '?q=' . urlencode($params['q']) . '&filter=' . urlencode($params['filter']);
Use urlencode() for query string values (spaces become +) and rawurlencode() for path segments (spaces become %20). When using http_build_query(), encoding is handled automatically, so you rarely need to call these directly.
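For paths, the usual approach is to encode each segment on its own and only then join them with slashes, so a "/" inside a value cannot be mistaken for a separator. A minimal sketch; buildPath() is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: rawurlencode() each path segment, then join with '/'.
function buildPath(array $segments): string
{
    return '/' . implode('/', array_map('rawurlencode', $segments));
}

echo buildPath(['files', 'annual report.pdf']), "\n"; // /files/annual%20report.pdf
echo buildPath(['tags', 'c/c++']), "\n";              // /tags/c%2Fc%2B%2B
```

Encoding the already-joined path instead would turn every separator into %2F, which is why the segments are encoded first.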
URL Validation
PHP's filter_var() function provides basic URL validation, but it's quite permissive. For security-sensitive applications, you'll want additional checks.
<?php
// Using filter_var()
$url = 'https://example.com/path?query=value';
if (filter_var($url, FILTER_VALIDATE_URL)) {
echo "Valid URL";
} else {
echo "Invalid URL";
}
// Stricter validation with flags. Scheme and host are always required;
// FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED were removed in PHP 8.0
$url = 'https://example.com/path';
if (filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)) {
echo "Valid URL with a path";
}
// Custom validation function
function isValidHttpUrl(string $url): bool {
$parts = parse_url($url);
if ($parts === false) {
return false;
}
// Require an http or https scheme (parse_url() preserves case, so normalize it)
if (!isset($parts['scheme']) || !in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
return false;
}
// Require a non-empty host
if (empty($parts['host'])) {
return false;
}
// Validate host format
if (!filter_var($parts['host'], FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME)) {
return false;
}
return true;
}
var_dump(isValidHttpUrl('https://example.com')); // bool(true)
var_dump(isValidHttpUrl('javascript:alert(1)')); // bool(false)
The custom isValidHttpUrl() function goes beyond filter_var() by explicitly checking for HTTP/HTTPS schemes and validating the hostname format. This prevents dangerous schemes like javascript: from passing validation.
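A common additional check in security-sensitive code is a host allowlist, for example before redirecting to a URL taken from user input. This is a sketch under stated assumptions: isSafeRedirect() is a hypothetical name, and the allowlist is assumed to contain lowercase host names.

```php
<?php
// Hypothetical allowlist check for user-supplied redirect targets.
// Assumes $allowedHosts holds lowercase host names.
function isSafeRedirect(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // malformed, relative, or scheme-relative URL
    }

    // parse_url() preserves case, so normalize before comparing.
    $scheme = strtolower($parts['scheme']);
    $host   = strtolower($parts['host']);

    return in_array($scheme, ['http', 'https'], true)
        && in_array($host, $allowedHosts, true);
}

$allowed = ['example.com', 'www.example.com'];
var_dump(isSafeRedirect('https://example.com/account', $allowed));  // bool(true)
var_dump(isSafeRedirect('https://example.com@evil.net/', $allowed)); // bool(false)
```

The second call shows why the host should come from parse_url() rather than a string prefix check: "example.com" there is only the username, and the real host is evil.net.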
Let's put these concepts together with some practical examples.
Real-World Examples
These utility functions solve common URL manipulation tasks in PHP applications. They follow the parse-modify-rebuild pattern that works well with PHP's URL functions.
<?php
// Pagination URL builder
function paginationUrl(string $baseUrl, int $page, int $perPage = 20): string {
$parts = parse_url($baseUrl);
$query = [];
if (isset($parts['query'])) {
parse_str($parts['query'], $query);
}
$query['page'] = $page;
$query['per_page'] = $perPage;
$parts['query'] = http_build_query($query);
return buildUrl($parts);
}
echo paginationUrl('https://api.com/users?sort=name', 2);
// https://api.com/users?sort=name&page=2&per_page=20
// Add or update query parameters
function addQueryParams(string $url, array $newParams): string {
$parts = parse_url($url);
$query = [];
if (isset($parts['query'])) {
parse_str($parts['query'], $query);
}
$query = array_merge($query, $newParams);
$parts['query'] = http_build_query($query);
return buildUrl($parts);
}
// Remove query parameters
function removeQueryParams(string $url, array $keysToRemove): string {
$parts = parse_url($url);
$query = [];
if (isset($parts['query'])) {
parse_str($parts['query'], $query);
}
foreach ($keysToRemove as $key) {
unset($query[$key]);
}
if (!empty($query)) {
$parts['query'] = http_build_query($query);
} else {
unset($parts['query']);
}
return buildUrl($parts);
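}
These functions follow a consistent pattern: parse the URL, extract and modify the query array, rebuild the query string, and reconstruct the URL. The array_merge() function makes combining existing and new parameters straightforward, with one caveat: it renumbers integer keys, so a parameter named 0 or 1 would be renamed. Use array_replace() instead if you need to preserve such keys.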
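Reading a single parameter follows the same parse-then-lookup pattern. A minimal sketch; getQueryParam() is a hypothetical helper, and note that parse_str() always returns values as strings.

```php
<?php
// Hypothetical helper: read one query parameter, with a fallback value.
function getQueryParam(string $url, string $key, $default = null)
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return $default; // no query string (null) or malformed URL (false)
    }
    parse_str($query, $params);
    return $params[$key] ?? $default;
}

echo getQueryParam('https://api.com/users?sort=name&page=2', 'page'), "\n"; // 2
echo getQueryParam('https://api.com/users', 'page', '1'), "\n";             // 1
```

Cast the result (for example with (int)) when you need a number, since "2" is returned as a string.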
Edge Cases and Gotchas
PHP's URL functions have some quirks that can trip you up. The table below summarizes common issues and their solutions.
| Issue | Problem | Solution |
|---|---|---|
| Relative URLs | No scheme or host is reported; example.com/path is parsed entirely as a path | Check for scheme and host before processing |
| Malformed URLs | parse_url() returns false for seriously malformed input such as http:///example.com | Always check the return value |
| Unicode in URLs | Must be percent-encoded | Use rawurlencode() for path segments |
| Query without ? | A bare string like a=1&b=2 is returned as the path, not the query | Pass bare query strings directly to parse_str() |
| Fragment-only URLs | #section returns only the fragment | Handle the fragment-only case separately |
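The behaviors in the table can be checked directly. The inputs below are illustrative; the comments describe what parse_url() returns for each one.

```php
<?php
// Seriously malformed input: parse_url() gives up and returns false.
var_dump(parse_url('http:///example.com'));

// No scheme and no leading '//': the whole string is treated as a path.
print_r(parse_url('example.com/path'));   // [path] => example.com/path

// A bare query string is also reported as a path, not a query.
print_r(parse_url('a=1&b=2'));            // [path] => a=1&b=2

// Fragment-only reference.
print_r(parse_url('#section'));           // [fragment] => section

// Scheme-relative URL: host and path are found, but there is no scheme.
print_r(parse_url('//example.com/path')); // [host] => example.com, [path] => /path
```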
The most important thing to remember: parse_url() doesn't validate URLs; it just breaks them apart. Always validate before using URL components, especially when they come from user input.