Python's urllib.parse module provides functions for URL encoding and decoding. Understanding which function to use prevents common URL bugs.
Key Takeaways
- quote() encodes path segments (space → %20)
- quote_plus() encodes query values (space → +)
- urlencode() builds query strings from dicts
- unquote() and unquote_plus() handle decoding
- Use the safe parameter to preserve specific characters
"The quote() function uses UTF-8 encoding by default. The optional safe parameter specifies ASCII characters that should not be encoded — its default value is '/'."
Encoding Functions
Python's urllib.parse module provides three main encoding functions, each optimized for different parts of a URL. Understanding when to use each prevents common encoding bugs. The critical difference is how they handle spaces.
| Function | Space Becomes | Use For |
|---|---|---|
| quote() | %20 | Path segments |
| quote_plus() | + | Query parameter values |
| urlencode() | + | Complete query strings |
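The quote() and quote_plus() functions differ in two ways: quote_plus() turns spaces into + rather than %20, and its safe parameter defaults to '' rather than '/', so it also encodes forward slashes. This mirrors how each part of a URL is read: in a path, a space must be written as %20 and + is a literal plus sign, while form-encoded query strings conventionally use + for a space. Most web servers accept both + and %20 in query strings, but only %20 works correctly in paths.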
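The space handling described above can be checked by encoding one string both ways and then decoding each result under both rules. This is a minimal sketch; the variable names are only illustrative.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "hello world"

as_path = quote(text)        # 'hello%20world'
as_query = quote_plus(text)  # 'hello+world'

# Path-style decoding leaves "+" alone, so "+" is not a space in a path.
print(unquote(as_query))       # hello+world
print(unquote_plus(as_query))  # hello world

# %20 decodes to a space under both rules.
print(unquote(as_path))        # hello world
print(unquote_plus(as_path))   # hello world
```

Because %20 survives both decoders, it is the safer choice whenever it is unclear which part of the URL a value will end up in.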
quote() - Path Encoding
Use quote() when encoding URL path segments. It converts spaces to %20 and leaves forward slashes unencoded by default. You can customize which characters to preserve using the safe parameter.
from urllib.parse import quote
# Basic encoding (path segments)
quote('hello world') # 'hello%20world'
quote('Tom & Jerry') # 'Tom%20%26%20Jerry'
quote('file/name.txt') # 'file/name.txt' (slash kept: default safe='/')
quote('file/name.txt', safe='') # 'file%2Fname.txt'
# Preserve specific characters with safe=
quote('/path/to/file', safe='/') # '/path/to/file'
quote('a=b&c=d', safe='=&') # 'a=b&c=d'
# Unicode handling
quote('café') # 'caf%C3%A9'
quote('日本語') # '%E6%97%A5%E6%9C%AC%E8%AA%9E'
The safe parameter is powerful but requires care. Setting safe='' encodes everything including slashes, which you need when a path segment contains literal slash characters. The default safe='/' preserves path structure when encoding a complete path.
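One way to apply this is to encode each segment on its own with safe='' and only then join the segments with slashes. The sketch below assumes a hypothetical helper named build_path(); it is not part of urllib.parse.

```python
from urllib.parse import quote


def build_path(*segments):
    """Join path segments, percent-encoding each one in full (hypothetical helper)."""
    # safe='' means a "/" inside a segment becomes %2F instead of
    # being mistaken for a path separator.
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


path = build_path("files", "2024/Q1 report.pdf")
print(path)  # /files/2024%2FQ1%20report.pdf
```

The slashes added by the join stay literal, while the slash that belongs to the file name is escaped, so the path still has exactly two segments.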
For query parameter values, you'll want quote_plus() instead, which produces more compact URLs by using + for spaces.
quote_plus() - Query Value Encoding
When building query strings manually, quote_plus() follows the application/x-www-form-urlencoded format used by HTML forms. This format uses + for spaces, resulting in shorter, more readable URLs than %20.
from urllib.parse import quote_plus
# Spaces become + (form encoding)
quote_plus('hello world') # 'hello+world'
quote_plus('Tom & Jerry') # 'Tom+%26+Jerry'
# Building a query parameter manually
query = f"?q={quote_plus('search term')}&page=1"
# '?q=search+term&page=1'
While quote_plus() works for individual values, building query strings this way is tedious and error-prone. You need to remember to encode each value and properly join them with &. The urlencode() function handles all of this automatically.
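To make the comparison concrete, the following sketch builds the same query string by hand and with urlencode(). The parameter names and values are made up for illustration.

```python
from urllib.parse import quote_plus, urlencode

params = {"q": "search term", "lang": "C++ & Rust"}

# Manual approach: encode every key and value, then join with "&".
manual = "&".join(
    f"{quote_plus(key)}={quote_plus(value)}" for key, value in params.items()
)

print(manual)                       # q=search+term&lang=C%2B%2B+%26+Rust
print(manual == urlencode(params))  # True
```

Note that the literal plus signs in the value are escaped as %2B, which keeps them distinct from the + that stands for a space.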
urlencode() - Query String Building
For building complete query strings, urlencode() is Python's most convenient option. It accepts dictionaries or lists of tuples, automatically encodes all keys and values, and joins the pairs with &. It plays a role similar to JavaScript's URLSearchParams.
from urllib.parse import urlencode
# From dictionary
params = {
'q': 'hello world',
'page': 1,
'sort': 'date'
}
urlencode(params)
# 'q=hello+world&page=1&sort=date'
# Multiple values for same key
params = [('tag', 'python'), ('tag', 'url'), ('page', '1')]
urlencode(params)
# 'tag=python&tag=url&page=1'
# List values expanded into repeated keys (doseq=True)
params = {'tags': ['a', 'b', 'c']}
urlencode(params, doseq=True)
# 'tags=a&tags=b&tags=c'
When you need multiple values for the same key (common for filters and multi-select forms), pass a list of tuples instead of a dictionary. For dictionary values that are lists, the doseq=True parameter expands them into separate key-value pairs. This is often what you want for arrays in query strings.
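The sketch below shows what happens to a list value when doseq is left at its default, and how the quote_via parameter of urlencode() switches the output from + to %20 for servers that expect percent-encoded spaces. The parameter names and values are illustrative.

```python
from urllib.parse import quote, urlencode

# Without doseq, the list is converted with str() and encoded as one value.
flat = urlencode({"tags": ["a", "b"]})
print(flat)  # tags=%5B%27a%27%2C+%27b%27%5D

params = {"q": "hello world", "tags": ["a b", "c"]}

# Default: quote_plus() is applied to every key and value.
default_qs = urlencode(params, doseq=True)
print(default_qs)  # q=hello+world&tags=a+b&tags=c

# quote_via=quote produces %20 instead of + for spaces.
percent_qs = urlencode(params, doseq=True, quote_via=quote)
print(percent_qs)  # q=hello%20world&tags=a%20b&tags=c
```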
Understanding decoding is just as important as encoding, especially when processing URLs from external sources.
Decoding
Python provides a matching decode function for each encoding function. Use unquote() for path-encoded strings and unquote_plus() for query strings where + represents spaces. For parsing complete query strings into dictionaries, parse_qs() splits the pairs and decodes both keys and values for you.
from urllib.parse import unquote, unquote_plus, parse_qs
# Decode %XX sequences
unquote('hello%20world') # 'hello world'
unquote('caf%C3%A9') # 'café'
# Decode + as space
unquote_plus('hello+world') # 'hello world'
# Parse query string to dict
parse_qs('q=hello&page=1&tag=a&tag=b')
# {'q': ['hello'], 'page': ['1'], 'tag': ['a', 'b']}
Notice that parse_qs() returns lists for all values, even when there's only one. This handles the case where a key appears multiple times (like tag above). If you know each key appears once, access the first element: params['q'][0].
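When the input is a full URL rather than a bare query string, the query part has to be separated first. The sketch below uses urlsplit() for that step, and also shows parse_qsl() and the keep_blank_values option. The URL itself is a made-up example.

```python
from urllib.parse import parse_qs, parse_qsl, urlsplit

url = "https://api.example.com/search?q=hello+world&tag=a&tag=b&debug="

# Isolate the query component before parsing it.
query = urlsplit(url).query

# Blank values such as "debug=" are dropped by default.
print(parse_qs(query))
# {'q': ['hello world'], 'tag': ['a', 'b']}

# keep_blank_values=True retains them as empty strings.
print(parse_qs(query, keep_blank_values=True))
# {'q': ['hello world'], 'tag': ['a', 'b'], 'debug': ['']}

# parse_qsl() returns ordered (key, value) pairs instead of a dict of lists.
print(parse_qsl(query))
# [('q', 'hello world'), ('tag', 'a'), ('tag', 'b')]
```

parse_qsl() is convenient when the order of repeated keys matters or when the pairs will be passed straight back to urlencode().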
Let's put all these functions together in a real-world example that builds a complete API URL.
Complete Example
Building URLs properly requires using the right encoding function for each part. This example demonstrates encoding a path segment (with quote()) separately from query parameters (with urlencode()), then combining them correctly.
from urllib.parse import urlencode, quote
# Build a complete URL
base = 'https://api.example.com'
path = '/search/' + quote('Tom & Jerry', safe='')
params = urlencode({
'category': 'cartoons',
'year': 2024
})
url = f"{base}{path}?{params}"
# 'https://api.example.com/search/Tom%20%26%20Jerry?category=cartoons&year=2024'
# Or assemble the same URL from its components with urlunparse
from urllib.parse import urlunparse
url = urlunparse(('https', 'api.example.com', path, '', params, ''))
The key detail here is safe='' in the quote() call. quote() always encodes the ampersand in "Tom & Jerry", but with the default safe='/' a slash inside the segment would pass through unencoded and be read as a path separator. Setting safe='' ensures a value such as 'AC/DC' becomes 'AC%2FDC' and stays a single segment. The urlunparse() function provides a more structured alternative when you need to build URLs from components.
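The same steps can be wrapped in a small function and checked by parsing the result back into its parts. This is a sketch under stated assumptions: build_url() is a hypothetical helper, the scheme is fixed to https for brevity, and urlunsplit() is used in place of urlunparse() because the rarely used params component is not needed.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit, urlunsplit


def build_url(host, segments, params):
    """Assemble an https URL from raw, unencoded parts (hypothetical helper)."""
    # Encode each path segment fully, then join with literal slashes.
    path = "/" + "/".join(quote(segment, safe="") for segment in segments)
    # doseq=True expands list values into repeated keys.
    query = urlencode(params, doseq=True)
    return urlunsplit(("https", host, path, query, ""))


url = build_url(
    "api.example.com",
    ["search", "Tom & Jerry"],
    {"year": 2024, "tag": ["a", "b"]},
)
print(url)
# https://api.example.com/search/Tom%20%26%20Jerry?year=2024&tag=a&tag=b

# Round trip: split the path on "/" first, then decode each segment.
parts = urlsplit(url)
decoded_segments = [unquote(segment) for segment in parts.path.split("/")[1:]]
print(decoded_segments)       # ['search', 'Tom & Jerry']
print(parse_qs(parts.query))  # {'year': ['2024'], 'tag': ['a', 'b']}
```

Splitting before decoding matters: decoding the whole path first would turn any %2F back into a slash and change the number of segments.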