URLs are a common attack vector for web applications. Attackers exploit improperly handled URLs to steal data, redirect users to malicious sites, or access internal services. This guide covers the most common URL-based vulnerabilities and how to prevent them.
Key Takeaways
- Never trust user-provided URLs—always validate scheme, host, and structure
- Block javascript:, data:, and vbscript: schemes to prevent XSS
- Use allowlists for redirect URLs to prevent open redirect attacks
- Validate URLs server-side, not just client-side
- Use URL parsing libraries instead of regex for validation
Common URL-Based Attacks
Before you can defend against attacks, you need to understand how they work. This section covers the four most common URL-based vulnerabilities: JavaScript URLs (XSS), open redirects, SSRF, and parameter pollution. For each attack, you'll learn the mechanism and see prevention code you can use directly.
JavaScript URLs (XSS)
The javascript: scheme executes JavaScript code when used in links, forms, or redirects. This is one of the most common XSS vectors:
<!-- Dangerous: User input as href -->
<a href="javascript:alert('XSS')">Click me</a>
<!-- Dangerous: Redirect to user-provided URL -->
<script>
window.location = userInput; // If userInput is "javascript:..."
</script>
<!-- Dangerous: Image onerror with javascript -->
<img src="x" onerror="location='javascript:alert(1)'">
Prevention: Block the javascript:, data:, and vbscript: schemes:
function isSafeUrl(url) {
  try {
    const parsed = new URL(url, window.location.origin);
    const dangerousSchemes = ['javascript:', 'data:', 'vbscript:'];
    return !dangerousSchemes.includes(parsed.protocol);
  } catch {
    return false;
  }
}

// Usage
const userUrl = getUserInput();
if (isSafeUrl(userUrl)) {
  window.location = userUrl;
} else {
  console.error('Blocked dangerous URL');
}
This validation function blocks the most dangerous schemes by checking the parsed URL's protocol. The try-catch handles malformed URLs gracefully. Use this or similar validation anywhere you process user-provided URLs—links, redirects, fetch targets, or dynamic content.
Open Redirect Attacks
Open redirects occur when an application redirects users to a URL from an untrusted source. Attackers use this for phishing:
# Legitimate login URL
https://yourbank.com/login?redirect=/dashboard
# Attacker's phishing URL
https://yourbank.com/login?redirect=https://evil.com/fake-login
# User sees yourbank.com in the URL but gets redirected to evil.com
Prevention: Use an allowlist of valid redirect destinations:
// Server-side validation (Node.js example)
function isValidRedirect(url) {
  try {
    // Relative URLs have no host of their own; they resolve against
    // the base and inherit its hostname, so they pass the check below
    const parsed = new URL(url, 'https://yoursite.com');
    // Only allow relative URLs or your own domains
    const allowedHosts = ['yoursite.com', 'www.yoursite.com'];
    return allowedHosts.includes(parsed.hostname);
  } catch {
    return false;
  }
}
// For relative URLs, also check the path
function isValidRelativeRedirect(path) {
  // Block protocol-relative URLs that bypass domain checks
  // (browsers also treat a leading backslash like a forward slash)
  if (path.startsWith('//') || path.startsWith('/\\')) return false;
  // Block javascript: etc.
  if (path.includes(':')) {
    const beforeColon = path.split(':')[0].toLowerCase();
    const dangerousSchemes = ['javascript', 'data', 'vbscript'];
    if (dangerousSchemes.includes(beforeColon)) return false;
  }
  // Only allow paths starting with /
  return path.startsWith('/');
}
The code provides two levels of validation: isValidRedirect() for absolute URLs and isValidRelativeRedirect() for paths. Note how we block protocol-relative URLs (//evil.com), which bypass simple domain checks by inheriting the current page's scheme while redirecting to a different host.
Server-Side Request Forgery (SSRF)
SSRF attacks trick servers into making requests to internal resources:
# Application fetches user-provided URLs for previews
POST /api/preview
{ "url": "https://example.com/article" }
# Attacker provides internal URLs
POST /api/preview
{ "url": "http://localhost:8080/admin" }
{ "url": "http://169.254.169.254/metadata" } # AWS metadata
{ "url": "http://internal-service.local/secret" }
Prevention: Validate and restrict URLs server-side:
import dns from 'dns';
import { promisify } from 'util';

const dnsLookup = promisify(dns.lookup);

async function isSafeToFetch(urlString) {
  try {
    const url = new URL(urlString);
    // Only allow HTTPS
    if (url.protocol !== 'https:') {
      return false;
    }
    // Block private/internal hostnames
    const blockedPatterns = [
      /^localhost$/i,
      /^127\.\d+\.\d+\.\d+$/,
      /^10\.\d+\.\d+\.\d+$/,
      /^172\.(1[6-9]|2\d|3[01])\.\d+\.\d+$/,
      /^192\.168\.\d+\.\d+$/,
      /^169\.254\.\d+\.\d+$/,
      /\.local$/i,
      /\.internal$/i,
    ];
    if (blockedPatterns.some(p => p.test(url.hostname))) {
      return false;
    }
    // Resolve DNS and check the IP as well
    const { address } = await dnsLookup(url.hostname);
    if (blockedPatterns.some(p => p.test(address))) {
      return false; // DNS resolved to a private IP
    }
    return true;
  } catch {
    return false;
  }
}
SSRF prevention requires both hostname and IP validation. The key insight is resolving DNS before validating—attackers can register domains that resolve to internal IPs. Checking the resolved address closes that hole, but it does not by itself stop DNS rebinding: the name can resolve differently when the fetch actually happens. To close that gap, connect to the exact IP you validated (pin it) rather than resolving a second time.
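One building block for that is a pure classifier for the resolved address, reusable both at validation time and when pinning the outbound connection. A minimal sketch, with the caveat that the isPrivateAddress name and IPv4-only scope are illustrative assumptions, not a complete blocklist:

```javascript
// Classify a resolved IPv4 address as private/internal before allowing
// an outbound fetch. Unparseable input is treated as unsafe by default.
function isPrivateAddress(ip) {
  const octets = ip.split('.').map(Number);
  if (octets.length !== 4 ||
      octets.some(o => Number.isNaN(o) || o < 0 || o > 255)) {
    return true; // fail closed on anything that is not a clean IPv4
  }
  const [a, b] = octets;
  return (
    a === 127 ||                          // loopback
    a === 10 ||                           // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168) ||           // 192.168.0.0/16
    (a === 169 && b === 254) ||           // link-local / cloud metadata
    a === 0                               // 0.0.0.0/8
  );
}
```

A production version would also need to handle IPv6 (::1, unique-local addresses) and IPv4-mapped IPv6 forms, which regex-style blocklists routinely miss.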
HTTP Parameter Pollution
Attackers add extra parameters to exploit inconsistent server-side handling:
# Original URL
/transfer?to=alice&amount=100
# Attacker adds duplicate parameter
/transfer?to=alice&amount=100&to=attacker
# Different servers/frameworks handle duplicates differently:
# - Some use first value (to=alice)
# - Some use last value (to=attacker)
# - Some combine them (to=alice,attacker)
Prevention: Use consistent parameter handling:
// Always use .get() for single values - it returns the first
const url = new URL(request.url);
const to = url.searchParams.get('to'); // First value only

// Or explicitly reject duplicates
function getUniqueParam(searchParams, key) {
  const values = searchParams.getAll(key);
  if (values.length > 1) {
    throw new Error(`Duplicate parameter: ${key}`);
  }
  return values[0];
}
The safest approach is to either always use the first value (get()) or explicitly reject duplicates. The getUniqueParam() helper shuts down duplicate-parameter attacks by throwing an error as soon as it detects tampering.
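To make the duplicate handling concrete, here is what URLSearchParams (a standard API, nothing assumed) reports for the polluted query string from the example above:

```javascript
// Inspect the polluted query string from the /transfer example
const polluted = new URLSearchParams('to=alice&amount=100&to=attacker');

const first = polluted.get('to');    // first value only
const all = polluted.getAll('to');   // every value, exposing the duplicate
const tampered = all.length > 1;     // flag the request for rejection
```

Because getAll() always surfaces every occurrence, a duplicate check at the top of a handler costs one line and removes any ambiguity about which value downstream code will see.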
Understanding attacks is half the battle. Now let's look at comprehensive validation patterns you can apply across your application.
URL Validation Best Practices
Good validation is your primary defense against URL-based attacks. This section covers the essential patterns: using proper parsing, validating schemes and hosts, and ensuring server-side validation. These patterns apply to any language or framework.
Use the URL API, Not Regex
URLs are complex. Regex-based validation is error-prone and can be bypassed:
// BAD: Regex is incomplete and bypassable
const urlRegex = /^https?:\/\/[^\s]+$/;
urlRegex.test('javascript:alert(1)//https://'); // false here, but slightly looser regexes pass it
urlRegex.test('https://evil.com\\@good.com');   // true, but the URL is misleading

// GOOD: Use the URL API
function validateUrl(input) {
  try {
    const url = new URL(input);
    // Check scheme
    if (!['http:', 'https:'].includes(url.protocol)) {
      return { valid: false, error: 'Invalid protocol' };
    }
    // Check host
    if (!url.hostname) {
      return { valid: false, error: 'Missing hostname' };
    }
    return { valid: true, url };
  } catch (e) {
    return { valid: false, error: 'Invalid URL format' };
  }
}
The regex example shows how easy it is to write incomplete validation. The URL API example is both simpler and more secure—it handles edge cases automatically and gives you clean access to URL components for further validation.
Always Validate the Scheme
function hasAllowedScheme(url, allowedSchemes = ['http:', 'https:']) {
  try {
    const parsed = new URL(url);
    return allowedSchemes.includes(parsed.protocol);
  } catch {
    return false;
  }
}

// For user-facing links, only allow http/https
if (!hasAllowedScheme(userUrl)) {
  throw new Error('Invalid URL scheme');
}

// For internal services, you might allow other schemes
const internalSchemes = ['http:', 'https:', 'amqp:', 'redis:'];
if (!hasAllowedScheme(serviceUrl, internalSchemes)) {
  throw new Error('Invalid service URL');
}
The function accepts a configurable list of allowed schemes, defaulting to http and https for user-facing URLs. For internal services, you can expand the allowlist to include protocols like redis or amqp—but never expand it for user-provided URLs.
Validate the Host
function isAllowedHost(url, allowedHosts) {
  try {
    const parsed = new URL(url);
    // Exact match
    if (allowedHosts.includes(parsed.hostname)) {
      return true;
    }
    // Subdomain match (e.g., allow *.example.com)
    for (const allowed of allowedHosts) {
      if (allowed.startsWith('*.')) {
        const domain = allowed.slice(2);
        if (parsed.hostname === domain ||
            parsed.hostname.endsWith('.' + domain)) {
          return true;
        }
      }
    }
    return false;
  } catch {
    return false;
  }
}

// Usage
const allowedHosts = ['api.example.com', '*.cdn.example.com'];
if (!isAllowedHost(userUrl, allowedHosts)) {
  throw new Error('Host not allowed');
}
The wildcard subdomain pattern (*.cdn.example.com) lets you allow any subdomain while blocking other domains. This is useful for CDNs or multi-tenant applications, but be careful—subdomain takeover vulnerabilities can turn this into an attack vector.
Always Validate Server-Side
Client-side validation can be bypassed. Always validate on the server:
// Express middleware for URL validation
function validateRedirectUrl(req, res, next) {
  const redirect = req.query.redirect;
  if (!redirect) {
    return next();
  }
  try {
    const url = new URL(redirect, `https://${req.hostname}`);
    // Only allow same-origin redirects
    if (url.hostname !== req.hostname) {
      return res.status(400).json({ error: 'Invalid redirect URL' });
    }
    // Block dangerous schemes
    if (url.protocol !== 'https:') {
      return res.status(400).json({ error: 'HTTPS required' });
    }
    req.validatedRedirect = url.pathname + url.search;
    next();
  } catch {
    return res.status(400).json({ error: 'Invalid URL format' });
  }
}
The middleware follows a consistent pattern: validate the redirect URL before using it, only allow same-origin redirects, and block dangerous schemes. Any server stack can provide the same guarantees, as long as its URL parsing library resolves relative URLs against a base the way JavaScript's URL API does here.
Validation prevents bad URLs from entering your system. Sanitization ensures URLs are safe when leaving your system.
URL Sanitization
Even validated URLs need proper handling when used in different contexts. This section covers encoding for URL construction, escaping for HTML, and stripping sensitive data from URLs before logging or display.
Properly Encode User Input
// WRONG: String concatenation without encoding
const searchUrl = `/search?q=${userInput}`;
// If userInput is "a&admin=true", URL becomes /search?q=a&admin=true
// RIGHT: Use URLSearchParams for automatic encoding
const url = new URL('/search', window.location.origin);
url.searchParams.set('q', userInput);
// If userInput is "a&admin=true", URL becomes /search?q=a%26admin%3Dtrue
The parameter pollution example shows why string concatenation is dangerous: an attacker could inject extra parameters like admin=true. Using URLSearchParams automatically encodes the ampersand and equals sign, making parameter injection impossible.
Escaping for HTML Context
// When inserting URLs into HTML, escape for the context

// In href attributes
function safeHref(url) {
  if (!isSafeUrl(url)) {
    return '#blocked';
  }
  return url
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;');
}

// In JavaScript strings
function safeJsString(url) {
  return url
    .replace(/\\/g, '\\\\')
    .replace(/'/g, "\\'")
    .replace(/"/g, '\\"')
    .replace(/</g, '\\u003C')
    .replace(/>/g, '\\u003E');
}

// Better: Use textContent or data attributes
element.dataset.url = url; // Set as a DOM attribute, no HTML parsing involved
element.href = url;        // DOM API handles encoding
Different contexts require different escaping. For href attributes, escape HTML entities. For JavaScript strings, escape quotes and angle brackets. The best approach is to use DOM APIs (dataset, href) which handle escaping automatically—avoid building HTML strings with user data.
Remove Credentials from URLs
function stripCredentials(urlString) {
  try {
    const url = new URL(urlString);
    url.username = '';
    url.password = '';
    return url.toString();
  } catch {
    return urlString;
  }
}

// Usage
const cleanUrl = stripCredentials('https://user:pass@example.com/path');
// "https://example.com/path"

// For logging/display, always strip credentials
console.log(`Fetching: ${stripCredentials(requestUrl)}`);
The URL API makes credential stripping simple—just set username and password to empty strings. Always strip credentials before logging, displaying URLs to users, or including them in error reports where they might be visible.
Beyond validation and sanitization, several browser APIs require special security handling when working with URLs.
Secure URL Handling Patterns
Browser APIs like window.open(), postMessage, and form submissions all involve URLs and require careful handling. This section provides secure patterns for each, protecting your users from cross-origin attacks.
Safe window.open()
// DANGEROUS: Opens any URL
window.open(userUrl);

// SAFE: Validate and use noopener
function safeOpen(url) {
  if (!isSafeUrl(url)) {
    console.error('Blocked dangerous URL');
    return null;
  }
  // noopener prevents the new page from accessing window.opener
  // noreferrer prevents sending the referrer header
  return window.open(url, '_blank', 'noopener,noreferrer');
}

// For links, use rel="noopener noreferrer"
// <a href={url} target="_blank" rel="noopener noreferrer">Link</a>
The noopener option is critical—without it, the opened page can access your page via window.opener and redirect it (potentially to a phishing page). The noreferrer option adds privacy by not sending the referrer header. Always use both for external links.
Safe postMessage Origins
// DANGEROUS: Accepts messages from any origin
window.addEventListener('message', (event) => {
  handleMessage(event.data); // Attacker can send messages from any site
});

// SAFE: Validate the origin
const ALLOWED_ORIGINS = ['https://trusted.example.com'];
window.addEventListener('message', (event) => {
  if (!ALLOWED_ORIGINS.includes(event.origin)) {
    console.warn(`Blocked message from: ${event.origin}`);
    return;
  }
  handleMessage(event.data);
});

// When sending, always specify the target origin
targetWindow.postMessage(data, 'https://trusted.example.com');
// NEVER use '*' as the target origin with sensitive data
The postMessage API is a common source of vulnerabilities. Always validate event.origin against an allowlist before processing messages, and always specify the exact target origin when sending—using '*' delivers the message no matter what origin the target window has navigated to, so a hijacked or redirected frame receives your data.
Validate Form Actions
// Validate form action before submission
document.querySelectorAll('form').forEach(form => {
  form.addEventListener('submit', (event) => {
    const action = form.action || window.location.href;
    if (!isSafeUrl(action)) {
      event.preventDefault();
      console.error('Blocked form submission to unsafe URL');
      return;
    }
    // Also validate the action is on your domain
    const url = new URL(action);
    if (url.hostname !== window.location.hostname) {
      event.preventDefault();
      console.error('Blocked cross-domain form submission');
    }
  });
});
This pattern adds a safety net for dynamically created or modified forms. By validating the action URL on submit, you catch cases where JavaScript or malicious input might have changed the form destination. This is especially important for forms with user-editable fields that might be reflected in the action URL.
For defense in depth, combine these patterns with Content Security Policy headers.
Content Security Policy
Content Security Policy is your browser-level defense against injection attacks. Even if an attacker bypasses your application-level validation, CSP can prevent the exploit from executing. This section shows CSP directives specifically relevant to URL security.
CSP provides an additional layer of defense against URL-based attacks:
# Block inline scripts and dangerous URL schemes
Content-Security-Policy:
default-src 'self';
script-src 'self' https://cdn.example.com;
form-action 'self';
base-uri 'self';
navigate-to 'self' https://*.example.com;
# Explanation:
# - default-src 'self' - Only load resources from same origin
# - script-src - Only allow scripts from self and your CDN
# - form-action - Only allow form submissions to same origin
# - base-uri - Prevent base tag injection
# - navigate-to - Restrict where the page can navigate (experimental; not yet supported by browsers)
This CSP configuration blocks inline scripts, restricts resource loading to your own origin and trusted CDN, prevents forms from submitting to external domains, and locks down the base URL. Start restrictive and loosen only as needed—it's easier to allow things than to discover you've blocked a legitimate feature.
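To actually send the header, most frameworks let you set it in one middleware. A minimal sketch, assuming an Express-style app; the directive list mirrors the policy above, minus the experimental navigate-to:

```javascript
// Build the policy once; directives are joined with '; ' per the CSP grammar
const CSP_POLICY = [
  "default-src 'self'",
  "script-src 'self' https://cdn.example.com",
  "form-action 'self'",
  "base-uri 'self'",
].join('; ');

// Express-style middleware: attach the header to every response
function cspMiddleware(req, res, next) {
  res.setHeader('Content-Security-Policy', CSP_POLICY);
  next();
}
```

In Express you would register it with app.use(cspMiddleware); libraries such as Helmet package the same idea with maintained defaults.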
Let's wrap up with a checklist you can use to audit your URL handling.
Security Checklist
Use this checklist when reviewing code that handles URLs. Each item links back to the relevant section in this guide. Print this out and keep it near your desk during code reviews.
| Check | Description | Priority |
|---|---|---|
| Block dangerous schemes | Reject javascript:, data:, vbscript: | Critical |
| Validate redirect URLs | Use allowlists, reject external domains | Critical |
| Use URL API for parsing | Avoid regex, handle edge cases properly | High |
| Server-side validation | Never trust client-side validation alone | Critical |
| Encode user input | Use URLSearchParams or equivalent | High |
| Strip credentials | Remove user:pass before logging/display | High |
| Validate SSRF targets | Block private IPs, localhost, metadata endpoints | Critical |
| Set CSP headers | Defense in depth against XSS | High |
| Use rel="noopener" | On all target="_blank" links | Medium |
| Validate postMessage origins | Check event.origin against allowlist | High |
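As a starting point for such an audit, the critical checks above can be folded into one helper. A sketch only—the auditUrl name and result shape are illustrative, not from any specific library:

```javascript
// Combine the checklist's critical checks for user-provided absolute URLs:
// parse with the URL API, reject dangerous schemes, reject embedded credentials.
function auditUrl(input) {
  let url;
  try {
    url = new URL(input); // URL API, not regex
  } catch {
    return { ok: false, reason: 'unparseable' };
  }
  if (!['http:', 'https:'].includes(url.protocol)) {
    return { ok: false, reason: 'scheme' }; // blocks javascript:, data:, vbscript:
  }
  if (url.username || url.password) {
    return { ok: false, reason: 'credentials' }; // strip or reject user:pass
  }
  return { ok: true, url };
}
```

Redirect-allowlist and SSRF checks are deliberately left out here, since they depend on your domain list and network layout; bolt them on after this baseline passes.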