Ruby's standard library includes the URI module for URL parsing and manipulation. It provides a clean, object-oriented interface that's idiomatic for Ruby developers.
Key Takeaways
- URI.parse returns scheme-specific URI objects
- The CGI module handles query string encoding/decoding
- The Addressable gem offers enhanced URL handling
- URI components can be modified directly via setters
- Use URI.encode_www_form for building query strings
“URI is a module providing classes to handle Uniform Resource Identifiers. It provides classes to handle all kinds of URIs and has convenience methods to parse strings into URI objects.”
Parsing URLs
Ruby's URI.parse() method returns scheme-specific objects like URI::HTTPS or URI::HTTP. This means you can check the type to validate schemes, and the objects know their default ports.
require 'uri'
url = 'https://user:pass@example.com:8080/path/page?query=value#fragment'
uri = URI.parse(url)
puts uri.scheme # https
puts uri.host # example.com
puts uri.port # 8080
puts uri.path # /path/page
puts uri.query # query=value
puts uri.fragment # fragment
puts uri.user # user
puts uri.password # pass
# Full components
puts uri.userinfo # user:pass
puts uri.authority # example.com:8080 (URI::HTTP#authority, Ruby 3.2+; host and port only, no userinfo)
# Request URI (path + query)
puts uri.request_uri # /path/page?query=value
# Check type
puts uri.class # URI::HTTPS
The code demonstrates accessing each URL component. Note request_uri, which returns the path plus query string: exactly what you'd send in an HTTP request line. Checking the class (URI::HTTPS) is an idiomatic Ruby way to validate the scheme.
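As a small sketch of the two points above (type checks and default ports), the snippet below parses URLs that omit the port. The helper name https_url? is hypothetical, introduced only for illustration.

```ruby
require 'uri'

# Hypothetical helper: true only for URLs that parse to URI::HTTPS.
def https_url?(url)
  URI.parse(url).is_a?(URI::HTTPS)
rescue URI::InvalidURIError
  false
end

secure = URI.parse('https://example.com/login')
plain  = URI.parse('http://example.com/login')

puts secure.port          # 443 (no port given, so the https default applies)
puts plain.port           # 80
puts secure.default_port  # 443
puts https_url?('https://example.com/login')  # true
puts https_url?('http://example.com/login')   # false
```

Because URI::HTTPS is its own class, the check rejects plain http URLs without any string comparison on the scheme.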
URI Components
The URI object provides a reader method for each component. Readers for optional components such as query, fragment, user, and password return nil when that part is absent from the URL.
| Method | Returns | Example |
|---|---|---|
| scheme | Protocol | https |
| host | Domain | example.com |
| port | Port number | 8080 |
| path | URL path | /search |
| query | Query string | q=test |
| fragment | Hash/anchor | section |
| user | Username | admin |
| password | Password | secret |
| userinfo | user:pass | admin:secret |
| request_uri | Path + query | /search?q=test |
Most components have both reader and writer methods, making URLs mutable. This differs from languages like Java where you'd need to rebuild the URL.
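A minimal sketch of what the readers return for a URL with no optional parts, followed by the writers in action. The URL is only an illustration.

```ruby
require 'uri'

bare = URI.parse('https://example.com')

p bare.query     # nil
p bare.fragment  # nil
p bare.user      # nil
p bare.path      # "" (an empty string, not nil)
p bare.port      # 443 (falls back to the scheme's default port)

# Writers change the object in place.
bare.path = '/search'
bare.query = 'q=test'
puts bare        # https://example.com/search?q=test
```

Note that path and port never return nil for an HTTP(S) URL, so nil checks are only meaningful for the optional components.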
Parsing Query Parameters
Ruby provides two ways to parse query strings: CGI.parse() returns a hash with array values (handling duplicates properly), while URI.decode_www_form() returns an array of pairs (preserving order).
require 'uri'
require 'cgi'
uri = URI.parse('https://shop.com/search?category=books&tag=ruby&tag=rails')
# Parse query string with CGI
params = CGI.parse(uri.query)
puts params
# {"category"=>["books"], "tag"=>["ruby", "rails"]}
# Get single value
category = params['category'].first
puts category # books
# Get multiple values
tags = params['tag']
p tags # ["ruby", "rails"] (puts would print one element per line)
# Alternative: URI.decode_www_form (Ruby 1.9.2+)
params = URI.decode_www_form(uri.query)
p params
# [["category", "books"], ["tag", "ruby"], ["tag", "rails"]]
# Convert to hash (loses duplicate keys)
params_hash = Hash[URI.decode_www_form(uri.query)]
puts params_hash
# {"category"=>"books", "tag"=>"rails"}
# Keep all values with group_by
params_grouped = URI.decode_www_form(uri.query)
  .group_by { |k, _| k }
  .transform_values { |pairs| pairs.map(&:last) }
puts params_grouped
# {"category"=>["books"], "tag"=>["ruby", "rails"]}
CGI.parse() is often more convenient since you get a hash directly. The decode_www_form approach is useful when you need to preserve the exact order of parameters or process them sequentially.
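Since decode_www_form keeps pairs in their original order, a sequential lookup is straightforward. The sketch below uses a hypothetical helper, first_query_value, that returns the first value for a key using only the standard uri library.

```ruby
require 'uri'

# Hypothetical helper: first value for `name`, or nil when the key
# (or the whole query string) is absent.
def first_query_value(url, name)
  query = URI.parse(url).query
  return nil unless query

  # Array#assoc returns the first pair whose first element equals `name`.
  pair = URI.decode_www_form(query).assoc(name)
  pair&.last
end

url = 'https://shop.com/search?category=books&tag=ruby&tag=rails'
p first_query_value(url, 'tag')                        # "ruby"
p first_query_value(url, 'missing')                    # nil
p first_query_value('https://shop.com/search', 'tag')  # nil
```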
Building URLs
Ruby offers several approaches to build URLs: the build() class method, encode_www_form() for query strings, and string interpolation for simple cases.
require 'uri'
# Using the scheme-specific build class method (URI::HTTPS.build)
uri = URI::HTTPS.build(
  host: 'api.example.com',
  path: '/v2/users',
  query: 'page=1&limit=10'
)
puts uri # https://api.example.com/v2/users?page=1&limit=10
# Building query string with encode_www_form
params = [
  ['q', 'ruby tutorial'],
  ['page', '1'],
  ['sort', 'relevance']
]
query = URI.encode_www_form(params)
puts query # q=ruby+tutorial&page=1&sort=relevance
# From hash (simpler for unique keys)
params = {
  q: 'ruby programming',
  page: 1,
  limit: 20
}
query = URI.encode_www_form(params)
puts query # q=ruby+programming&page=1&limit=20
# Complete URL
uri = URI::HTTPS.build(
  host: 'api.example.com',
  path: '/search',
  query: URI.encode_www_form(q: 'test', page: 1)
)
puts uri # https://api.example.com/search?q=test&page=1
The encode_www_form() method accepts both arrays and hashes, automatically encoding special characters. For duplicate keys, use an array of pairs instead of a hash.
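To illustrate the duplicate-key case, here is a short sketch. The parameter names are only examples. It also shows that a hash whose value is an array expands to repeated keys.

```ruby
require 'uri'

# An array of pairs can repeat a key.
pairs = [['tag', 'ruby'], ['tag', 'rails'], ['page', 1]]
from_pairs = URI.encode_www_form(pairs)
puts from_pairs  # tag=ruby&tag=rails&page=1

# A hash value that is an array is expanded into one pair per element.
from_hash = URI.encode_www_form(tag: %w[ruby rails], page: 1)
puts from_hash   # tag=ruby&tag=rails&page=1
```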
Modifying URLs
URI objects in Ruby are mutable: you can assign new values to components directly. For query parameter manipulation, you'll typically parse, modify, and re-encode.
require 'uri'
uri = URI.parse('https://api.example.com/users?page=1')
# Modify components directly
uri.path = '/v2/users'
uri.query = 'page=2&limit=10'
uri.fragment = 'results'
puts uri # https://api.example.com/v2/users?page=2&limit=10#results
# Helper method to add/update query parameters
def add_query_params(uri, new_params)
  uri = URI.parse(uri) if uri.is_a?(String)
  existing = uri.query ? URI.decode_www_form(uri.query) : []
  existing_hash = existing.to_h
  merged = existing_hash.merge(new_params.transform_keys(&:to_s))
  uri.query = URI.encode_www_form(merged)
  uri
end
uri = URI.parse('https://api.com/data?existing=value')
uri = add_query_params(uri, { new: 'param', existing: 'updated' })
puts uri # https://api.com/data?existing=updated&new=param
# Remove query parameters
def remove_query_params(uri, keys_to_remove)
  uri = URI.parse(uri) if uri.is_a?(String)
  return uri unless uri.query
  params = URI.decode_www_form(uri.query)
  filtered = params.reject { |k, _| keys_to_remove.include?(k) }
  uri.query = filtered.empty? ? nil : URI.encode_www_form(filtered)
  uri
end
uri = URI.parse('https://api.com/data?a=1&b=2&c=3')
uri = remove_query_params(uri, ['b'])
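puts uri # https://api.com/data?a=1&c=3
These helper methods follow Ruby's philosophy of making common operations convenient. They accept both strings and URI objects, converting as needed, and reject cleanly filters out unwanted parameters. Two caveats: a URI object passed in is modified in place, and add_query_params converts the pairs to a hash, so a key repeated in the existing query collapses to its last value.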
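If the caller's URI object should stay untouched, one option is to work on a copy. The following is a sketch of that idea, with a hypothetical name (with_query_params); it is a variant for illustration, not a replacement for the helpers above.

```ruby
require 'uri'

# Hypothetical non-mutating variant: returns a new URI and leaves
# the argument as it was.
def with_query_params(uri, new_params)
  copy = uri.is_a?(String) ? URI.parse(uri) : uri.dup
  existing = copy.query ? URI.decode_www_form(copy.query).to_h : {}
  merged = existing.merge(new_params.transform_keys(&:to_s))
  copy.query = URI.encode_www_form(merged)
  copy
end

original = URI.parse('https://api.com/data?page=1')
updated  = with_query_params(original, { page: 2, limit: 10 })

puts original  # https://api.com/data?page=1
puts updated   # https://api.com/data?page=2&limit=10
```

Like add_query_params, this sketch goes through a hash, so it still collapses repeated keys.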
URL Encoding
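URL encoding has two conventions for spaces: + in form data (application/x-www-form-urlencoded) and %20 elsewhere, such as in path segments. The methods shown here, URI.encode_www_form_component and CGI.escape, follow the form convention.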
require 'uri'
require 'cgi'
# URI.encode_www_form_component - for query values
encoded = URI.encode_www_form_component('hello world & friends')
puts encoded # hello+world+%26+friends
# CGI.escape - same behavior
encoded = CGI.escape('hello world')
puts encoded # hello+world
# URI.escape and URI.encode were removed in Ruby 3.0.
# URI::DEFAULT_PARSER.escape is a legacy API; prefer the component-specific methods.
# Encoding special characters
special = 'key=value&foo=bar'
puts URI.encode_www_form_component(special)
# key%3Dvalue%26foo%3Dbar
# Decoding
decoded = URI.decode_www_form_component('hello+world+%26+friends')
puts decoded # hello world & friends
decoded = CGI.unescape('hello+world')
puts decoded # hello world
# Complete example
base = 'https://api.example.com/search'
query = URI.encode_www_form(
  q: 'ruby & rails',
  filter: 'category:web'
)
puts "#{base}?#{query}"
# https://api.example.com/search?q=ruby+%26+rails&filter=category%3Aweb
The encode_www_form_component() and CGI.escape() methods are largely equivalent. When using encode_www_form(), encoding happens automatically.
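For the %20 convention, one standard-library option is ERB::Util.url_encode, which percent-encodes spaces instead of turning them into +. The sketch below contrasts the two styles; the file name and URL are illustrative only.

```ruby
require 'erb'
require 'uri'

text = 'hello world & friends'

form_style    = URI.encode_www_form_component(text)
percent_style = ERB::Util.url_encode(text)

puts form_style     # hello+world+%26+friends
puts percent_style  # hello%20world%20%26%20friends

# Encode each path segment separately, then interpolate it.
segment = ERB::Util.url_encode('annual report.pdf')
file_url = "https://example.com/files/#{segment}"
puts file_url       # https://example.com/files/annual%20report.pdf
```

Encoding per segment, rather than encoding a whole path at once, keeps the / separators intact.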
Resolving Relative URLs
Ruby overloads the + operator on URI objects to resolve a relative reference against a base URL, which keeps resolution code compact.
require 'uri'
base = URI.parse('https://example.com/docs/guide/')
# Join relative paths
relative = base + '../api/reference'
puts relative # https://example.com/docs/api/reference
page = base + 'page.html'
puts page # https://example.com/docs/guide/page.html
absolute = base + '/about'
puts absolute # https://example.com/about
# URI.join for multiple segments
full = URI.join('https://example.com', '/api/', 'v2/', 'users')
puts full # https://example.com/api/v2/users
# Resolve protocol-relative URLs
proto_relative = base + '//cdn.example.com/script.js'
puts proto_relative # https://cdn.example.com/script.js
The + operator resolves the right operand against the left base URL. URI.join() resolves several segments in sequence, which is useful for building API paths. Each argument is resolved against the result so far, so trailing slashes matter.
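A short sketch of why the trailing slash on the base matters during resolution. The URLs are illustrative. Without the slash, the last path segment of the base is treated as a file name and replaced.

```ruby
require 'uri'

without_slash = URI.join('https://example.com/api/v2', 'users')
with_slash    = URI.join('https://example.com/api/v2/', 'users')

puts without_slash  # https://example.com/api/users
puts with_slash     # https://example.com/api/v2/users
```

This mirrors how a browser resolves a relative link on a page: the reference replaces everything after the last / in the base path.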
URL Validation
Ruby's URI.parse() raises URI::InvalidURIError for malformed URLs. Use rescue to handle these gracefully.
require 'uri'
def valid_http_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
rescue URI::InvalidURIError
  false
end
def safe_redirect?(url, allowed_hosts)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
  host = uri.host&.downcase
  return false unless host
  allowed_hosts.any? do |allowed|
    host == allowed || host.end_with?(".#{allowed}")
  end
rescue URI::InvalidURIError
  false
end
# Usage
puts valid_http_url?('https://example.com') # true
puts valid_http_url?('javascript:alert(1)') # false
puts valid_http_url?('ftp://files.com') # false
allowed = ['myapp.com', 'api.myapp.com']
puts safe_redirect?('https://myapp.com/callback', allowed) # true
puts safe_redirect?('https://evil.com/steal', allowed) # false
# Check for dangerous schemes
DANGEROUS_SCHEMES = %w[javascript data vbscript].freeze
def safe_scheme?(url)
  uri = URI.parse(url)
  !DANGEROUS_SCHEMES.include?(uri.scheme&.downcase)
rescue URI::InvalidURIError
  false
end
The validation functions use Ruby's is_a? method to check the scheme by object type: URI::HTTP or URI::HTTPS. (URI::HTTPS is a subclass of URI::HTTP, so the first check alone also matches HTTPS URLs.) This is cleaner than string comparison, since URI.parse only returns an HTTP or HTTPS object for those schemes.
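As a sketch of a slightly stricter check, the hypothetical helper below (http_url_with_host?) also requires a non-empty host, under the assumption that a usable web URL always names one.

```ruby
require 'uri'

# Hypothetical stricter check: HTTP or HTTPS scheme plus a non-empty host.
# URI::HTTPS inherits from URI::HTTP, so one is_a? test covers both.
def http_url_with_host?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts http_url_with_host?('https://example.com')    # true
puts http_url_with_host?('/relative/path')         # false
puts http_url_with_host?('mailto:a@example.com')   # false
puts http_url_with_host?('not a url')              # false
```

The last case is false because URI.parse raises URI::InvalidURIError on the embedded spaces, which the rescue clause converts to false.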
Addressable Gem
For more advanced URL handling, the Addressable gem provides better Unicode support, URI templates (RFC 6570), and more forgiving parsing of malformed URLs.
# For more robust URL handling, consider the addressable gem
# gem install addressable
require 'addressable/uri'
require 'addressable/template' # needed for Addressable::Template below
# Parse URLs
uri = Addressable::URI.parse('https://example.com/path?q=hello world')
puts uri.query_values # {"q"=>"hello world"}
# Build with template expansion
template = Addressable::Template.new('https://api.com/users/{user_id}/posts{?page,limit}')
uri = template.expand(user_id: 123, page: 1, limit: 10)
puts uri # https://api.com/users/123/posts?page=1&limit=10
# Handle malformed URLs gracefully
uri = Addressable::URI.heuristic_parse('example.com/path')
puts uri # http://example.com/path
# Internationalized domain names (IDN)
uri = Addressable::URI.parse('https://münchen.example.com/')
puts uri.normalize.to_s # https://xn--mnchen-3ya.example.com/
# Query value modification
uri = Addressable::URI.parse('https://api.com/search?q=test')
uri.query_values = uri.query_values.merge('page' => '2')
puts uri # https://api.com/search?q=test&page=2
Addressable's standout features are URI templates (expanding variables into URLs) and heuristic_parse (adding a missing http:// automatically). The query_values accessor also makes parameter manipulation more convenient than the standard library.