Frequently Asked Questions

What's the difference between CGI.parse and URI.decode_www_form?

CGI.parse returns a hash where all values are arrays, handling duplicate keys properly. URI.decode_www_form returns an array of [key, value] pairs, which you can then process as needed. CGI.parse is often more convenient for typical query string handling.

When should I use the Addressable gem instead of stdlib URI?

Use Addressable for URI templates (RFC 6570), better handling of malformed URLs, internationalized domain names (IDN), or when you need more robust query string manipulation. For simple parsing tasks, stdlib URI is sufficient.

How do I handle URLs with international characters?

The stdlib URI module has limited IDN support. For full international URL support, use the Addressable gem which properly handles Unicode domain names (converting to Punycode) and percent-encodes Unicode in paths and query strings.

Parse URLs in Ruby - URI Module Complete Guide

Ruby's standard library includes the URI module for URL parsing and manipulation. It provides a clean, object-oriented interface that's idiomatic for Ruby developers.

Key Takeaways

1. URI.parse returns scheme-specific URI objects
2. CGI module handles query string encoding/decoding
3. Addressable gem offers enhanced URL handling
4. URI components can be modified directly via setters
5. Use URI.encode_www_form for building query strings

Definition

Ruby URL Parsing

Ruby URL parsing uses the URI module from the standard library to parse, manipulate, and construct URLs. The URI.parse method returns scheme-specific URI objects that provide methods to access and modify all URL components.

Source: Ruby Documentation - URI Module

“URI is a module providing classes to handle Uniform Resource Identifiers. It provides classes to handle all kinds of URIs and has convenience methods to parse strings into URI objects.”
— Ruby URI Documentation

Parsing URLs

Ruby's URI.parse() method returns scheme-specific objects like URI::HTTPS or URI::HTTP. This means you can check the type to validate schemes, and the objects know their default ports.

ruby

require 'uri'

url = 'https://user:pass@example.com:8080/path/page?query=value#fragment'
uri = URI.parse(url)

puts uri.scheme    # https
puts uri.host      # example.com
puts uri.port      # 8080
puts uri.path      # /path/page
puts uri.query     # query=value
puts uri.fragment  # fragment
puts uri.user      # user
puts uri.password  # pass

# Full components
puts uri.userinfo  # user:pass
puts uri.authority # example.com:8080 (host and port only; URI::HTTP#authority, Ruby 3.1+)

# Request URI (path + query)
puts uri.request_uri  # /path/page?query=value

# Check type
puts uri.class  # URI::HTTPS

The code demonstrates accessing all URL components. Notice request_uri, which gives you the path plus query string: exactly what you'd send in an HTTP request line. The class check (URI::HTTPS) is a Ruby-idiomatic way to validate schemes.
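The point about default ports can be illustrated with a short sketch. The URLs below are made up for illustration, and the foo scheme is a hypothetical stand-in for any scheme without a dedicated class.

```ruby
require 'uri'

# When the URL omits the port, the scheme-specific class supplies its default.
secure = URI.parse('https://example.com/path')
plain  = URI.parse('http://example.com/path')

puts secure.port          # 443
puts plain.port           # 80
puts secure.default_port  # 443

# An explicit port is kept as given.
custom = URI.parse('https://example.com:8443/path')
puts custom.port                         # 8443
puts custom.port == custom.default_port  # false

# Schemes without a dedicated class fall back to URI::Generic,
# which has no default port.
other = URI.parse('foo://example.com/path')
puts other.class  # URI::Generic
p other.port      # nil
```

Comparing port with default_port is one way to decide whether the port needs to be shown when displaying a URL.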

URI Components

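The URI object provides reader methods for each component. Components that are absent from the URL, such as query, fragment, or userinfo, return nil, while port falls back to the scheme's default.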

Method	Returns	Example
scheme	Protocol	https
host	Domain	example.com
port	Port number	8080
path	URL path	/search
query	Query string	q=test
fragment	Hash/anchor	section
user	Username	admin
password	Password	secret
userinfo	user:pass	admin:secret
request_uri	Path + query	/search?q=test

Most components have both reader and writer methods, making URLs mutable. This differs from languages like Java where you'd need to rebuild the URL.
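As a small sketch of both behaviors, the snippet below parses a bare URL (an illustrative example, not taken from a real service), inspects the missing components, and then fills them in through the writer methods.

```ruby
require 'uri'

uri = URI.parse('https://example.com')

# Missing components come back as nil; a missing path is an empty string.
p [uri.query, uri.fragment, uri.userinfo]  # [nil, nil, nil]
p uri.path                                 # ""

# Writers update the same object in place.
uri.path = '/search'
uri.query = 'q=test'
puts uri  # https://example.com/search?q=test
```

Because readers can return nil, code that calls string methods on uri.query or uri.fragment should guard against the missing case first.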

Parsing Query Parameters

Ruby provides two ways to parse query strings: CGI.parse() returns a hash with array values (handling duplicates properly), while URI.decode_www_form() returns an array of pairs (preserving order).

ruby

require 'uri'
require 'cgi'

uri = URI.parse('https://shop.com/search?category=books&tag=ruby&tag=rails')

# Parse query string with CGI
params = CGI.parse(uri.query)
puts params
# {"category"=>["books"], "tag"=>["ruby", "rails"]}

# Get single value
category = params['category'].first
puts category  # books

# Get multiple values (p prints the array literally; puts would print one element per line)
tags = params['tag']
p tags  # ["ruby", "rails"]

# Alternative: URI.decode_www_form (Ruby 1.9.2+)
params = URI.decode_www_form(uri.query)
p params
# [["category", "books"], ["tag", "ruby"], ["tag", "rails"]]

# Convert to hash (loses duplicate keys)
params_hash = Hash[URI.decode_www_form(uri.query)]
puts params_hash
# {"category"=>"books", "tag"=>"rails"}

# Keep all values with group_by
params_grouped = URI.decode_www_form(uri.query)
  .group_by { |k, _| k }
  .transform_values { |pairs| pairs.map(&:last) }
puts params_grouped
# {"category"=>["books"], "tag"=>["ruby", "rails"]}

CGI.parse() is often more convenient since you get a hash directly. The decode_www_form approach is useful when you need to preserve the exact order of parameters or process them sequentially.
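One detail the examples above leave out is that uri.query is nil when a URL has no query string, and URI.decode_www_form expects a string. A minimal sketch of a wrapper that handles this case follows; query_params is a hypothetical helper name, not part of the standard library.

```ruby
require 'uri'

# Hypothetical helper: returns a hash of arrays for any URL string,
# treating a missing query string as "no parameters".
def query_params(url)
  query = URI.parse(url).query || ''
  URI.decode_www_form(query).each_with_object({}) do |(key, value), params|
    (params[key] ||= []) << value
  end
end

# Repeated keys are collected, and + and %26 are decoded to a space and &.
with_query = query_params('https://shop.com/search?tag=ruby&tag=rails&q=a+b%26c')
p with_query['tag']  # ["ruby", "rails"]
p with_query['q']    # ["a b&c"]

# No query string yields an empty hash instead of an error.
p query_params('https://shop.com/search')  # {}
```

Unlike the hash returned by CGI.parse, this sketch returns nil for keys that were never present, so callers may want Hash#fetch with a default.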

Building URLs

Ruby offers several approaches to build URLs: the build() class method, encode_www_form() for query strings, and string interpolation for simple cases.

ruby

require 'uri'

# Using URI::HTTPS.build (the build class method is inherited from URI::Generic)
uri = URI::HTTPS.build(
  host: 'api.example.com',
  path: '/v2/users',
  query: 'page=1&limit=10'
)
puts uri  # https://api.example.com/v2/users?page=1&limit=10

# Building query string with encode_www_form
params = [
  ['q', 'ruby tutorial'],
  ['page', '1'],
  ['sort', 'relevance']
]
query = URI.encode_www_form(params)
puts query  # q=ruby+tutorial&page=1&sort=relevance

# From hash (simpler for unique keys)
params = {
  q: 'ruby programming',
  page: 1,
  limit: 20
}
query = URI.encode_www_form(params)
puts query  # q=ruby+programming&page=1&limit=20

# Complete URL
uri = URI::HTTPS.build(
  host: 'api.example.com',
  path: '/search',
  query: URI.encode_www_form(q: 'test', page: 1)
)
puts uri  # https://api.example.com/search?q=test&page=1

The encode_www_form() method accepts both arrays and hashes, automatically encoding special characters. For duplicate keys, use an array of pairs instead of a hash.
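The duplicate-key case can be sketched as follows. The parameter names are illustrative only.

```ruby
require 'uri'

# An array of pairs keeps repeated keys and their order.
pairs = [['tag', 'ruby'], ['tag', 'rails'], ['page', '1']]
from_pairs = URI.encode_www_form(pairs)
puts from_pairs  # tag=ruby&tag=rails&page=1

# A hash whose value is an array also repeats the key once per element.
from_hash = URI.encode_www_form(tag: %w[ruby rails], page: 1)
puts from_hash   # tag=ruby&tag=rails&page=1

# Decoding the result gives back the original pairs.
p URI.decode_www_form(from_pairs) == pairs  # true
```

The array-of-pairs form is the safer choice when the order of parameters matters to the receiving server.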

Modifying URLs

URI objects in Ruby are mutable: you can assign new values to components directly. For query parameter manipulation, you'll typically parse, modify, and re-encode.

ruby

require 'uri'

uri = URI.parse('https://api.example.com/users?page=1')

# Modify components directly
uri.path = '/v2/users'
uri.query = 'page=2&limit=10'
uri.fragment = 'results'

puts uri  # https://api.example.com/v2/users?page=2&limit=10#results

# Helper method to add/update query parameters
def add_query_params(uri, new_params)
  uri = URI.parse(uri) if uri.is_a?(String)

  existing = uri.query ? URI.decode_www_form(uri.query) : []
  existing_hash = existing.to_h

  merged = existing_hash.merge(new_params.transform_keys(&:to_s))
  uri.query = URI.encode_www_form(merged)

  uri
end

uri = URI.parse('https://api.com/data?existing=value')
uri = add_query_params(uri, { new: 'param', existing: 'updated' })
puts uri  # https://api.com/data?existing=updated&new=param

# Remove query parameters
def remove_query_params(uri, keys_to_remove)
  uri = URI.parse(uri) if uri.is_a?(String)

  return uri unless uri.query

  params = URI.decode_www_form(uri.query)
  filtered = params.reject { |k, _| keys_to_remove.include?(k) }
  uri.query = filtered.empty? ? nil : URI.encode_www_form(filtered)

  uri
end

uri = URI.parse('https://api.com/data?a=1&b=2&c=3')
uri = remove_query_params(uri, ['b'])
puts uri  # https://api.com/data?a=1&c=3

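These helper methods follow Ruby's philosophy of making common operations convenient. They accept both strings and URI objects, converting as needed. Notice how reject cleanly filters out unwanted parameters. Two caveats: add_query_params converts the existing pairs to a hash, so repeated keys collapse to their last value, and both helpers modify the URI object they are given.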
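Where repeated keys must survive, or where the caller's URI object should stay untouched, a variant along these lines may help. This is a sketch under those assumptions; append_query_params is a hypothetical name, and it always appends rather than replacing existing keys.

```ruby
require 'uri'

# Hypothetical variant: works on a copy and keeps repeated keys,
# appending the new parameters after the existing ones.
def append_query_params(uri, new_params)
  uri = uri.is_a?(String) ? URI.parse(uri) : uri.dup
  pairs = uri.query ? URI.decode_www_form(uri.query) : []
  new_params.each { |key, value| pairs << [key.to_s, value.to_s] }
  uri.query = URI.encode_www_form(pairs)
  uri
end

original = URI.parse('https://api.com/data?tag=ruby&tag=rails')
updated = append_query_params(original, page: 2)

puts updated   # https://api.com/data?tag=ruby&tag=rails&page=2
puts original  # https://api.com/data?tag=ruby&tag=rails
```

The trade-off is that appending never overwrites, so updating an existing parameter still calls for the parse, modify, and re-encode approach.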

URL Encoding

Ruby's encoding functions follow the same space-encoding distinction as other languages: + for form data, %20 for general use. The methods shown in this example all produce form encoding.

ruby

require 'uri'
require 'cgi'

# URI.encode_www_form_component - for query values
encoded = URI.encode_www_form_component('hello world & friends')
puts encoded  # hello+world+%26+friends

# CGI.escape - same behavior
encoded = CGI.escape('hello world')
puts encoded  # hello+world

# URI::DEFAULT_PARSER.escape - legacy escaping (spaces become %20)
# Considered obsolete: URI.escape/URI.encode were removed in Ruby 3.0,
# so prefer the component-specific methods shown here

# Encoding special characters
special = 'key=value&foo=bar'
puts URI.encode_www_form_component(special)
# key%3Dvalue%26foo%3Dbar

# Decoding
decoded = URI.decode_www_form_component('hello+world+%26+friends')
puts decoded  # hello world & friends

decoded = CGI.unescape('hello+world')
puts decoded  # hello world

# Complete example
base = 'https://api.example.com/search'
query = URI.encode_www_form(
  q: 'ruby & rails',
  filter: 'category:web'
)
puts "#{base}?#{query}"
# https://api.example.com/search?q=ruby+%26+rails&filter=category%3Aweb

The encode_www_form_component() and CGI.escape() methods are largely equivalent. When using encode_www_form(), encoding happens automatically.
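To make the + versus %20 distinction concrete, here is a minimal sketch. It brings in ERB::Util.url_encode from the standard library's erb as one way to get %20-style output, which the guide itself does not mention. URI.encode_uri_component is only available from Ruby 3.2, so that call is guarded.

```ruby
require 'uri'
require 'erb'

text = 'hello world & friends'

# Form encoding: spaces become +
form_encoded = URI.encode_www_form_component(text)
puts form_encoded     # hello+world+%26+friends

# Percent encoding: spaces become %20
percent_encoded = ERB::Util.url_encode(text)
puts percent_encoded  # hello%20world%20%26%20friends

# Ruby 3.2+ also ships URI.encode_uri_component; the guard keeps
# this sketch runnable on older versions.
if URI.respond_to?(:encode_uri_component)
  puts URI.encode_uri_component(text)  # hello%20world%20%26%20friends
end

# Decoding accepts both forms of space.
puts URI.decode_www_form_component(percent_encoded)  # hello world & friends
```

The %20 form is the one to reach for when the value ends up in a path segment, where a literal + is not treated as a space.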

Resolving Relative URLs

Ruby uses the + operator to resolve relative URLs against a base, which gives relative resolution a compact, readable syntax.

ruby

require 'uri'

base = URI.parse('https://example.com/docs/guide/')

# Join relative paths
relative = base + '../api/reference'
puts relative  # https://example.com/docs/api/reference

page = base + 'page.html'
puts page  # https://example.com/docs/guide/page.html

absolute = base + '/about'
puts absolute  # https://example.com/about

# URI.join for multiple segments
full = URI.join('https://example.com', '/api/', 'v2/', 'users')
puts full  # https://example.com/api/v2/users

# Resolve protocol-relative URLs
proto_relative = base + '//cdn.example.com/script.js'
puts proto_relative  # https://cdn.example.com/script.js

The + operator resolves the right operand against the left base URL. URI.join() can join multiple segments at once, which is useful for building API paths.
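Both + and URI.join follow the standard relative-reference rules, so trailing and leading slashes change the result. The sketch below, using illustrative URLs, shows the cases that most often surprise people when building API paths.

```ruby
require 'uri'

# Without a trailing slash, the last segment of the base is replaced.
no_slash = URI.join('https://example.com/api', 'v2')
puts no_slash    # https://example.com/v2

# With a trailing slash, the new segment is appended.
with_slash = URI.join('https://example.com/api/', 'v2')
puts with_slash  # https://example.com/api/v2

# A leading slash on the relative part resets to the host root.
rooted = URI.join('https://example.com/api/', '/v2')
puts rooted      # https://example.com/v2

# The + operator behaves the same way.
sibling = URI.parse('https://example.com/docs/guide') + 'page.html'
puts sibling     # https://example.com/docs/page.html
```

When joining path segments for an API, ending each intermediate segment with a slash and leaving leading slashes off keeps the segments cumulative.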

URL Validation

Ruby's URI.parse() raises URI::InvalidURIError for malformed URLs. Use rescue to handle these gracefully.

ruby

require 'uri'

def valid_http_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
rescue URI::InvalidURIError
  false
end

def safe_redirect?(url, allowed_hosts)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)

  host = uri.host&.downcase
  return false unless host

  allowed_hosts.any? do |allowed|
    host == allowed || host.end_with?(".#{allowed}")
  end
rescue URI::InvalidURIError
  false
end

# Usage
puts valid_http_url?('https://example.com')  # true
puts valid_http_url?('javascript:alert(1)')  # false
puts valid_http_url?('ftp://files.com')      # false

allowed = ['myapp.com', 'api.myapp.com']
puts safe_redirect?('https://myapp.com/callback', allowed)  # true
puts safe_redirect?('https://evil.com/steal', allowed)      # false

# Check for dangerous schemes
DANGEROUS_SCHEMES = %w[javascript data vbscript].freeze

def safe_scheme?(url)
  uri = URI.parse(url)
  !DANGEROUS_SCHEMES.include?(uri.scheme&.downcase)
rescue URI::InvalidURIError
  false
end

The validation functions use Ruby's is_a? method to check scheme by object type: URI::HTTP or URI::HTTPS. This is cleaner than string comparison and catches invalid schemes automatically since URI.parse wouldn't return an HTTP/HTTPS object for other schemes.
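Two details are worth a quick sketch. URI::HTTPS is a subclass of URI::HTTP, so a single is_a?(URI::HTTP) check covers both, and a URL can parse as HTTP while having no host at all. The helper name http_url_with_host? is hypothetical, and requiring a host is an added assumption rather than something the functions above do.

```ruby
require 'uri'

# URI::HTTPS inherits from URI::HTTP, so one is_a? check matches both.
puts URI::HTTPS.ancestors.include?(URI::HTTP)  # true

# Hypothetical stricter check: HTTP(S) type plus a non-empty host.
def http_url_with_host?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts http_url_with_host?('https://example.com/page')  # true
puts http_url_with_host?('http:/path-only')           # false
puts http_url_with_host?('mailto:user@example.com')   # false
puts http_url_with_host?('not a url')                 # false
```

A host check like this matters for redirect targets and outbound requests, where an HTTP-typed URL without a host is not usable.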

Addressable Gem

For more advanced URL handling, the Addressable gem provides better Unicode support, URI templates (RFC 6570), and more forgiving parsing of malformed URLs.

ruby

# For more robust URL handling, consider the addressable gem
# gem install addressable

require 'addressable/uri'

# Parse URLs
uri = Addressable::URI.parse('https://example.com/path?q=hello world')
puts uri.query_values  # {"q"=>"hello world"}

# Build with template expansion
template = Addressable::Template.new('https://api.com/users/{user_id}/posts{?page,limit}')
uri = template.expand(user_id: 123, page: 1, limit: 10)
puts uri  # https://api.com/users/123/posts?page=1&limit=10

# Handle malformed URLs gracefully
uri = Addressable::URI.heuristic_parse('example.com/path')
puts uri  # http://example.com/path

# Internationalized domain names (IDN)
uri = Addressable::URI.parse('https://münchen.example.com/')
puts uri.normalize.to_s  # https://xn--mnchen-3ya.example.com/

# Query value modification
uri = Addressable::URI.parse('https://api.com/search?q=test')
uri.query_values = uri.query_values.merge('page' => '2')
puts uri  # https://api.com/search?q=test&page=2

Addressable's killer features are URI templates (expanding variables into URLs) and heuristic_parse (adding missing http:// automatically). The query_values accessor also makes parameter manipulation more convenient than the standard library.
