URL Encoding vs HTML Entity Encoding: The Developer's Guide

URL encoding and HTML entity encoding get confused all the time — I've reviewed production code that used them interchangeably. They're not interchangeable. They solve completely different problems in completely different contexts, and mixing them up is how you get broken links, mangled query strings, and XSS vulnerabilities in the same codebase. The distinction is simple once you understand the context each one belongs to.

URL Encoding: what it is and when it runs

URL encoding, formally "percent-encoding" as defined in RFC 3986, converts characters that are unsafe or reserved in a URL by replacing each byte of the character's UTF-8 encoding with a percent sign followed by two hex digits. A space becomes %20, an ampersand becomes %26, and the copyright symbol (©), which is two bytes in UTF-8, becomes %C2%A9.

URLs have a defined safe character set. "Unreserved" characters (letters, digits, hyphen, underscore, period, tilde) can appear as-is. "Reserved" characters like : / ? & = # have special structural meaning, so they must be percent-encoded whenever they appear as data rather than as delimiters. Everything else, including spaces and non-ASCII characters, must always be percent-encoded.

text

Examples of URL encoding:
Space       → %20
Ampersand   → %26
Hash        → %23
At sign     → %40
Plus sign   → %2B
Forward slash (in query value) → %2F
Café (UTF-8) → Caf%C3%A9
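To make the byte-level rule concrete, here is a minimal sketch. percentEncodeAllBytes is a hypothetical helper written only for illustration: it encodes every byte, including ones that would normally be left alone. Real code should use the built-in encodeURIComponent(), shown next to it for comparison.

```javascript
// Hypothetical helper (not a built-in): percent-encode every UTF-8 byte of a string.
function percentEncodeAllBytes(str) {
  const bytes = new TextEncoder().encode(str); // TextEncoder always produces UTF-8
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAllBytes("©")); // %C2%A9 (two UTF-8 bytes, two triplets)
console.log(percentEncodeAllBytes(" ")); // %20

// The built-in leaves unreserved characters alone and encodes the rest the same way.
console.log(encodeURIComponent("©"));    // %C2%A9
console.log(encodeURIComponent("Café")); // Caf%C3%A9
```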

encodeURI vs encodeURIComponent in JavaScript

JavaScript provides two URL encoding functions, and picking the wrong one is a very common bug. The rule is actually simple once you know it:

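encodeURI() encodes a complete URL. It leaves alone the characters that have structural meaning in a URL (: / ? & = # @ and a few others) as well as the unreserved characters, and percent-encodes the rest, such as spaces and non-ASCII characters. Use it when you've got a fully formed URL and only need to clean up characters that aren't legal in a URL. It does not make a URL safe to drop into an HTML attribute; that is HTML entity encoding's job.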

encodeURIComponent() encodes a single URL component — a query parameter name or value, a path segment. It does encode reserved characters like & and = because those would break the URL structure inside a query value. When you're building query strings from dynamic data, always use encodeURIComponent(). That's the one you'll need 90% of the time.

javascript

const base = "https://example.com/search";
const query = "cats & dogs";

// WRONG — & is not encoded, breaks the query string
const bad = base + "?q=" + query;
// → https://example.com/search?q=cats & dogs

// CORRECT — encodeURIComponent encodes the & and space
const good = base + "?q=" + encodeURIComponent(query);
// → https://example.com/search?q=cats%20%26%20dogs

// encodeURI — safe for complete URLs (does not encode &, ?, =)
const safeUrl = encodeURI("https://example.com/path with spaces");
// → https://example.com/path%20with%20spaces
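If you would rather not concatenate strings by hand, the standard URL and URLSearchParams classes (available in browsers and Node.js) can do the component encoding for you. The sketch below first runs one input through both functions, then rebuilds the example.com search URL from above. The variable names are only illustrative. Note that URLSearchParams follows form-encoding rules and writes a space as + rather than %20, which query-string parsers generally read back as a space.

```javascript
// Side by side: the same input through both built-in functions.
const raw = "a&b=c d";
const viaEncodeURI = encodeURI(raw);                   // "a&b=c%20d" (& and = left intact)
const viaEncodeURIComponent = encodeURIComponent(raw); // "a%26b%3Dc%20d"

// Letting URL / URLSearchParams handle the query value.
const url = new URL("https://example.com/search");
url.searchParams.set("q", "cats & dogs");
const built = url.toString();
// → https://example.com/search?q=cats+%26+dogs

console.log(viaEncodeURI);
console.log(viaEncodeURIComponent);
console.log(built);
```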

HTML Entity Encoding

HTML entities are codes that represent characters within HTML markup. They're needed for two reasons: to display characters that have special meaning in HTML (< > & " '), and to include characters that fall outside ASCII without relying on encoding assumptions.

HTML entities begin with & and end with ; and take two forms: named entities like &amp; (for &), &lt; (for <), &gt; (for >), and &copy; (for ©); and numeric character references like &#60; (decimal) or &#x3C; (hexadecimal), both representing <.

Ampersand (&)          → &amp;   (the one you'll miss most often)
Less-than sign (<)     → &lt;
Greater-than sign (>)  → &gt;
Double quote (")       → &quot;
Apostrophe (')         → &#39;
Non-breaking space     → &nbsp;
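
A numeric character reference is just the character's Unicode code point written in decimal or hexadecimal. A small sketch of that relationship follows; toDecimalRef and toHexRef are hypothetical helper names, not built-ins.

```javascript
// Hypothetical helpers: build numeric character references from a character.
function toDecimalRef(ch) {
  return "&#" + ch.codePointAt(0) + ";";
}

function toHexRef(ch) {
  return "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
}

console.log(toDecimalRef("<")); // &#60;
console.log(toHexRef("<"));     // &#x3C;
console.log(toDecimalRef("©")); // &#169;
console.log(toHexRef("©"));     // &#xA9;
```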

Why getting this wrong causes XSS vulnerabilities

If user-submitted text containing < or > gets inserted directly into HTML without encoding, the browser interprets those characters as HTML tags. An attacker can inject a <script> tag, or an element with an event-handler attribute, that runs malicious JavaScript. That's Cross-Site Scripting (XSS), a long-standing entry in the OWASP Top 10 (listed under Injection since the 2021 edition). It's common, it's devastating, and it's entirely preventable.

javascript

// VULNERABLE: inserting raw user input into the DOM
const userInput = '<img src="x" onerror="alert(\'XSS attack\')">';
document.getElementById("output").innerHTML = userInput;
// The onerror handler runs as soon as the image fails to load.
// (A <script> tag assigned via innerHTML is parsed but not executed,
// so real payloads use event-handler attributes like this one.)
// Attacker can steal cookies, tokens, etc.

// SAFE: HTML-encode before insertion
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;') // must run first, or it would re-encode the entities below
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

document.getElementById("output").innerHTML = escapeHtml(userInput);
// Renders as visible text: <img src="x" onerror="alert('XSS attack')">

React, Vue, and Angular all auto-encode content inserted via JSX/templates, which is one of the genuinely good default behaviors modern frameworks give you for free. But the moment you use dangerouslySetInnerHTML (React), v-html (Vue), or innerHTML (vanilla JS), you've opted out of that protection. You're on your own: encode any untrusted text yourself before it goes into that markup.
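
Conceptually, auto-encoding means every interpolated value passes through an escape function before it is joined with the trusted template text. The sketch below imitates that idea with a tagged template literal. The html tag and escapeHtml are illustrative helpers, and this is not how React, Vue, or Angular are actually implemented.

```javascript
// Same escaping rules as the escapeHtml function shown earlier.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical tag function: literal parts are trusted, interpolated values are escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, literal, i) =>
      out + literal + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const name = "<img src=x onerror=alert(1)>";
const rendered = html`<p>Hello, ${name}!</p>`;
console.log(rendered);
// <p>Hello, &lt;img src=x onerror=alert(1)&gt;!</p>
```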

The decision rule, at a glance

URL encoding: when building query strings or constructing URL path segments from dynamic data
HTML entity encoding: when rendering any user-provided text into HTML, showing code examples on a page, or inserting dynamic content into HTML attributes
Both: when embedding a URL with dynamic parameters inside an HTML attribute — URL-encode the query values first, then HTML-encode the full URL string
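
The "both" case can be sketched as follows. The search URL, the query value, and the link text are made up for illustration, and escapeHtml repeats the helper from the XSS example.

```javascript
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const query = 'cats & "dogs"';

// Step 1: URL-encode the dynamic value while building the URL.
const url =
  "https://example.com/search?q=" + encodeURIComponent(query) + "&page=2";
// → https://example.com/search?q=cats%20%26%20%22dogs%22&page=2

// Step 2: HTML-encode the finished URL when placing it in an attribute.
const link = '<a href="' + escapeHtml(url) + '">Results</a>';
// → <a href="https://example.com/search?q=cats%20%26%20%22dogs%22&amp;page=2">Results</a>

console.log(url);
console.log(link);
```

The order matters: the & that separates q from page is URL structure, so it stays unencoded in step 1 and only becomes &amp; in step 2, where the browser's HTML parser turns it back into & before the URL is ever requested.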

The Double-Encoding Pitfall

A common bug occurs when the same encoding is applied twice, either accidentally by two layers of a framework or when a developer manually encodes something that the framework also encodes. The result is visible encoding artifacts: an ampersand shows up on the page as the literal text &amp; instead of &, or a URL contains %2520 (the % itself was percent-encoded as %25) instead of %20.

Know which layer of your stack handles encoding. If your template engine auto-escapes output, don't also manually call escapeHtml() — or explicitly opt out of auto-escaping when you've already encoded the value yourself.
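
Both artifacts are easy to reproduce, which helps when you are trying to work out which layer encoded twice. A minimal sketch, with escapeHtml repeating the helper from the XSS example:

```javascript
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// URL encoding applied twice: the % from the first pass becomes %25.
const urlOnce = encodeURIComponent("a b");    // "a%20b"
const urlTwice = encodeURIComponent(urlOnce); // "a%2520b"
const decoded = decodeURIComponent(urlTwice); // "a%20b" (one decode undoes only one layer)

// HTML encoding applied twice: the & from the first pass becomes &amp; again.
const htmlOnce = escapeHtml("Tom & Jerry");   // "Tom &amp; Jerry"
const htmlTwice = escapeHtml(htmlOnce);       // "Tom &amp;amp; Jerry"

console.log(urlOnce, urlTwice, decoded);
console.log(htmlOnce, htmlTwice);
```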