URL encoding and HTML entity encoding are two of the most common text encoding operations in web development. They are frequently confused — sometimes even used interchangeably — but they serve entirely different purposes and operate in different contexts. Using the wrong one is a common source of display bugs, broken links, and security vulnerabilities.
URL Encoding (Percent Encoding)
URL encoding, formally called "percent encoding," replaces each character that is unsafe or reserved in a URL with its UTF-8 bytes, writing every byte as a percent sign followed by two hexadecimal digits. A space becomes %20, an ampersand becomes %26, and the copyright symbol (©), which takes two bytes in UTF-8, becomes %C2%A9.
URLs have a defined safe character set. "Unreserved" characters — letters, digits, hyphen, underscore, period, tilde — can appear in URLs as-is. "Reserved" characters like : / ? & = # have special structural meaning within the URL. Everything else, including spaces and non-ASCII characters, must be percent-encoded.
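The rule above can be sketched in a few lines. This is a minimal illustration, not a replacement for the built-in functions: percentEncode is a hypothetical helper name, and it assumes a runtime that provides the standard TextEncoder global (modern browsers and Node.js). It also encodes every reserved character, which is only appropriate for a single component such as a query value.

```javascript
// Hypothetical helper, for illustration only: keep unreserved characters,
// and write every other character as %XX for each of its UTF-8 bytes.
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-_.~]$/;
  const encoder = new TextEncoder(); // always produces UTF-8
  let out = "";
  for (const ch of text) { // iterate by code point, not by UTF-16 unit
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    for (const byte of encoder.encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a b"));  // a%20b
console.log(percentEncode("©"));    // %C2%A9
console.log(percentEncode("café")); // caf%C3%A9
```

In real code, prefer the built-in encodeURIComponent(), which follows the same idea but leaves a few extra characters such as ! and * unencoded.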
Examples of URL encoding:
Space → %20
Ampersand → %26
Hash → %23
At sign → %40
Plus sign → %2B
Forward slash (in query value) → %2F
Café (UTF-8) → caf%C3%A9
encodeURI vs encodeURIComponent in JavaScript
JavaScript provides two URL encoding functions, and choosing the wrong one is a frequent bug. The distinction matters a lot:
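encodeURI() encodes a complete URL. It does NOT encode characters that have structural meaning in a URL (: / ? & = # @ and the other reserved characters), nor the unreserved characters. Use it when you have a fully formed URL that may contain spaces or non-ASCII characters and want to turn it into a valid URL. It does not make the string safe to place in HTML; that is the job of HTML entity encoding.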
encodeURIComponent() encodes a single URL component — a query parameter name or value, a path segment. It DOES encode reserved characters like & and = because those would break the URL structure if they appeared unencoded inside a query value. Always use encodeURIComponent() when constructing query strings from dynamic data.
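Running both functions on the same string makes the difference concrete. The sketch below also includes buildUrl, a hypothetical helper (not part of any library) that assembles a query string from a plain object by encoding every name and value separately.

```javascript
const full = "https://example.com/a b?x=1&y=2#top";

// encodeURI keeps the URL structure intact and only fixes the space.
console.log(encodeURI(full));
// https://example.com/a%20b?x=1&y=2#top

// encodeURIComponent treats the whole string as data, so delimiters are encoded too.
console.log(encodeURIComponent(full));
// https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2%23top

// Hypothetical helper: encode each name and value, then join with the real delimiters.
function buildUrl(base, params) {
  const query = Object.entries(params)
    .map(([name, value]) =>
      encodeURIComponent(name) + "=" + encodeURIComponent(String(value)))
    .join("&");
  return query ? base + "?" + query : base;
}

console.log(buildUrl("https://example.com/search", { q: "cats & dogs", tag: "a=b" }));
// https://example.com/search?q=cats%20%26%20dogs&tag=a%3Db
```

Only the & and = that buildUrl adds itself remain unencoded, so the server sees exactly two parameters regardless of what the values contain.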
const base = "https://example.com/search";
const query = "cats & dogs";
// WRONG — & is not encoded, breaks the query string
const bad = base + "?q=" + query;
// → https://example.com/search?q=cats & dogs
// CORRECT — encodeURIComponent encodes the & and space
const good = base + "?q=" + encodeURIComponent(query);
// → https://example.com/search?q=cats%20%26%20dogs
// encodeURI — safe for complete URLs (does not encode &, ?, =)
const safeUrl = encodeURI("https://example.com/path with spaces");
// → https://example.com/path%20with%20spaces
HTML Entity Encoding
HTML entities are codes that represent characters within HTML markup. They are needed for two reasons: to display characters that have special meaning in HTML (< > & " '), and to include characters that may be difficult to type or that fall outside ASCII.
HTML entities begin with & and end with ; and take two forms: named entities like &amp;amp; (for &), &amp;lt; (for <), &amp;gt; (for >), and &amp;copy; (for ©); and numeric character references like &amp;#60; (decimal) or &amp;#x3C; (hexadecimal), both representing <.
- &amp;amp; for the ampersand (&)
- &amp;lt; for the less-than sign (<)
- &amp;gt; for the greater-than sign (>)
- &amp;quot; for the double quote (")
- &amp;#39; (or &amp;apos; in HTML5) for the apostrophe (')
- &amp;nbsp; for the non-breaking space
- &amp;copy; for the copyright symbol (©)
- &amp;trade; for the trademark symbol (™)
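Numeric character references are derived directly from a character's Unicode code point, which a short sketch can show. The two function names below are hypothetical helpers introduced only for illustration.

```javascript
// Hypothetical helpers: turn a single character into a numeric character reference.
function toDecimalReference(ch) {
  return "&#" + ch.codePointAt(0) + ";";
}

function toHexReference(ch) {
  return "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
}

console.log(toDecimalReference("<")); // &#60;
console.log(toHexReference("<"));     // &#x3C;
console.log(toDecimalReference("©")); // &#169;
console.log(toHexReference("™"));     // &#x2122;
```

Unlike named entities, this form works for any character, because it needs nothing beyond the code point.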
Why HTML Encoding Matters for Security: XSS Prevention
If user-submitted text containing < or > is inserted directly into HTML without encoding, the browser interprets those characters as HTML tags. An attacker can inject a <script> tag or an event-handler attribute carrying malicious JavaScript. This is Cross-Site Scripting (XSS), a long-standing entry in the OWASP Top 10 list of critical web security risks (since the 2021 edition it is counted under the Injection category).
// VULNERABLE: inserting raw user input into the DOM
const userInput = '<img src="x" onerror="alert(\'XSS attack\')">';
document.getElementById("output").innerHTML = userInput;
// The onerror handler runs! Attacker can steal cookies, tokens, etc.
// (Browsers do not execute a <script> tag assigned through innerHTML,
// but event-handler attributes such as onerror run all the same.)
// SAFE: HTML-encode before insertion
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;') // must come first, or later entities get re-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
document.getElementById("output").innerHTML = escapeHtml(userInput);
// Renders as visible text: <img src="x" onerror="alert('XSS attack')">
Modern frameworks like React, Vue, and Angular automatically HTML-encode content inserted via JSX or templates. However, any time you use dangerouslySetInnerHTML (React), v-html (Vue), or innerHTML (vanilla JS), you bypass this protection and must encode manually.
When to Use Each
- Use URL encoding when building query strings, constructing URL path segments from dynamic data, or placing a URL into an href before HTML encoding it
- Use HTML entity encoding when rendering any user-provided text into HTML, displaying code examples in a web page, or inserting dynamic content into HTML attributes
- Use both when embedding a URL with dynamic parameters inside an HTML attribute — URL-encode the query values first, then HTML-encode the entire URL string
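The "use both" case can be sketched as follows. searchLink is a hypothetical helper, the example.com URL and the lang parameter are placeholders, and the escapeHtml function is a minimal version defined here only so the snippet runs on its own.

```javascript
// Minimal HTML escaper so the example is self-contained.
function escapeHtml(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: build an <a> tag for a search term.
function searchLink(term, lang) {
  // Step 1: URL-encode each dynamic value while assembling the URL.
  const url = "https://example.com/search?q=" + encodeURIComponent(term) +
    "&lang=" + encodeURIComponent(lang);
  // Step 2: HTML-encode the finished URL for the attribute, and the term for the link text.
  return '<a href="' + escapeHtml(url) + '">' + escapeHtml(term) + "</a>";
}

console.log(searchLink("cats & dogs", "en"));
// <a href="https://example.com/search?q=cats%20%26%20dogs&amp;lang=en">cats &amp; dogs</a>
```

The & inside the search term ends up as %26 (URL layer), while the & separating the two parameters ends up as &amp;amp; (HTML layer). The browser undoes the HTML layer when it parses the attribute, and the server undoes the URL layer when it parses the query string.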
The Double-Encoding Pitfall
A common bug occurs when the same encoding is applied twice — either accidentally by two layers of a framework, or when a developer manually encodes something that the framework also encodes. The result is visible encoding artifacts: an ampersand shows up as &amp; instead of & or a URL contains %2520 (the % itself was percent-encoded) instead of %20.
Always know which layer of your stack handles encoding. If your template engine auto-escapes output, do not also manually call escapeHtml() — or explicitly opt out of auto-escaping when you have already encoded the value yourself.
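Both artifacts are easy to reproduce, which helps when diagnosing them. The snippet below is a small demonstration; escapeHtml is a reduced stand-in that handles only &, <, and > so the example stays short.

```javascript
// URL layer: encoding an already encoded value turns % into %25.
const once = encodeURIComponent("a b");
const twice = encodeURIComponent(once);
console.log(once);  // a%20b
console.log(twice); // a%2520b

// One round of decoding is no longer enough to recover the original.
console.log(decodeURIComponent(twice)); // a%20b

// HTML layer: escaping twice turns & into &amp;amp;.
function escapeHtml(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const escapedOnce = escapeHtml("Tom & Jerry");
const escapedTwice = escapeHtml(escapedOnce);
console.log(escapedOnce);  // Tom &amp; Jerry
console.log(escapedTwice); // Tom &amp;amp; Jerry
```

Seeing %25 followed by two hex digits in a URL, or &amp;amp; in page source, is a strong hint that two layers are both encoding the same value.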