DailyTools
Web Development · January 28, 2025 · 7 min read

URL Encoding vs HTML Entity Encoding: The Developer's Guide

Two of the most commonly confused encoding operations in web development serve completely different purposes. Learn when to use each, how they work, and the security implications of getting it wrong.

URL encoding and HTML entity encoding are two of the most common text encoding operations in web development. They are frequently confused — sometimes even used interchangeably — but they serve entirely different purposes and operate in different contexts. Using the wrong one is a common source of display bugs, broken links, and security vulnerabilities.

URL Encoding (Percent Encoding)

URL encoding, formally called "percent encoding," replaces each character that is unsafe or reserved in a URL with its UTF-8 bytes, writing every byte as a percent sign followed by two hexadecimal digits. A space becomes %20, an ampersand becomes %26, and the copyright symbol (©), which is two bytes in UTF-8, becomes %C2%A9.

URLs have a defined safe character set. "Unreserved" characters (letters, digits, hyphen, underscore, period, tilde) can appear in URLs as-is. "Reserved" characters like : / ? & = # have special structural meaning within the URL, so they must be percent-encoded whenever they appear as data rather than as delimiters. Everything else, including spaces and non-ASCII characters, must always be percent-encoded.

text
Examples of URL encoding:
Space       → %20
Ampersand   → %26
Hash        → %23
At sign     → %40
Plus sign   → %2B
Forward slash (in query value) → %2F
Café (UTF-8) → Caf%C3%A9
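
To make the byte-level rule concrete, here is a minimal sketch of a percent-encoder. The helper name percentEncode is hypothetical and exists only for illustration; it encodes everything outside the unreserved set, which is slightly stricter than the built-in encodeURIComponent (that function also leaves ! * ' ( ) untouched).

```javascript
// Hypothetical helper for illustration; real code should use encodeURIComponent.
function percentEncode(str) {
  const unreserved = /^[A-Za-z0-9\-_.~]$/;
  const encoder = new TextEncoder(); // always produces UTF-8 bytes
  let out = "";
  for (const ch of str) {            // iterates by code point, not UTF-16 unit
    if (unreserved.test(ch)) {
      out += ch;                     // unreserved characters pass through
      continue;
    }
    for (const byte of encoder.encode(ch)) {
      // each UTF-8 byte becomes %XX with two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("Café"));        // Caf%C3%A9
console.log(percentEncode("©"));           // %C2%A9
console.log(percentEncode("cats & dogs")); // cats%20%26%20dogs
```

Note that é and © each expand to two %XX groups because each occupies two bytes in UTF-8.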

encodeURI vs encodeURIComponent in JavaScript

JavaScript provides two URL encoding functions, and choosing the wrong one is a frequent bug. They differ in which reserved characters they leave untouched:

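encodeURI() encodes a complete URL. It does NOT encode characters that have structural meaning in a URL (: / ? & = # @ and a few others), and it leaves the unreserved characters alone. Use this when you have a fully-formed URL that may contain spaces or non-ASCII characters and you want a valid URL without disturbing its structure. It does not make the URL safe to place in an HTML attribute; that still requires HTML entity encoding.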

encodeURIComponent() encodes a single URL component — a query parameter name or value, a path segment. It DOES encode reserved characters like & and = because those would break the URL structure if they appeared unencoded inside a query value. Always use encodeURIComponent() when constructing query strings from dynamic data.
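
The difference is easiest to see when the same string goes through both functions. The sketch below assumes a hypothetical login page that takes a return address in a next parameter; the URLs and parameter names are made up for illustration.

```javascript
// A URL that is itself going to be used as a query parameter VALUE.
const returnTo = "https://example.com/a b?x=1&y=2";

// encodeURI keeps : / ? & = intact, so structure survives but data is not isolated.
const asWholeUrl = encodeURI(returnTo);
console.log(asWholeUrl);
// https://example.com/a%20b?x=1&y=2

// encodeURIComponent encodes the reserved characters too.
const asComponent = encodeURIComponent(returnTo);
console.log(asComponent);
// https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2

// Only the component form is safe to embed as a parameter value:
const loginUrl = "https://example.com/login?next=" + asComponent;

// With encodeURI, the inner "&y=2" would leak out and become a second
// parameter of the login URL instead of staying part of "next".
const brokenUrl = "https://example.com/login?next=" + asWholeUrl;
console.log(new URL(brokenUrl).searchParams.get("next"));
// https://example.com/a b?x=1
```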

javascript
const base = "https://example.com/search";
const query = "cats & dogs";

// WRONG — & is not encoded, breaks the query string
const bad = base + "?q=" + query;
// → https://example.com/search?q=cats & dogs

// CORRECT — encodeURIComponent encodes the & and space
const good = base + "?q=" + encodeURIComponent(query);
// → https://example.com/search?q=cats%20%26%20dogs

// encodeURI — safe for complete URLs (does not encode &, ?, =)
const safeUrl = encodeURI("https://example.com/path with spaces");
// → https://example.com/path%20with%20spaces
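
As an alternative to string concatenation, the built-in URL and URLSearchParams objects (available in browsers and Node.js) encode parameter values for you. The sketch below reuses the example search URL from above; the page parameter is an assumption added for illustration. One thing to be aware of is that URLSearchParams uses form-style encoding, so spaces are written as + rather than %20.

```javascript
const url = new URL("https://example.com/search");

// set() stores the raw value; encoding happens when the URL is serialized.
url.searchParams.set("q", "cats & dogs");
url.searchParams.set("page", "2");

const href = url.toString();
console.log(href);
// https://example.com/search?q=cats+%26+dogs&page=2

// Reading the value back decodes it again.
const decoded = new URL(href).searchParams.get("q");
console.log(decoded);
// cats & dogs
```

Because the API never exposes the encoded form until serialization, it is harder to encode a value twice or forget to encode it at all.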

HTML Entity Encoding

HTML entities are codes that represent characters within HTML markup. They are needed for two reasons: to display characters that have special meaning in HTML (< > & " '), and to include characters that may be difficult to type or that fall outside ASCII.

HTML entities begin with & and end with ; and take two forms: named entities like &amp; (for &), &lt; (for <), &gt; (for >), and &copy; (for ©); and numeric character references like &#60; (decimal) or &#x3C; (hexadecimal), both representing <.

  • &amp; — ampersand (&)
  • &lt; — less-than sign (<)
  • &gt; — greater-than sign (>)
  • &quot; — double quote (")
  • &apos; — apostrophe ('); not defined in HTML 4, so the numeric form &#39; is the more portable choice
  • &nbsp; — non-breaking space
  • &copy; — copyright symbol (©)
  • &trade; — trademark symbol (™)
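
Numeric character references are just the character's Unicode code point written in decimal or hexadecimal. As a rough sketch, a hypothetical helper (toNumericReference is not a built-in) can produce either form:

```javascript
// Hypothetical helper: build a numeric character reference for one character.
function toNumericReference(ch, { hex = false } = {}) {
  const codePoint = ch.codePointAt(0); // handles characters outside the BMP too
  return hex
    ? `&#x${codePoint.toString(16).toUpperCase()};`
    : `&#${codePoint};`;
}

console.log(toNumericReference("<"));                // &#60;
console.log(toNumericReference("<", { hex: true })); // &#x3C;
console.log(toNumericReference("©"));                // &#169;
console.log(toNumericReference("™"));                // &#8482;
```

Unlike percent encoding, the number here is the code point itself, not the UTF-8 byte sequence, which is why © is a single &#169; in HTML but %C2%A9 in a URL.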

Why HTML Encoding Matters for Security: XSS Prevention

If user-submitted text containing < or > is inserted directly into HTML without encoding, the browser interprets those characters as HTML tags. An attacker can inject a <script> tag, or an element with an event-handler attribute, that runs malicious JavaScript. This is Cross-Site Scripting (XSS), a long-standing entry in the OWASP Top 10 list of critical web security risks (folded into the Injection category in the 2021 edition).

javascript
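// VULNERABLE: inserting raw user input into the DOM as HTML.
// Note: a <script> tag assigned via innerHTML is not executed by the browser,
// but event-handler attributes such as onerror are, so the risk is the same.
const userInput = '<img src="x" onerror="alert(\'XSS attack\')">';
document.getElementById("output").innerHTML = userInput;
// The onerror handler executes! Attacker can steal cookies, tokens, etc.

// SAFE: HTML-encode before insertion.
// The & replacement must run first, or the entities produced by the
// later replacements would themselves be re-encoded.
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

document.getElementById("output").innerHTML = escapeHtml(userInput);
// Renders as visible text: <img src="x" onerror="alert('XSS attack')">

// SAFER STILL: for plain text, skip HTML parsing entirely.
document.getElementById("output").textContent = userInput;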

Modern frameworks like React, Vue, and Angular automatically HTML-encode content inserted via JSX/templates. However, any time you use dangerouslySetInnerHTML (React), v-html (Vue), or innerHTML (vanilla JS), you are bypassing this protection and must encode or sanitize the content yourself.
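
To give a feel for what auto-escaping means, here is a rough sketch using a tagged template literal. This is not how React, Vue, or Angular are implemented; safeHtml is a hypothetical helper that simply escapes every interpolated value while leaving the literal markup alone.

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical tag function: literal chunks are trusted, interpolations are escaped.
function safeHtml(strings, ...values) {
  return strings.reduce(
    (out, chunk, i) =>
      out + chunk + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const comment = "<img src=x onerror=alert(1)>";
const markup = safeHtml`<p class="comment">${comment}</p>`;
console.log(markup);
// <p class="comment">&lt;img src=x onerror=alert(1)&gt;</p>
```

The developer-written markup keeps its tags, while the user-supplied value arrives as inert text.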

When to Use Each

  • Use URL encoding when building query strings or constructing URL path segments from dynamic data
  • Use HTML entity encoding when rendering any user-provided text into HTML, displaying code examples in a web page, or inserting dynamic content into quoted HTML attribute values
  • Use both when embedding a URL with dynamic parameters inside an HTML attribute such as href: URL-encode the query values first, then HTML-encode the entire URL string
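
The "use both" case can be sketched as follows. The search URL, the lang parameter, and the link text are assumptions made up for illustration; the order of operations is the point.

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const term = 'cats & "dogs"';

// Step 1: URL-encode the dynamic value so it cannot break the query string.
const url =
  "https://example.com/search?q=" + encodeURIComponent(term) + "&lang=en";
console.log(url);
// https://example.com/search?q=cats%20%26%20%22dogs%22&lang=en

// Step 2: HTML-encode the whole URL so it cannot break the attribute.
const link = `<a href="${escapeHtml(url)}">Search</a>`;
console.log(link);
// <a href="https://example.com/search?q=cats%20%26%20%22dogs%22&amp;lang=en">Search</a>
```

The & that separates the two parameters is structural in the URL, so it stays unencoded in step 1, but it is a special character in HTML, so it becomes &amp; in step 2. The browser decodes the entity back to & before following the link.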

The Double-Encoding Pitfall

A common bug occurs when the same encoding is applied twice, either accidentally by two layers of a framework or when a developer manually encodes something that the framework also encodes. The result is visible encoding artifacts: an ampersand shows up as &amp;amp; instead of &, or a URL contains %2520 (the % itself was percent-encoded) instead of %20.
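
Both artifacts are easy to reproduce. The sketch below applies each encoding twice on purpose; the escapeHtml helper is a minimal version written out here so the snippet is self-contained.

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// URL encoding applied twice: the % from the first pass becomes %25.
const urlOnce = encodeURIComponent("cats & dogs");
const urlTwice = encodeURIComponent(urlOnce);
console.log(urlOnce);  // cats%20%26%20dogs
console.log(urlTwice); // cats%2520%2526%2520dogs

// A single decode of the double-encoded value does not recover the input.
console.log(decodeURIComponent(urlTwice)); // cats%20%26%20dogs

// HTML encoding applied twice: the & from the first pass becomes &amp;.
const htmlOnce = escapeHtml("Tom & Jerry");
const htmlTwice = escapeHtml(htmlOnce);
console.log(htmlOnce);  // Tom &amp; Jerry
console.log(htmlTwice); // Tom &amp;amp; Jerry
```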

Always know which layer of your stack handles encoding. If your template engine auto-escapes output, do not also manually call escapeHtml() — or explicitly opt out of auto-escaping when you have already encoded the value yourself.
