URL Encoding vs. HTML Entities
The web is built on text, but distinct contexts have different "reserved characters". Two of the most common escaping mechanisms are URL Encoding and HTML Entities. Confusing them can lead to broken links, garbled text, or security vulnerabilities.
1. URL Encoding (Percent-Encoding)
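URLs may contain only a limited set of ASCII characters, and some of those (such as &, ?, and #) are reserved for the URL's own structure.
- The Problem: If you want to send a search query for "Ben & Jerry's", you cannot put spaces or ampersands directly into the query string's key/value pairs. The & marks the start of a new parameter, so your value gets split in two.
- The Solution: Replace each unsafe character with % followed by its two-digit hexadecimal byte value. Non-ASCII characters are first encoded as UTF-8, and each resulting byte is percent-encoded.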
- space -> %20
- & -> %26
- ? -> %3F
- In JS: Use encodeURIComponent() on each individual key or value, not on the whole URL.
const url = "search?q=" + encodeURIComponent("Ben & Jerry's"); // "search?q=Ben%20%26%20Jerry's"
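A minimal sketch of the steps above, runnable in Node.js or a browser. The helper name buildSearchUrl and the base URL https://example.com/search are hypothetical, chosen only for illustration. It shows the encoded value surviving a round trip through the standard URL parser, why encodeURI() is the wrong tool for a single value, and how a non-ASCII character becomes UTF-8 bytes.

```javascript
// Hypothetical helper: builds a search URL, encoding only the query value.
function buildSearchUrl(base, query) {
  return `${base}?q=${encodeURIComponent(query)}`;
}

const url = buildSearchUrl("https://example.com/search", "Ben & Jerry's");
console.log(url);
// https://example.com/search?q=Ben%20%26%20Jerry's

// Parsing the URL back: the encoded "&" no longer starts a new parameter.
console.log(new URL(url).searchParams.get("q")); // Ben & Jerry's

// encodeURI() is meant for whole URLs, so it leaves "&" and "?" alone
// and the value would still be split.
console.log(encodeURI("search?q=Ben & Jerry's")); // search?q=Ben%20&%20Jerry's

// Non-ASCII characters are encoded as their UTF-8 bytes.
console.log(encodeURIComponent("café")); // caf%C3%A9
```

Note that encodeURIComponent() leaves the apostrophe unencoded; it is legal inside a query value, so the URL still parses correctly.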
2. HTML Entities
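HTML uses angle brackets < > to delimit tags and & to start entities.
- The Problem: If you want to display the text "This tag is <bold>" on a website, the browser parses <bold> as a tag rather than as text, so it never appears on screen.
- The Solution: Replace reserved characters with entities.
- < -> &lt;
- > -> &gt;
- " -> &quot;
- ' -> &#39;
- & -> &amp;
- In JS: Setting element.textContent (instead of innerHTML) makes the browser treat the string as plain text, so no manual escaping is needed. When you must build an HTML string yourself, use a library function such as he.escape or lodash's _.escape.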
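To show what such a library does under the hood, here is a minimal hand-rolled sketch. The name escapeHtml is hypothetical, and in production a vetted library is the safer choice. It escapes &, <, >, ", and ' in a single regex pass.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML
// text content and in quoted attribute values.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text) {
  // One regex pass replaces each character exactly once, so an "&"
  // produced by a replacement is never escaped a second time.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml("This tag is <bold>"));
// This tag is &lt;bold&gt;
```

The single pass matters. Chained replace() calls must handle & first, or the & inside an already-produced &lt; would be turned into &amp;lt;.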
Security Implication: XSS
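Cross-Site Scripting (XSS) happens when an attacker gets malicious JavaScript injected into a page that other users load, where it runs with that site's privileges.
If you accept user input like <script>alert(1)</script> and write it unescaped into the HTML your server sends, the browser executes it.
If you entity-encode it first, the HTML source contains &lt;script&gt;alert(1)&lt;/script&gt;, which the browser displays as the harmless literal text <script>alert(1)</script>. One is code execution; the other is just text. The difference is encoding.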
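The two mechanisms often apply to the same piece of untrusted input, and each one protects only its own context. A minimal sketch, assuming user input that ends up both in a link's query string and in its visible label. The names renderSearchLink and escapeHtml and the /search path are hypothetical.

```javascript
// Hypothetical helper: minimal HTML escaping for text and quoted attributes.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical helper: renders a search link for untrusted user input.
function renderSearchLink(userInput) {
  // 1. URL-encode the value so it stays inside the "q" parameter.
  const href = "/search?q=" + encodeURIComponent(userInput);
  // 2. HTML-escape the finished URL (attribute) and the label (text content).
  return `<a href="${escapeHtml(href)}">${escapeHtml(userInput)}</a>`;
}

console.log(escapeHtml("<script>alert(1)</script>"));
// &lt;script&gt;alert(1)&lt;/script&gt;

console.log(renderSearchLink("<script>alert(1)</script>"));
// <a href="/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E">&lt;script&gt;alert(1)&lt;/script&gt;</a>
```

The order follows the nesting. The value lives inside a URL, and the URL lives inside an HTML attribute, so URL encoding is applied first and HTML escaping last.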