Internationalization (i18n) & Localization
Building software that works in every language, region, and writing system — from day one.
i18n vs l10n — and Why You Should Care from Day One
Two terms get conflated:
Internationalization (i18n) — designing your software so it CAN be adapted to any language or region without code changes.
Localization (l10n) — actually translating and adapting it for a specific market.
The numbers come from the letter count: i + 18 letters + n; l + 10 letters + n.
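The crucial point: i18n is an architecture decision; localization is content work. Retrofitting localization onto software that wasn't internationalized is expensive. Every hardcoded string, concatenated sentence, and hand-rolled date or number format has to be found and rewritten, and that can touch most of the codebase. Putting i18n scaffolding in place upfront often takes a day or two and avoids that retrofit.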
Add i18n scaffolding at the start, even if you only ship in English. The cost is low; the option value is huge.
Unicode Everywhere — The Foundation
The single most important rule: every string, in every layer, is Unicode (UTF-8).
That means:
• Source code files — UTF-8
• HTTP requests and responses — UTF-8 (Content-Type: ...; charset=utf-8)
• Database — UTF-8 (PostgreSQL: UTF8 encoding; MySQL: utf8mb4 — NOT plain utf8 which is broken)
• File I/O — explicitly open with UTF-8
• String comparison — use Unicode-aware functions (your language's standard library has them)
What goes wrong without this:
• Names with accents (José, François) become Jos? or J��
• East Asian characters become squares
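• Emoji break, because code built on UCS-2 (a fixed 2 bytes per character) or on MySQL's 3-byte utf8 can't represent characters outside the Basic Multilingual Plane, which need 4 bytes in UTF-8
• Searches don't match because "café" can be stored either as a precomposed é (U+00E9) or as e followed by a combining acute accent (U+0301). The two look identical but are different code point sequences.
A fully UTF-8 stack prevents the corruption cases. If even one layer isn't UTF-8 (a legacy database column, a poorly configured email gateway), data gets corrupted as it crosses that layer. The "café" mismatch is different: both spellings are valid Unicode, so you also need to normalize text (usually to NFC) before storing or comparing it.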
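A quick way to see both problems in JavaScript (Node.js or a browser): an emoji is one code point but two UTF-16 code units and four UTF-8 bytes, and the two spellings of "café" only compare equal after normalization. This is a minimal sketch; normalizing to NFC is a common convention, not the only valid choice.

```javascript
// An emoji lies outside the Basic Multilingual Plane (U+1F600).
const emoji = "\u{1F600}";
const utf16Units = emoji.length;                          // JS strings count UTF-16 code units
const codePoints = [...emoji].length;                     // iterating a string yields code points
const utf8Bytes = new TextEncoder().encode(emoji).length; // TextEncoder always encodes to UTF-8

// Two visually identical spellings of "café".
const precomposed = "caf\u00E9"; // é as a single code point (U+00E9)
const decomposed = "cafe\u0301"; // e + COMBINING ACUTE ACCENT (U+0301)

const rawEqual = precomposed === decomposed;
const nfcEqual = precomposed.normalize("NFC") === decomposed.normalize("NFC");

console.log({ utf16Units, codePoints, utf8Bytes }); // { utf16Units: 2, codePoints: 1, utf8Bytes: 4 }
console.log({ rawEqual, nfcEqual });                // { rawEqual: false, nfcEqual: true }
```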
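A common gotcha: MySQL's utf8 character set (an alias for utf8mb3) is a historical mistake. It stores at most 3 bytes per character, so it can't hold emoji or the CJK characters that sit outside the Basic Multilingual Plane. Always use utf8mb4 instead. MySQL 8.0 made utf8mb4 the default, but databases and tables created earlier keep their old setting, so check. Note that the statement below only changes the default for new tables; existing tables also need ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci: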
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Locales — Language + Region
A locale identifies a language AND usually a region (sometimes a script too, as in zh-Hans-CN). Tags follow IETF BCP 47:
en-US English, United States
en-GB English, United Kingdom ← different date format, currency, spelling
pt-BR Portuguese, Brazil
pt-PT Portuguese, Portugal ← different vocabulary from pt-BR
zh-CN Chinese, simplified (mainland)
zh-TW Chinese, traditional (Taiwan) ← different script
ar-SA Arabic, Saudi Arabia ← right-to-left
Why region matters:
• en-US says "March 15, 2024" with prices in $1,234.56
• en-GB says "15 March 2024" with prices in £1,234.56
• de-DE says "15.03.2024" with prices as 1.234,56 €
• de-CH (German, Switzerland) writes 1’234.56, grouping thousands with an apostrophe and using a period for decimals, unlike de-DE
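These differences can be checked with the built-in Intl API, covered in more detail below. A small sketch, assuming a JavaScript runtime with full ICU locale data (current Node.js releases and browsers ship it by default); the localeSample helper is hypothetical:

```javascript
// A fixed instant, formatted in UTC so the output doesn't depend on the machine's time zone.
const date = new Date(Date.UTC(2024, 2, 15)); // months are 0-based: 2 = March
const amount = 1234.56;

// Hypothetical helper: format the same date and number for one locale.
function localeSample(locale) {
  return {
    day: new Intl.DateTimeFormat(locale, { dateStyle: "long", timeZone: "UTC" }).format(date),
    number: new Intl.NumberFormat(locale).format(amount),
  };
}

for (const locale of ["en-US", "en-GB", "de-DE", "de-CH"]) {
  console.log(locale, localeSample(locale));
}
```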
A user's locale comes from:
1. A user preference if you store one (best — explicit choice)
2. The Accept-Language HTTP header sent by the browser
3. IP geolocation as a fallback
Always store the user's preferred locale. Never assume from IP alone — a German tourist in Japan still wants German.
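The precedence above can be sketched as a small resolver: an explicit preference wins, then the Accept-Language header ordered by its q-values, then a default. Everything here (SUPPORTED, DEFAULT_LOCALE, resolveLocale) is hypothetical, and matching on the language subtag alone is a simplification.

```javascript
// Hypothetical set of locales the app ships catalogs for.
const SUPPORTED = ["en-US", "en-GB", "de-DE", "pt-BR", "ja-JP"];
const DEFAULT_LOCALE = "en-US";

// Parse "de-CH,de;q=0.9,en;q=0.8" into tags sorted by descending q-value.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam ? Number(qParam.trim().slice(2)) : 1;
      return { tag: tag.trim(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.tag && entry.tag !== "*" && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index) // ties keep header order
    .map((entry) => entry.tag);
}

// Exact match first, then any supported locale with the same language subtag.
function matchSupported(tag) {
  const exact = SUPPORTED.find((s) => s.toLowerCase() === tag.toLowerCase());
  if (exact) return exact;
  const language = tag.split("-")[0].toLowerCase();
  return SUPPORTED.find((s) => s.split("-")[0].toLowerCase() === language);
}

function resolveLocale(userPreference, acceptLanguageHeader = "") {
  const preferred = userPreference && matchSupported(userPreference);
  if (preferred) return preferred; // 1. stored user preference
  for (const tag of parseAcceptLanguage(acceptLanguageHeader)) {
    const match = matchSupported(tag); // 2. browser header
    if (match) return match;
  }
  return DEFAULT_LOCALE; // 3. an IP-geolocation lookup could slot in here
}

console.log(resolveLocale(null, "de-CH,de;q=0.9,en;q=0.8")); // de-DE
console.log(resolveLocale("ja-JP", "de-DE"));                 // ja-JP
```

Falling back by language alone would serve pt-BR to a pt-PT user, for example, so a real app may want region-aware fallback rules.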
Once you have a locale, every locale-dependent decision flows from it: date formats, number formats, sort order, plural rules, calendar choice (Gregorian vs Buddhist vs Islamic), text direction (LTR vs RTL).
Translation Catalogs — Strings as Data
The architectural rule: never hardcode user-facing strings in code.
Wrong:
return res.json({ error: "User not found" });
Right:
return res.json({ error: t('errors.user_not_found', { locale: req.locale }) });
Strings live in catalog files, organized by locale. The translation library (i18next, FormatJS, gettext, Rails I18n, Django's translation framework — every ecosystem has one) loads the right catalog for the current locale and looks up keys.
// locales/en.json
{
"errors": {
"user_not_found": "User not found",
"invalid_password": "The password is incorrect"
}
}
// locales/de.json
{
"errors": {
"user_not_found": "Benutzer nicht gefunden",
"invalid_password": "Das Passwort ist falsch"
}
}
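Under the hood, a translation library does roughly this: look up a dotted key in the catalog for the requested locale and fall back to a default language when the key is missing. A minimal sketch with hypothetical names (catalogs, t, FALLBACK_LANGUAGE); the German catalog deliberately omits one key to show the fallback. Real libraries such as i18next or FormatJS add catalog loading, caching, and configurable fallback chains.

```javascript
// Hypothetical in-memory catalogs mirroring locales/en.json and locales/de.json.
const catalogs = {
  en: { errors: { user_not_found: "User not found", invalid_password: "The password is incorrect" } },
  de: { errors: { user_not_found: "Benutzer nicht gefunden" } }, // invalid_password not translated yet
};
const FALLBACK_LANGUAGE = "en";

// Walk a dotted key like "errors.user_not_found" through a nested catalog.
function lookup(catalog, key) {
  return key
    .split(".")
    .reduce((node, part) => (node && typeof node === "object" ? node[part] : undefined), catalog);
}

// Try the full locale, then its base language, then the fallback language.
function t(key, { locale = FALLBACK_LANGUAGE } = {}) {
  const candidates = [locale, locale.split("-")[0], FALLBACK_LANGUAGE];
  for (const candidate of candidates) {
    const value = lookup(catalogs[candidate], key);
    if (typeof value === "string") return value;
  }
  return key; // surface the missing key instead of failing silently
}

console.log(t("errors.user_not_found", { locale: "de-DE" }));   // Benutzer nicht gefunden
console.log(t("errors.invalid_password", { locale: "de-DE" })); // The password is incorrect
```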
Crucial discipline:
• Catalog keys describe meaning, not English. errors.user_not_found is good; errors.user_not_found_msg_en is not.
• Always provide context for translators. They need to know how a string is used. Most i18n libraries support a context note attached to each key.
• Don't concatenate strings. "Welcome, " + name + "!" breaks in languages where word order differs. Use placeholders: t('welcome', { name }) → "Welcome, {name}!" in en, "{name} さん、ようこそ" in ja.
• Don't reuse strings across contexts even if they look the same. "Open" as a button label vs as a status word may translate differently.
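Placeholders work because each translation decides where the value goes. A minimal interpolation sketch; the messages object and interpolate function are hypothetical, and real libraries also handle escaping, plurals, and number formatting inside placeholders.

```javascript
// Hypothetical catalog entries for the "welcome" key.
const messages = {
  en: "Welcome, {name}!",
  ja: "{name} さん、ようこそ",
};

// Replace {placeholder} tokens with named values; unknown placeholders are left untouched.
function interpolate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

console.log(interpolate(messages.en, { name: "Aiko" })); // Welcome, Aiko!
console.log(interpolate(messages.ja, { name: "Aiko" })); // Aiko さん、ようこそ
```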
Plurals, Dates, Numbers, Currencies
Beyond translation, locale-aware FORMATTING is where most bugs hide.
Plurals — far harder than English suggests.
English has 2 forms: 1 item / 2 items. Polish has 4. Arabic has 6. Some languages (Chinese, Japanese, Korean) have 1.
"You have N items"
en (1 item, 2 items)
de (1 Element, 2 Elemente)
ru (1 элемент, 2 элемента, 5 элементов, 1.5 элемента) ← four forms!
ar (six forms based on the count)
ja (no plural form — same word for any count)
Don't write count === 1 ? 'item' : 'items' anywhere. Use ICU MessageFormat or your library's plural support, which understands CLDR plural rules for every language:
{count, plural,
=0 {No items}
one {# item}
other {# items}
}
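JavaScript exposes the same CLDR plural rules through Intl.PluralRules. The sketch below prints the categories for a few counts, then uses them in a hypothetical pluralize helper; in practice an ICU MessageFormat library (FormatJS, for example) parses a message like the one above for you.

```javascript
// Intl.PluralRules maps a number to a CLDR plural category for a given locale.
const categories = (locale, counts) =>
  counts.map((n) => `${n}→${new Intl.PluralRules(locale).select(n)}`).join(", ");

console.log(categories("en", [1, 2]));         // 1→one, 2→other
console.log(categories("ru", [1, 2, 5, 1.5])); // 1→one, 2→few, 5→many, 1.5→other
console.log(categories("ja", [1, 2]));         // 1→other, 2→other

// Hypothetical helper: choose a message by category, with an exact "=0" override
// like the ICU message above. forms must include an "other" entry.
function pluralize(locale, count, forms) {
  if (count === 0 && forms["=0"]) return forms["=0"];
  const category = new Intl.PluralRules(locale).select(count);
  const template = forms[category] ?? forms.other;
  return template.replace("#", new Intl.NumberFormat(locale).format(count));
}

const ruForms = { one: "# элемент", few: "# элемента", many: "# элементов", other: "# элемента" };
console.log(pluralize("ru", 21, ruForms)); // 21 элемент
console.log(pluralize("ru", 3, ruForms));  // 3 элемента
console.log(pluralize("ru", 11, ruForms)); // 11 элементов
```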
Dates — ALWAYS use a localized formatter, never string concatenation.
// JavaScript Intl API (built into the language)
const date = new Date(Date.UTC(2024, 2, 15)); // 15 March 2024; months are 0-based
new Intl.DateTimeFormat('de-DE', { dateStyle: 'long', timeZone: 'UTC' })
  .format(date);
// "15. März 2024"
new Intl.DateTimeFormat('ja-JP', { dateStyle: 'long', timeZone: 'UTC' })
  .format(date);
// "2024年3月15日"
Numbers — separators differ. 1,234.56 (US) vs 1.234,56 (DE) vs 1 234,56 (FR).
new Intl.NumberFormat('de-DE').format(1234.56);
// "1.234,56"
Currencies — symbol AND placement vary. $1,234.56 (USD) vs 1.234,56 € (EUR).
new Intl.NumberFormat('de-DE', { style: 'currency', currency: 'EUR' })
.format(1234.56);
// "1.234,56 €"
Lean on Intl.* (JS), Babel's babel.dates and babel.numbers (Python), ICU (Java/C++), or your framework's built-ins. They are all built on Unicode CLDR data, the de facto standard source of locale formatting rules, maintained for hundreds of locales.
RTL Languages & Other Curveballs
A few more things that surprise people who haven't internationalized before:
Right-to-left languages — Arabic, Hebrew, Persian, Urdu. Supporting these isn't just translating text; the entire UI flips horizontally:
• Text aligns right
• Layouts mirror (logo on right, navigation on right)
• Icons mirror (a "back" arrow points right in RTL)
• Dates and numbers stay left-to-right WITHIN an RTL paragraph
Setting dir="rtl" on the root element (the W3C recommends markup over CSS direction: rtl for this) and using CSS logical properties (margin-inline-start instead of margin-left) make this manageable, but you have to design with RTL in mind from the start.
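Switching direction usually comes down to setting dir on the root element from the active locale. A minimal sketch; the RTL_LANGUAGES set and textDirection helper are assumptions, and the set is illustrative rather than complete.

```javascript
// Hypothetical list of right-to-left language subtags; extend it for the locales you ship.
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur"]);

// Intl.Locale parses a BCP 47 tag, so "ar-SA" and "ar" both yield the language "ar".
function textDirection(locale) {
  const { language } = new Intl.Locale(locale);
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

// In a browser you would apply it to the root element, e.g.:
//   document.documentElement.dir = textDirection(userLocale);
//   document.documentElement.lang = userLocale;
console.log(textDirection("ar-SA")); // rtl
console.log(textDirection("he"));    // rtl
console.log(textDirection("de-DE")); // ltr
```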
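Text expansion — Translations are often longer than English. German is famously verbose: "Settings" (8 letters) becomes "Einstellungen" (13 letters), over 60% longer. UI elements with fixed widths break. A common rule of thumb is to allow 30-40% expansion for sentences and considerably more for short labels like this one.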
Names and addresses — Don't assume:
• Names always split into a first and a last name, with the family name last (many cultures use a single name or several family names, or put the family name first)
• Addresses fit US format (postcode after city, state codes, etc.)
• Phone numbers are 10 digits (use E.164: +14155551234)
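E.164 means a leading +, the country code, and the subscriber number, with no formatting and at most 15 digits. A rough sketch (the toE164 helper is hypothetical) that only checks that shape; it can't tell whether a number exists or parse national formats, which is what a library such as Google's libphonenumber is for.

```javascript
// Shape check only: "+", a country code that doesn't start with 0, at most 15 digits total.
const E164 = /^\+[1-9]\d{1,14}$/;

// Hypothetical normalizer: strips spaces, parentheses, dots and hyphens, then validates.
function toE164(input) {
  const candidate = input.replace(/[\s().-]/g, "");
  return E164.test(candidate) ? candidate : null;
}

console.log(toE164("+1 (415) 555-1234")); // +14155551234
console.log(toE164("+44 20 7946 0958"));  // +442079460958
console.log(toE164("(415) 555-1234"));    // null: no country code, so the number is ambiguous
```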
Sorting — Alphabetical order is locale-dependent. In Spanish, "ñ" comes after "n". In Swedish, "Å" comes after "Z". Use Unicode collation (Intl.Collator in JS) instead of byte-level comparison.
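The difference is easy to see with Array.prototype.sort, whose default comparison uses UTF-16 code units, next to Intl.Collator for specific locales:

```javascript
const words = ["ñu", "nube", "oso", "Zorro"];

// Default sort compares UTF-16 code units: uppercase first, "ñ" (U+00F1) after every ASCII letter.
const defaultSorted = [...words].sort();

// Spanish collation treats ñ as its own letter between n and o, and compares letters before case.
const esSorted = [...words].sort(new Intl.Collator("es").compare);

// The same names sort differently in Swedish (Å after Z) and German (Å sorts with A).
const names = ["Åsa", "Ola", "Zorn"];
const svSorted = [...names].sort(new Intl.Collator("sv").compare);
const deSorted = [...names].sort(new Intl.Collator("de").compare);

console.log(defaultSorted); // [ 'Zorro', 'nube', 'oso', 'ñu' ]
console.log(esSorted);      // [ 'nube', 'ñu', 'oso', 'Zorro' ]
console.log(svSorted);      // [ 'Ola', 'Zorn', 'Åsa' ]
console.log(deSorted);      // [ 'Åsa', 'Ola', 'Zorn' ]
```

For sorting large lists, creating one Intl.Collator and reusing its compare function, as above, avoids constructing a collator per comparison.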
Calendars — Most of the world uses Gregorian, but: Thailand uses the Buddhist calendar (year is +543), Japan uses imperial era names alongside Gregorian dates (2024 is Reiwa 6), and some apps need Hijri (Islamic) calendars. Most date libraries support these via the locale or an explicit calendar option.
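In JavaScript the calendar can come from the locale's default or be requested explicitly with the -u-ca- Unicode extension of a BCP 47 tag. A small sketch using Intl.DateTimeFormat; the yearOf helper is hypothetical.

```javascript
// 15 June 2024 at noon UTC, far from any date boundary.
const date = new Date(Date.UTC(2024, 5, 15, 12));

// The -u-ca- extension in the locale tag selects the calendar.
const thai = new Intl.DateTimeFormat("th-TH-u-ca-buddhist", { year: "numeric", timeZone: "UTC" });
const japanese = new Intl.DateTimeFormat("ja-JP-u-ca-japanese", { era: "long", year: "numeric", timeZone: "UTC" });

// Pull out just the year field instead of relying on the whole formatted string.
const yearOf = (formatter) => formatter.formatToParts(date).find((part) => part.type === "year").value;

console.log(thai.resolvedOptions().calendar); // buddhist
console.log(yearOf(thai));                    // 2567 (2024 + 543)
console.log(japanese.format(date));           // era name and year (2024 is Reiwa 6)
```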
Currencies aren't 1:1 with countries — the Euro is shared across many countries. The Swiss franc (CHF) uses different formatting in de-CH vs fr-CH. If you ship internationally, store both the amount AND the currency code (Module 32), never just the amount.
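A minimal sketch of that rule, assuming amounts are stored as integer minor units (such as cents) alongside an ISO 4217 code. The record shape and helper names are hypothetical, not necessarily the design used in Module 32. Intl's CLDR currency data usually matches ISO 4217 minor units, but it is worth checking for the currencies you support.

```javascript
// Hypothetical money record: integer minor units plus an ISO 4217 currency code.
// { amountMinor: 123456, currency: "EUR" } means 1,234.56 euros.

// Intl knows each currency's usual number of decimal places (EUR 2, JPY 0, KWD 3).
function minorUnitDigits(currency) {
  return new Intl.NumberFormat("en", { style: "currency", currency }).resolvedOptions()
    .maximumFractionDigits;
}

// Convert minor units to a decimal amount, then let the locale decide symbol and placement.
function formatMoney({ amountMinor, currency }, locale) {
  const amount = amountMinor / 10 ** minorUnitDigits(currency);
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

const price = { amountMinor: 123456, currency: "EUR" };
console.log(formatMoney(price, "de-DE")); // 1.234,56 € (with a no-break space before €)
console.log(formatMoney(price, "en-IE")); // €1,234.56
```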
Local payment methods — beyond credit cards: PIX in Brazil, UPI in India, iDEAL in the Netherlands, Alipay/WeChat Pay in China, Klarna in Europe. Stripe and similar providers abstract these, but designing your checkout to handle non-card flows from the start is much easier than retrofitting it later.
The takeaway: i18n is less about "supporting different languages" and more about "removing assumptions you've baked into the code based on your own locale." Every assumption you remove early is a bug you don't have to fix later.