Backend from First Principles / Module 39 — Internationalization (i18n) & Localization

Internationalization (i18n) & Localization

Building software that works in every language, region, and writing system — from day one.


i18n vs l10n — and Why You Should Care from Day One

Two terms get conflated:

Internationalization (i18n) — designing your software so it CAN be adapted to any language or region without code changes.

Localization (l10n) — actually translating and adapting it for a specific market.

The numbers come from the letter count: i + 18 letters + n; l + 10 letters + n.

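The crucial point: i18n is an architecture decision; localization is content work. Localizing software that was never internationalized is expensive, because user-facing strings, date and number formatting, and layout assumptions are scattered through the code and each one has to be found and reworked. Adding i18n scaffolding upfront is typically a day or two of work on a new project, and it saves weeks later.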

Add i18n scaffolding at the start, even if you only ship in English. The cost is low; the option value is huge.


Unicode Everywhere — The Foundation

The single most important rule: every string, in every layer, is Unicode (UTF-8).

That means:
• Source code files — UTF-8
• HTTP requests and responses — UTF-8 (Content-Type: ...; charset=utf-8)
• Database — UTF-8 (PostgreSQL: UTF8 encoding; MySQL: utf8mb4 — NOT plain utf8 which is broken)
• File I/O — explicitly open with UTF-8
• String comparison — use Unicode-aware functions (your language's standard library has them)

What goes wrong without this:
• Names with accents (José, François) become Jos? or J��
• East Asian characters become squares
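• Emoji break (because some code paths assume every character fits in 2 bytes, or in at most 3 bytes of UTF-8)
• Searches don't match because "café" (é as the single code point U+00E9) and "café" (e followed by the combining accent U+0301) are different code point sequences until both are normalized, typically to NFC

If your stack is fully UTF-8, the corruption cases above don't happen; the search mismatch additionally needs Unicode normalization on input. If even one layer isn't UTF-8 (a legacy database column, a poorly configured email gateway), data gets corrupted as it crosses that layer.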
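The "café" mismatch and the emoji length surprise can both be reproduced in a few lines. This is a minimal sketch; normalizeInput is a hypothetical helper name, and the choice of NFC as the canonical form is an assumption (it is the common default, but any single form works as long as it is applied consistently).

```javascript
// Two visually identical strings with different code point sequences.
const precomposed = 'caf\u00E9'; // é as one code point (NFC form)
const decomposed = 'cafe\u0301'; // e + combining acute accent (NFD form)

console.log(precomposed === decomposed); // false
console.log(precomposed.normalize('NFC') === decomposed.normalize('NFC')); // true

// Hypothetical helper: normalize once at the boundary (on input) so that
// stored and searched strings always share one form.
function normalizeInput(text) {
  return text.normalize('NFC');
}

// Emoji sit outside the Basic Multilingual Plane, so each one takes two
// UTF-16 code units. Spreading the string iterates by code point instead.
const thumbsUp = '\u{1F44D}';
console.log(thumbsUp.length);      // 2 (UTF-16 code units)
console.log([...thumbsUp].length); // 1 (code points)
```

Normalizing at the point where text enters the system tends to be simpler than normalizing at every comparison.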

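A common gotcha: MySQL's utf8 character set is a historical mistake. It is an alias for utf8mb3, which stores at most 3 bytes per character and so cannot hold anything outside the Basic Multilingual Plane: emoji and some rarer CJK characters. Always use utf8mb4 instead. MySQL 8.0 made utf8mb4 the default for new databases, but older schemas keep whatever they were created with, so check your character set and collation. Note that the statement below only changes the default for tables created afterwards; existing tables have to be converted separately with ALTER TABLE ... CONVERT TO CHARACTER SET.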

SQL
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Locales — Language + Region

A locale identifies a language AND a region. Format follows IETF BCP 47:

Text
en-US    English, United States
en-GB    English, United Kingdom        ← different date format, currency, spelling
pt-BR    Portuguese, Brazil
pt-PT    Portuguese, Portugal           ← different vocabulary from pt-BR
zh-CN    Chinese, simplified (mainland)
zh-TW    Chinese, traditional (Taiwan)  ← different script
ar-SA    Arabic, Saudi Arabia           ← right-to-left

Why region matters:
• en-US says "March 15, 2024" with prices as $1,234.56
• en-GB says "15 March 2024" with prices as £1,234.56
• de-DE says "15.03.2024" with prices as 1.234,56 €
• de-CH (German, Switzerland) groups thousands with an apostrophe and uses a decimal point (1’234.56), unlike de-DE

A user's locale comes from:
1. A user preference if you store one (best — explicit choice)
2. The Accept-Language HTTP header sent by the browser
3. IP geolocation as a fallback

Always store the user's preferred locale. Never assume from IP alone — a German tourist in Japan still wants German.
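The priority order above could be sketched roughly as follows. Everything here is illustrative: SUPPORTED, DEFAULT_LOCALE, and the function names are hypothetical, the Accept-Language parsing is simplified, and the IP geolocation step is left out because it depends on an external service.

```javascript
// Locales this hypothetical app ships translations for.
const SUPPORTED = ['en-US', 'en-GB', 'de-DE', 'pt-BR'];
const DEFAULT_LOCALE = 'en-US';

// Parse "de-CH,de;q=0.9,en;q=0.8" into tags ordered by descending q-value.
function parseAcceptLanguage(header) {
  return (header || '')
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      return { tag: tag.trim(), q: q ? parseFloat(q.slice(2)) : 1 };
    })
    .filter((entry) => entry.tag && entry.tag !== '*' && entry.q > 0)
    .sort((a, b) => b.q - a.q)
    .map((entry) => entry.tag);
}

// Match a requested tag exactly first, then by language only.
function matchSupported(tag) {
  const lower = tag.toLowerCase();
  const exact = SUPPORTED.find((s) => s.toLowerCase() === lower);
  if (exact) return exact;
  const language = lower.split('-')[0];
  return SUPPORTED.find((s) => s.toLowerCase().split('-')[0] === language);
}

// Stored preference wins; the header is only a hint; then the default.
function resolveLocale({ storedLocale, acceptLanguage }) {
  const candidates = [storedLocale, ...parseAcceptLanguage(acceptLanguage)];
  for (const tag of candidates) {
    const match = tag && matchSupported(tag);
    if (match) return match;
  }
  return DEFAULT_LOCALE;
}

console.log(resolveLocale({ acceptLanguage: 'de-CH,de;q=0.9,en;q=0.8' })); // "de-DE"
```

Here a request for de-CH falls back to de-DE because only the language matches. A production system might prefer a library that implements BCP 47 lookup in full.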

Once you have a locale, every locale-dependent decision flows from it: date formats, number formats, sort order, plural rules, calendar choice (Gregorian vs Buddhist vs Islamic), text direction (LTR vs RTL).


Translation Catalogs — Strings as Data

The architectural rule: never hardcode user-facing strings in code.

Wrong:

JavaScript
return res.json({ error: "User not found" });

Right:

JavaScript
return res.json({ error: t('errors.user_not_found', { locale: req.locale }) });

Strings live in catalog files, organized by locale. The translation library (i18next, FormatJS, gettext, Rails I18n, Django's translation framework — every ecosystem has one) loads the right catalog for the current locale and looks up keys.

JavaScript
// locales/en.json
{
  "errors": {
    "user_not_found": "User not found",
    "invalid_password": "The password is incorrect"
  }
}

// locales/de.json
{
  "errors": {
    "user_not_found": "Benutzer nicht gefunden",
    "invalid_password": "Das Passwort ist falsch"
  }
}

Crucial discipline:
• Catalog keys describe meaning, not English. errors.user_not_found is good; errors.user_not_found_msg_en is not.
• Always provide context for translators. They need to know how a string is used. Most i18n libraries support a context note attached to each key.
• Don't concatenate strings. "Welcome, " + name + "!" breaks in languages where word order differs. Use placeholders: t('welcome', { name }) renders "Welcome, {name}!" in en and "{name} さん、ようこそ" in ja.
• Don't reuse strings across contexts even if they look the same. "Open" as a button label vs as a status word may translate differently.
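To make the placeholder rule concrete, here is a minimal sketch of what a t() function does internally. It is not the API of i18next or any other library; the catalogs object, the {name} placeholder syntax, and the fallback to English are all assumptions made for illustration.

```javascript
// Hypothetical in-memory catalogs; real projects load these from files.
const catalogs = {
  en: { welcome: 'Welcome, {name}!' },
  ja: { welcome: '{name} さん、ようこそ' },
};

// Look up a key for the locale (falling back to English, then to the key
// itself), and fill {placeholders} from params. Placeholders with no
// matching param are left untouched.
function t(key, { locale = 'en', ...params } = {}) {
  const template = catalogs[locale]?.[key] ?? catalogs.en[key] ?? key;
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in params ? String(params[name]) : match
  );
}

console.log(t('welcome', { locale: 'en', name: 'Aiko' })); // "Welcome, Aiko!"
console.log(t('welcome', { locale: 'ja', name: 'Aiko' })); // "Aiko さん、ようこそ"
```

Because each translation is a whole sentence with a named slot, the translator decides where the name goes and the calling code stays the same.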


Plurals, Dates, Numbers, Currencies

Beyond translation, locale-aware FORMATTING is where most bugs hide.

Plurals — far harder than English suggests.

English has 2 forms: 1 item / 2 items. Polish has 4. Arabic has 6. Some languages (Chinese, Japanese, Korean) have 1.

Text
"You have N items"

en (1 item, 2 items)
de (1 Element, 2 Elemente)
ru (1 элемент, 2 элемента, 5 элементов, 1.5 элемента) ← four forms!
ar (six forms based on the count)
ja (no plural form — same word for any count)

Don't write count === 1 ? 'item' : 'items' anywhere. Use ICU MessageFormat or your library's plural support, which understands CLDR plural rules for every language:

Text
{count, plural,
  =0 {No items}
  one {# item}
  other {# items}
}
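In JavaScript, the CLDR plural rules are exposed directly through Intl.PluralRules, which is roughly what a MessageFormat implementation consults when it picks a branch. The sketch below is illustrative only: itemForms and formatItems are hypothetical names, and a real project would keep these forms in its translation catalogs.

```javascript
// Returns the CLDR plural category ("zero", "one", "two", "few", "many",
// "other") that a locale assigns to a number.
function pluralCategory(locale, count) {
  return new Intl.PluralRules(locale).select(count);
}

// Hypothetical per-locale forms keyed by plural category.
const itemForms = {
  en: { one: '{n} item', other: '{n} items' },
  ru: {
    one: '{n} элемент',
    few: '{n} элемента',
    many: '{n} элементов',
    other: '{n} элемента',
  },
};

// Pick the form for the count's category, then insert the localized number.
function formatItems(locale, count) {
  const forms = itemForms[locale];
  const template = forms[pluralCategory(locale, count)] ?? forms.other;
  return template.replace('{n}', new Intl.NumberFormat(locale).format(count));
}

console.log(pluralCategory('ru', 2));  // "few"
console.log(pluralCategory('ru', 5));  // "many"
console.log(pluralCategory('ja', 1));  // "other"
console.log(formatItems('ru', 21));    // "21 элемент"
```

Note that 21 takes the "one" form in Russian, which is exactly the kind of case a count === 1 check gets wrong.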

Dates — ALWAYS use a localized formatter, never string concatenation.

JavaScript
// JavaScript Intl API — built into the language
// Fixed date so the outputs below are reproducible (month is zero-based).
const date = new Date(2024, 2, 15);

new Intl.DateTimeFormat('de-DE', { dateStyle: 'long' }).format(date);
// "15. März 2024"

new Intl.DateTimeFormat('ja-JP', { dateStyle: 'long' }).format(date);
// "2024年3月15日"

Numbers — separators differ. 1,234.56 (US) vs 1.234,56 (DE) vs 1 234,56 (FR).

JavaScript
new Intl.NumberFormat('de-DE').format(1234.56);
// "1.234,56"

Currencies — symbol AND placement vary. $1,234.56 (USD) vs 1.234,56 € (EUR).

JavaScript
new Intl.NumberFormat('de-DE', { style: 'currency', currency: 'EUR' })
  .format(1234.56);
// "1.234,56 €"

Lean on Intl.* (JS), babel.dates (Python), ICU (Java/C++), or your framework's built-ins. They all use Unicode CLDR data — the authoritative source for locale formatting that's updated for hundreds of locales.


RTL Languages & Other Curveballs

A few more things that surprise people who haven't internationalized before:

Right-to-left languages — Arabic, Hebrew, Persian, Urdu. Supporting these isn't just translating text; the entire UI flips horizontally:
• Text aligns right
• Layouts mirror (logo on right, navigation on right)
• Directional icons mirror (a "back" arrow points right in RTL); non-directional icons stay as they are
• Dates and numbers stay left-to-right WITHIN an RTL paragraph

The HTML dir="rtl" attribute on the root element (or CSS direction: rtl) and logical properties (margin-inline-start instead of margin-left) make this manageable, but you have to design with RTL in mind from the start.
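On the server side, the direction usually has to be derived from the locale when the page is rendered. A minimal sketch, assuming a hand-maintained set of RTL languages (RTL_LANGUAGES, textDirection, and htmlAttributes are hypothetical names, and the set covers only the four languages named above):

```javascript
// Right-to-left languages named in the text. This set is an illustrative
// assumption, not an exhaustive list.
const RTL_LANGUAGES = new Set(['ar', 'he', 'fa', 'ur']);

// Intl.Locale parses the tag, so "ar-SA" yields the language "ar".
function textDirection(localeTag) {
  const { language } = new Intl.Locale(localeTag);
  return RTL_LANGUAGES.has(language) ? 'rtl' : 'ltr';
}

// Attributes for the root <html> element of a server-rendered page.
function htmlAttributes(localeTag) {
  return `lang="${localeTag}" dir="${textDirection(localeTag)}"`;
}

console.log(htmlAttributes('ar-SA')); // lang="ar-SA" dir="rtl"
console.log(htmlAttributes('en-US')); // lang="en-US" dir="ltr"
```

With the direction set on the root element, CSS logical properties resolve to the correct physical side without per-locale stylesheets.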

Text expansion — Translations are often longer than English. German is famously verbose; "Settings" in English becomes "Einstellungen" in German. UI elements with fixed widths break. Plan for ~30-40% expansion.

Names and addresses — Don't assume:
• Names always have first and last (some people have a single name, and many cultures put the family name first)
• Addresses fit US format (postcode after city, state codes, etc.)
• Phone numbers are 10 digits (use E.164: +14155551234)
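For the phone number case, a shape check for E.164 is short enough to show. This is only a sketch: it confirms the string looks like E.164 (a plus sign, then up to 15 digits with no leading zero), not that the number exists or is valid for its country. isE164 is a hypothetical helper name.

```javascript
// E.164 shape: "+" followed by at most 15 digits, the first of which is
// not 0. This does not check country-specific numbering rules.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(phone) {
  return E164_PATTERN.test(phone);
}

console.log(isE164('+14155551234')); // true
console.log(isE164('4155551234'));   // false (no country code)
```

Storing numbers in this form and formatting them for display per locale keeps the stored value unambiguous.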

Sorting — Alphabetical order is locale-dependent. In Spanish, "ñ" comes after "n". In Swedish, "Å" comes after "Z". Use Unicode collation (Intl.Collator in JS) instead of byte-level comparison.
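The Spanish and Swedish cases can be checked directly with Intl.Collator. The word lists are made up for illustration; the sketch assumes a runtime with full ICU locale data, which current Node.js releases and browsers ship by default.

```javascript
const spanishWords = ['oso', 'ñu', 'nube'];

// Default sort compares UTF-16 code units, so "ñ" (U+00F1) lands after "o".
const byCodeUnit = [...spanishWords].sort();
console.log(byCodeUnit); // ["nube", "oso", "ñu"]

// Locale-aware sort places "ñ" between "n" and "o".
const bySpanish = [...spanishWords].sort(new Intl.Collator('es').compare);
console.log(bySpanish); // ["nube", "ñu", "oso"]

// The same names sort differently depending on the locale.
const names = ['Zebra', 'Åsa', 'Anna'];
const bySwedish = [...names].sort(new Intl.Collator('sv').compare);
console.log(bySwedish); // ["Anna", "Zebra", "Åsa"]
const byGerman = [...names].sort(new Intl.Collator('de').compare);
console.log(byGerman); // ["Anna", "Åsa", "Zebra"]
```

The same applies in the database layer: an ORDER BY follows the column's collation, so it should be chosen as deliberately as the encoding.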

Calendars — Most of the world uses Gregorian, but: Thailand uses the Buddhist calendar (year is +543), Japan has the imperial era, some apps need Hijri (Islamic) calendars. Most date libraries support these via locale.
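With the Intl API, the calendar either follows from the locale or is requested explicitly through the -u-ca- extension of the locale tag. A short sketch, assuming a runtime with full ICU data; the exact output strings depend on the CLDR version, so only the year portion is noted in the comments.

```javascript
// Fixed instant, formatted in UTC so the result does not depend on the
// machine's time zone.
const date = new Date(Date.UTC(2024, 2, 15)); // 15 March 2024
const options = { dateStyle: 'long', timeZone: 'UTC' };

// th-TH defaults to the Buddhist calendar in CLDR data.
const thai = new Intl.DateTimeFormat('th-TH', options);
console.log(thai.resolvedOptions().calendar); // "buddhist"
console.log(thai.format(date)); // year appears as 2567 (2024 + 543)

// A calendar can also be requested explicitly with the -u-ca- extension.
const japanese = new Intl.DateTimeFormat('ja-JP-u-ca-japanese', options);
console.log(japanese.format(date)); // year appears in the Reiwa era (令和)

// Hijri (Umm al-Qura variant) with English month names.
const hijri = new Intl.DateTimeFormat('en-US-u-ca-islamic-umalqura', options);
console.log(hijri.format(date));
```

Stored timestamps stay in one canonical form (for example UTC instants); the calendar only matters at the formatting step.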

Currencies aren't 1:1 with countries — the Euro is shared across many countries. The Swiss franc (CHF) uses different formatting in de-CH vs fr-CH. If you ship internationally, store both the amount AND the currency code (Module 32), never just the amount.
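One way to keep the amount and the currency code together is sketched below. The money shape (an integer amountMinor plus an ISO 4217 currency code) and the function names are assumptions for illustration, not necessarily the representation Module 32 uses.

```javascript
// Number of minor-unit digits for a currency (2 for USD, 0 for JPY), read
// from the runtime's Intl data rather than hardcoded.
function minorUnitDigits(currency) {
  return new Intl.NumberFormat('en', { style: 'currency', currency })
    .resolvedOptions().maximumFractionDigits;
}

// Hypothetical money shape: integer amount in minor units plus ISO 4217 code.
// The currency decides the symbol and decimals; the locale decides the
// separators and symbol placement.
function formatMoney({ amountMinor, currency }, locale) {
  const major = amountMinor / 10 ** minorUnitDigits(currency);
  return new Intl.NumberFormat(locale, { style: 'currency', currency })
    .format(major);
}

console.log(formatMoney({ amountMinor: 123456, currency: 'USD' }, 'en-US')); // "$1,234.56"
console.log(formatMoney({ amountMinor: 1234, currency: 'JPY' }, 'en-US'));   // "¥1,234"
```

The same stored value can then be shown to a de-CH and an fr-CH user with different formatting, without the amount or the currency ever changing.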

Local payment methods — beyond credit cards: PIX in Brazil, UPI in India, iDEAL in Netherlands, Alipay/WeChat Pay in China, Klarna in Europe. Stripe and similar providers abstract these, but designing your checkout to handle non-card flows from the start is much easier than retrofitting it later.

The takeaway: i18n is less about "supporting different languages" and more about "removing assumptions you've baked into the code based on your own locale." Every assumption you remove early is a bug you don't have to fix later.

