Strings Deep Dive

Why == lies. The String Pool. StringBuilder vs StringBuffer vs string concatenation in a loop. The 1000-object trap.


The most-used class in Java

Almost every Java program manipulates strings. Names, IDs, URLs, log messages, JSON payloads, SQL queries — all strings. Yet String is also the source of more subtle bugs than almost any other class. The == vs .equals() confusion. The "Strings are immutable, so concatenation in a loop is slow" trap. The interview question that asks "what does == return for two strings with the same content?" and watches you sweat.

This chapter is the deep dive on Strings. Why they're immutable. How the String Pool actually works. When to use StringBuilder vs StringBuffer vs plain +. The text blocks added in modern Java. The performance gotchas that bite at scale.

It's a longer chapter than you might expect for "just strings." That's because String behaviour underpins so much of Java's correctness story that getting it right matters more than getting almost any other class right.


Strings are immutable

A String in Java cannot be changed after it's created. Every method that "modifies" a string actually returns a new String:

Java
String s = "hello";
s.toUpperCase();             // does nothing visible — return value discarded
System.out.println(s);       // "hello"

s = s.toUpperCase();          // assigns the new String to s
System.out.println(s);       // "HELLO"

The original "hello" String object never changed. toUpperCase() created a new String containing "HELLO" and returned it. If you don't capture the return value, the new String is immediately garbage.

Why immutable? Several real reasons:

**Thread safety.** Multiple threads can share a String reference without any synchronization. There's nothing to mutate, so there's no race condition.

**Caching.** Strings can be safely interned (deduplicated) because we know none of them will ever change.

**Hash code caching.** String.hashCode() is cached after the first call. This makes Strings ideal as HashMap keys — they're cheap to hash repeatedly.

**Security.** A String passed to a method (a file path, a URL, a SQL query) can't be tampered with by the receiving code. The caller knows the value won't change behind their back.

The cost is that "modifying" a String creates a new one. For a single call this is invisible. For a million calls in a tight loop, it matters — which is why StringBuilder exists.


The String Pool

When you write a String literal in Java source code, the JVM stores it in a special area of the heap called the **String Pool** (sometimes called the string intern pool).

[Figure: String Pool: how Java deduplicates string literals. The variables a and b both point to the single pooled "Alice"; c points to a separate heap object created by new String("Alice"). So a == b is true, a == c is false, and a.equals(c) is true.]
String literals like "Alice" are interned — written once into the pool, reused by every variable that asks for the same text. new String("Alice") bypasses the pool and creates a fresh object on the heap. That's why a == b is true but a == c is false.

The rule: every unique literal exists exactly *once* in the pool. When you write "Alice" in your source code, the JVM checks the pool. If "Alice" is already there, it returns a reference to the existing pooled String. If not, it adds it.

Java
String a = "Alice";
String b = "Alice";
System.out.println(a == b);   // true — both reference the same pooled String

a == b is true because both variables hold the same address. The literal "Alice" was interned the first time it appeared, and reused the second time.

But:

Java
String a = "Alice";
String c = new String("Alice");
System.out.println(a == c);   // false
System.out.println(a.equals(c));  // true

new String("Alice") explicitly forces creation of a fresh String object on the heap, bypassing the pool. The two references now point to different objects with the same content. == (reference comparison) returns false. .equals() (content comparison) returns true.

There's also .intern(), which manually adds a string to the pool (or returns the pooled reference if it's already there):

Java
String a = "Alice";              // pooled literal
String c = new String("Alice");  // separate heap object
String d = c.intern();           // now d is the pool's "Alice"
System.out.println(a == d);      // true

You rarely call .intern() explicitly. It's used internally and in very specific deduplication scenarios.
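One subtlety is easier to see in code: pooling applies to compile-time constant expressions, not only to single literals, while a string concatenated at run time is a new object. The sketch below illustrates this; the class and method names (PoolDemo, concatAtRuntime, sameObject) are made up for the example.

```java
class PoolDemo {
    // True only when both references point at the very same object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    // The parameters are not compile-time constants, so this
    // concatenation happens at run time and yields a new String.
    static String concatAtRuntime(String left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        String literal = "Alice";

        // Constant expression: the compiler folds "Al" + "ice" into "Alice",
        // which resolves to the pooled literal.
        String folded = "Al" + "ice";

        String built = concatAtRuntime("Al", "ice");

        System.out.println(sameObject(literal, folded));         // true
        System.out.println(sameObject(literal, built));          // false
        System.out.println(literal.equals(built));               // true
        System.out.println(sameObject(literal, built.intern())); // true
    }
}
```

The practical takeaway is unchanged: whether == "works" depends on how each String was produced, which is exactly why content comparison should not rely on it.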


== vs .equals() — the rule that prevents 90% of String bugs

**Always use .equals() to compare String content. Never ==.**

Java
String input = readUserInput();    // returns a fresh String
if (input == "admin") {            // BAD — fragile, depends on pooling
    grantAdmin();
}
if ("admin".equals(input)) {        // GOOD — content comparison, null-safe
    grantAdmin();
}

Why .equals() from the literal side? Because if input is null, input.equals("admin") throws a NullPointerException, but "admin".equals(input) cleanly returns false. The literal can never be null, so calling .equals() on it is always safe. This literal-first style is commonly called a "Yoda condition": it reads a little backwards, but it is null-safe.

In Java 7+, Objects.equals(a, b) is even cleaner — null-safe on both sides:

Java
if (Objects.equals(input, "admin")) {
    grantAdmin();
}
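To make the null behaviour of the three forms concrete, here is a small sketch. The class and method names are hypothetical; only String.equals and Objects.equals come from the JDK.

```java
import java.util.Objects;

class NullSafeEquals {
    // Literal-first: the literal is the receiver, so null input is simply "not equal".
    static boolean isAdminLiteralFirst(String input) {
        return "admin".equals(input);
    }

    // Objects.equals tolerates null on either side.
    static boolean isAdminObjects(String input) {
        return Objects.equals(input, "admin");
    }

    // Fragile form: throws NullPointerException when input is null.
    static boolean isAdminUnsafe(String input) {
        return input.equals("admin");
    }

    public static void main(String[] args) {
        System.out.println(isAdminLiteralFirst(null)); // false
        System.out.println(isAdminObjects(null));      // false
        try {
            isAdminUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from input.equals(...)");
        }
    }
}
```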

The == operator on Strings happens to work sometimes — when both sides are pooled literals with the same content. That's actually the worst kind of bug: it works in unit tests, then fails in production when one side comes from new String(), file I/O, or network input. Don't rely on it.

A related method: .equalsIgnoreCase(). For comparing strings without case sensitivity:

Java
"Hello".equalsIgnoreCase("hello");  // true

Avoid the pattern of .toLowerCase().equals(...). equalsIgnoreCase() is faster and clearer.
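There is also a correctness reason to prefer equalsIgnoreCase(): toLowerCase() is locale-sensitive. The sketch below uses the Turkish locale, where an uppercase I lowercases to a dotless ı, to show the lower-casing approach giving a surprising answer. The helper name viaLowerCase is invented for the example.

```java
import java.util.Locale;

class CaseCompareDemo {
    // Compares by lower-casing both sides under the given locale.
    static boolean viaLowerCase(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Under Turkish rules "TITLE" becomes "t\u0131tle" (dotless i), so this is false.
        System.out.println(viaLowerCase("TITLE", "title", turkish));     // false

        // A fixed, locale-neutral choice behaves as expected.
        System.out.println(viaLowerCase("TITLE", "title", Locale.ROOT)); // true

        // equalsIgnoreCase does not consult the default locale at all.
        System.out.println("TITLE".equalsIgnoreCase("title"));           // true
    }
}
```

The no-argument toLowerCase() uses the JVM's default locale, so the first result can appear on any machine configured for Turkish.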


Concatenation: the 1000-object trap

String concatenation with + looks innocent:

Java
String result = "";
for (int i = 0; i < 1000; i++) {
    result += i;
}

This is one of the most-cited Java performance traps. The compiler can't optimise the concatenation across loop iterations. Each result += i actually means:

  1. Read the current value of result (an immutable String).
  2. Create a new String containing result + i.
  3. Reassign result to point to the new String.
  4. The previous result becomes garbage.

After 1000 iterations: 1000 String objects created and orphaned. The JVM's GC has to clean them all up. Time complexity is roughly O(n²) — each step copies the entire accumulated string.

The fix is **StringBuilder**, a mutable string class:

Java
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; i++) {
    sb.append(i);
}
String result = sb.toString();

StringBuilder maintains an internal buffer that grows (roughly doubling its capacity) when it fills up. Appending is amortized O(1), so the whole loop is O(n). On 1000 iterations the difference is measurable; on 100,000 iterations it can be the difference between milliseconds and seconds.
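If you want to see the gap on your own machine, a rough sketch like the following is enough. It is not a proper benchmark (no warm-up, a single run), the class and method names are illustrative, and the timings it prints will vary, so treat the numbers as indicative only.

```java
class ConcatLoopDemo {
    // Quadratic: every += copies the whole accumulated string.
    static String withPlus(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i;
        }
        return result;
    }

    // Linear: appends go into one growing buffer.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;

        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();

        System.out.println("same content: " + a.equals(b));
        System.out.println("+= loop:       " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```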

There's also StringBuffer, which is identical to StringBuilder but synchronized (thread-safe). In modern code you almost never want StringBuffer — synchronization adds overhead, and shared mutable string buffers across threads are usually a design smell anyway.

| Class | Mutable? | Thread-safe? | Use when |
|---|---|---|---|
| String | No | Yes (because immutable) | Default. Holding text values. |
| StringBuilder | Yes | No | Building strings in a single thread. **Use this 99% of the time.** |
| StringBuffer | Yes | Yes (synchronized) | Legacy. Almost never needed in modern code. |

**Important nuance:** a single concatenation expression outside a loop (String full = first + " " + last) is fine. The compiler turns the whole expression into one efficient operation: a StringBuilder chain in older compilers, an invokedynamic call to StringConcatFactory since Java 9. The trap is concatenation *inside a loop*, where each iteration creates a new throwaway String.


Common String methods you'll use constantly

A quick reference for the methods you'll reach for again and again. All of these return a new String (or a value) without modifying the original.

Java
String s = "  Hello, World  ";

s.length();                   // 16
s.charAt(2);                  // 'H'
s.substring(2, 7);            // "Hello"
s.indexOf("World");           // 9
s.lastIndexOf("l");           // 12
s.contains("Hello");          // true
s.startsWith("  Hello");      // true
s.endsWith("World  ");        // true
s.toUpperCase();              // "  HELLO, WORLD  "
s.toLowerCase();              // "  hello, world  "
s.trim();                     // "Hello, World" — strips ASCII whitespace
s.strip();                    // "Hello, World" — Java 11+, Unicode-aware
s.replace("World", "Java");   // "  Hello, Java  "
s.split(", ");                 // ["  Hello", "World  "]
s.isEmpty();                  // false
s.isBlank();                  // false — Java 11+, true if only whitespace

String.join(",", "a", "b");   // "a,b"
String.format("Hi, %s!", "A");// "Hi, A!"
"Hi, %s!".formatted("A");     // "Hi, A!" — Java 15+

A few notes:

**trim() vs strip().** trim() only removes ASCII whitespace (chars ≤ U+0020). strip() is Unicode-aware — it also handles things like full-width spaces in CJK text. Prefer strip() in modern code unless you have a specific reason.
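A quick way to see the difference is to pad a string with U+3000, the full-width (ideographic) space. This is a minimal sketch with made-up helper names; it needs Java 11 or later for strip().

```java
class StripDemo {
    static int lengthAfterTrim(String s) {
        return s.trim().length();
    }

    static int lengthAfterStrip(String s) {
        return s.strip().length();
    }

    public static void main(String[] args) {
        // "hello" surrounded by one full-width space on each side: 7 chars total.
        String padded = "\u3000hello\u3000";

        System.out.println(lengthAfterTrim(padded));  // 7: trim() leaves U+3000 in place
        System.out.println(lengthAfterStrip(padded)); // 5: strip() removes it
    }
}
```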

**split() returns an array, not a List.** Convert with Arrays.asList(s.split(",")), which gives a fixed-size view backed by the array, or List.of(s.split(",")) (Java 9+), which gives an immutable copy.
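While converting, it helps to know one related behaviour of split(): by default it drops trailing empty strings, which can silently lose empty columns in delimited data. The sketch below (helper names are hypothetical) shows the default next to the limit of -1 that keeps them.

```java
import java.util.List;

class SplitDemo {
    // Default split: trailing empty strings are removed.
    static List<String> fieldsDefault(String line) {
        return List.of(line.split(","));
    }

    // A negative limit keeps trailing empty strings.
    static List<String> fieldsKeepEmpty(String line) {
        return List.of(line.split(",", -1));
    }

    public static void main(String[] args) {
        String line = "a,b,,";

        System.out.println(fieldsDefault(line).size());   // 2
        System.out.println(fieldsKeepEmpty(line).size()); // 4
    }
}
```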

**Conversions to primitives.** Integer.parseInt("42") returns the int 42. Integer.parseInt("abc") throws NumberFormatException. Long.parseLong and Double.parseDouble follow the same pattern. Boolean.parseBoolean is the exception: it never throws, and returns false for anything other than "true" (ignoring case). Always wrap numeric parsing of user input in try/catch.
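One way to package the try/catch is a small helper that returns an OptionalInt instead of throwing. This is only a sketch: the name tryParseInt and the decision to trim the input first are assumptions, not part of any standard API.

```java
import java.util.OptionalInt;

class SafeParse {
    // Returns the parsed value, or an empty OptionalInt when the text is not a valid int.
    static OptionalInt tryParseInt(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("42"));    // OptionalInt[42]
        System.out.println(tryParseInt("abc"));   // OptionalInt.empty
        System.out.println(tryParseInt("12.5"));  // OptionalInt.empty
        System.out.println(tryParseInt(" 7 ").orElse(-1)); // 7
    }
}
```

Callers then choose a fallback explicitly with orElse(...) or branch on isPresent(), rather than scattering catch blocks.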


Text blocks (modern multi-line strings)

Pre-Java 15, multi-line strings were ugly:

Java
String json = "{\n" +
              "    \"name\": \"Alice\",\n" +
              "    \"age\": 30\n" +
              "}";

Java 15 added **text blocks** using triple-quote syntax:

Java
String json = """
    {
        "name": "Alice",
        "age": 30
    }
    """;

Text blocks preserve newlines, allow embedded double quotes without escaping, and strip the indentation shared by every line; the line holding the closing """ takes part in that calculation, so its position controls how much leading whitespace is removed. They're perfect for embedded SQL queries, JSON literals, HTML templates, and any multi-line content.
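One detail that the side-by-side JSON examples can hide: when the closing """ sits on its own line, the text block ends with a newline, so it is not character-for-character identical to the hand-concatenated version. The sketch below (Java 15+, illustrative names) shows both placements.

```java
class TextBlockDemo {
    // Closing delimiter on its own line: the value ends with "\n".
    static String withTrailingNewline() {
        return """
            {
                "name": "Alice"
            }
            """;
    }

    // Closing delimiter directly after the last character: no trailing newline.
    static String withoutTrailingNewline() {
        return """
            {
                "name": "Alice"
            }""";
    }

    public static void main(String[] args) {
        System.out.println(withTrailingNewline().endsWith("\n"));    // true
        System.out.println(withoutTrailingNewline().endsWith("\n")); // false
    }
}
```

In both methods the common 12-space indentation is stripped, leaving only the four spaces before "name".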

Java
String sql = """
    SELECT id, name, email
    FROM users
    WHERE created_at > ?
      AND status = 'active'
    """;

You can interpolate at runtime with formatted():

Java
String greeting = """
    Hello, %s!
    Your account balance is $%.2f.
    """.formatted(name, balance);

There's no built-in string interpolation in Java (no ${name} syntax like Kotlin or JavaScript). The closest is formatted() or String.format(). String templates were previewed in Java 21 and 22 as a form of true interpolation, but the preview was withdrawn in Java 23, so formatted() remains the practical choice.


Common pitfalls in production code

A consolidated list of String mistakes that show up in real codebases.

**1. s.replace() doesn't modify s.** It returns a new String. If you don't capture the return value, nothing happens.

Java
String s = "hello";
s.replace("h", "H");        // creates "Hello" then discards it
System.out.println(s);      // still "hello"

s = s.replace("h", "H");    // captures the new String

**2. == for content comparison.** Always wrong. Use .equals().

**3. Concatenation in a loop with +.** Use StringBuilder.

**4. parseInt of "12.5".** Throws NumberFormatException. parseInt only handles integer strings — use parseDouble for decimals.

**5. s.charAt(s.length()) to get the last char.** Off-by-one — should be s.length() - 1. charAt(s.length()) throws StringIndexOutOfBoundsException.

**6. Calling .toString() on a null reference.**

Java
Integer x = null;
String ok = "value: " + x;              // "value: null", no exception
String boom = "value: " + x.toString(); // NullPointerException

The first form uses String.valueOf(x) internally, which handles null gracefully. The second calls .toString() directly on null.
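If the literal text "null" is not what you want in the output, Objects.toString(value, fallback) lets you pick a replacement. A minimal sketch, with hypothetical method names and an arbitrary "n/a" fallback:

```java
import java.util.Objects;

class NullToStringDemo {
    // Concatenation converts a null reference to the text "null".
    static String describe(Integer x) {
        return "value: " + x;
    }

    // Objects.toString substitutes the fallback when x is null.
    static String describeOrDefault(Integer x) {
        return "value: " + Objects.toString(x, "n/a");
    }

    public static void main(String[] args) {
        System.out.println(describe(null));          // value: null
        System.out.println(describeOrDefault(null)); // value: n/a
        System.out.println(describeOrDefault(7));    // value: 7
    }
}
```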

**7. Comparing passwords with .equals().** The standard .equals() short-circuits on first character mismatch, making it vulnerable to timing attacks. For password comparison, use MessageDigest.isEqual(bytesA, bytesB) which is constant-time. (You should be hashing passwords with bcrypt/argon2 anyway, but constant-time comparison applies to the hash too.)
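A minimal sketch of that comparison follows. The method name secretsMatch and the choice of UTF-8 are assumptions for the example; in a real system the inputs would typically be fixed-length digests or tokens rather than raw passwords.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class ConstantTimeCompare {
    // Compares the UTF-8 bytes of both values with MessageDigest.isEqual
    // instead of String.equals, which stops at the first mismatch.
    static boolean secretsMatch(String expected, String provided) {
        byte[] a = expected.getBytes(StandardCharsets.UTF_8);
        byte[] b = provided.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(a, b);
    }

    public static void main(String[] args) {
        System.out.println(secretsMatch("s3cr3t-token", "s3cr3t-token")); // true
        System.out.println(secretsMatch("s3cr3t-token", "s3cr3t-tokeX")); // false
    }
}
```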

**8. Building log messages with concatenation.**

Java
log.info("user " + userId + " did " + action);   // even if log level is off, concatenation runs
log.info("user {} did {}", userId, action);      // parameterized — formatting only happens if log fires

Modern logging frameworks (SLF4J, Logback, Log4j2) support parameterized messages. Use them. The runtime cost of disabled-but-formatted log statements adds up.
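To demonstrate the effect without adding a dependency, the sketch below uses the JDK's own java.util.logging, which defers message construction through a Supplier rather than through {} placeholders. It is a stand-in for the SLF4J style shown above, not the same API; the logger name, counter, and helper methods are invented for the demonstration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLogDemo {
    private static final Logger LOG = Logger.getLogger("demo.lazy");

    // Counts how many times the message text was actually built.
    static final AtomicInteger built = new AtomicInteger();

    static String expensiveMessage(String userId, String action) {
        built.incrementAndGet();
        return "user " + userId + " did " + action;
    }

    // Logs one FINE message with the logger set to the given level and
    // reports how many times the message was constructed.
    static int messagesBuiltAt(Level loggerLevel) {
        LOG.setUseParentHandlers(false); // keep the demo silent
        LOG.setLevel(loggerLevel);
        built.set(0);
        LOG.fine(() -> expensiveMessage("u42", "login"));
        return built.get();
    }

    public static void main(String[] args) {
        System.out.println(messagesBuiltAt(Level.INFO)); // 0: FINE is disabled, supplier never runs
        System.out.println(messagesBuiltAt(Level.FINE)); // 1: FINE is enabled, message is built
    }
}
```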


When the String becomes the bottleneck

For 99% of Java code, the default String behaviour is fine. But at scale — high-throughput servers, data pipelines, performance-critical algorithms — String allocation can become measurable.

**Profile first.** Don't optimise speculatively. Use JFR or a sampling profiler to confirm Strings are actually the hotspot.

**Reuse buffers.** Instead of allocating a new StringBuilder per call, reuse a pooled one. Apache Commons has utilities for this; libraries like Netty have their own buffer pooling.
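The simplest form of reuse needs no library: keep one StringBuilder and reset it with setLength(0), which clears the content but keeps the allocated capacity. The class below is a hypothetical sketch; it is deliberately not thread-safe, so each thread (or each owning object) would need its own instance.

```java
class BuilderReuse {
    // One scratch buffer per instance; the initial capacity of 64 is an arbitrary guess.
    private final StringBuilder scratch = new StringBuilder(64);

    // Builds "tenant:id" without allocating a new StringBuilder per call.
    String formatKey(String tenant, long id) {
        scratch.setLength(0); // discard previous content, keep capacity
        return scratch.append(tenant).append(':').append(id).toString();
    }

    public static void main(String[] args) {
        BuilderReuse keys = new BuilderReuse();
        System.out.println(keys.formatKey("acme", 7));   // acme:7
        System.out.println(keys.formatKey("globex", 8)); // globex:8
    }
}
```

Only reach for this after profiling; a short-lived StringBuilder is cheap, and sharing one by accident across threads corrupts output.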

**Watch out for String.format() and similar.** These methods are flexible but relatively slow. In a hot path, manual concatenation or StringBuilder is faster.

**Consider primitive char arrays.** If you're doing intensive character-level processing (parsing, encoding), working directly with char[] can skip the String overhead. Read into a char array, process in place, build the result.

**Use intern() selectively.** If you have a closed set of frequently-appearing strings (status codes, enum-like values from a DB column) and a lot of duplicate copies, intern() saves heap. But intern is a contended global table — only intern when you know there's a duplication problem worth solving.

**StringJoiner and Collectors.joining().** For joining a collection of strings, String.join(), StringJoiner, and Collectors.joining() are all internally optimised. Faster and clearer than a loop with +=.

Java
String csv = String.join(",", values);
String sameCsv = values.stream().collect(Collectors.joining(","));

Both assemble the result in one pass instead of creating an intermediate String per element.
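StringJoiner and Collectors.joining() also accept a prefix and suffix, which saves the usual fiddling with the first or last element. A short sketch, with illustrative method names:

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;

class JoinDemo {
    static String viaJoin(List<String> values) {
        return String.join(",", values);
    }

    // Delimiter ", " with "[" and "]" wrapped around the result.
    static String viaJoiner(List<String> values) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String v : values) {
            joiner.add(v);
        }
        return joiner.toString();
    }

    static String viaCollector(List<String> values) {
        return values.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> values = List.of("a", "b", "c");
        System.out.println(viaJoin(values));      // a,b,c
        System.out.println(viaJoiner(values));    // [a, b, c]
        System.out.println(viaCollector(values)); // [a, b, c]
    }
}
```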


Modern Java additions worth knowing

A quick tour of String-related features added in newer Java versions.

**Java 11:**
- String.isBlank() — returns true for empty or whitespace-only strings.
- String.strip(), stripLeading(), stripTrailing() — Unicode-aware whitespace removal.
- String.repeat(n): "ab".repeat(3) returns "ababab".
- String.lines() — returns a Stream of lines (split on line terminators).

**Java 12+:**
- String.indent(n) — adjusts the indentation of each line.

**Java 15:**
- Text blocks (""").
- String.formatted(args) — instance-method version of String.format.

**Java 21:**
- String templates (true interpolation like STR."Hello \{name}") arrived as a preview feature and were previewed again in Java 22, then withdrawn in Java 23 pending a redesign. Don't build on them.

For everyday Java work, knowing the modern methods (isBlank, strip, repeat, lines, text blocks, formatted) is more than enough. The String type itself has barely changed in two decades and probably won't in another two.
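As a small illustration of how the Java 11 additions combine, the sketch below underlines a title with repeat() and counts non-blank lines with lines() and isBlank(). The helper names are made up for the example.

```java
class ModernStringDemo {
    // Title followed by a line of '=' of the same length.
    static String banner(String title) {
        return title + "\n" + "=".repeat(title.length());
    }

    // Counts lines that contain something other than whitespace.
    static long nonBlankLines(String text) {
        return text.lines().filter(line -> !line.isBlank()).count();
    }

    public static void main(String[] args) {
        System.out.println(banner("Strings"));
        // Lines are: "a", "", "  ", "b", "c"; three of them are non-blank.
        System.out.println(nonBlankLines("a\n\n  \nb\r\nc")); // 3
    }
}
```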

