Home
Java from First Principles / Chapter 12 — Collections Framework

Collections Framework

List, Set, Map, Queue — the interfaces and the implementations. When to pick which. How HashMap actually works inside.


The most-used part of the standard library

Almost every non-trivial Java program uses collections. A list of users. A map from product ID to inventory count. A set of seen IDs to deduplicate. The Java Collections Framework is the part of the standard library you'll touch every day.

The framework was added in Java 1.2 (1998). It's not perfect — there are some legacy classes (Vector, Hashtable) that predate it and never quite fit, and a few corners that show their age. But the core abstractions (List, Set, Map, Queue) and the main implementations (ArrayList, HashMap, HashSet, LinkedList, TreeMap) hold up beautifully. They're some of the most carefully-tuned code in the JDK.

This chapter walks through the framework, picks apart how HashMap actually works under the hood, and gives a decision tree for picking the right collection for the job.

Java Collections framework: interfaces and main implementations Collections framework «interface» Iterable iterator() «interface» Collection add, remove, size... «interface» List indexed, ordered «interface» Set no duplicates «interface» Queue FIFO / LIFO ArrayList LinkedList array-backed O(1) get, O(n) insert middle doubly-linked O(n) get, O(1) insert at known node HashSet TreeSet hash-based, unordered O(1) operations red-black tree sorted, O(log n) ArrayDeque LinkedList circular array fastest queue implements both List and Deque «interface» Map (separate) HashMap, TreeMap, LinkedHashMap
Two parallel hierarchies. Collection covers Lists, Sets, and Queues — everything that holds elements. Map is separate because it holds key-value pairs, not single elements. The interfaces describe contracts; the concrete classes pick storage strategies (array, linked nodes, hash table, tree) that change the performance characteristics.

The interface hierarchy

Everything in Collections lives under one of these top-level interfaces:

**Iterable<E>** — the root. Anything you can loop over with a for-each. Just declares iterator().

**Collection<E>** — adds, removes, contains, size. The base for List, Set, Queue.

**List<E>** — ordered, indexed. get(int index), add(int index, E e), can hold duplicates.

**Set<E>** — no duplicates. May or may not preserve insertion order, depending on implementation.

**Queue<E>** — designed for processing elements in order. FIFO (offer/poll) or priority-ordered. Subinterface Deque<E> supports both ends (push, pop, addFirst, etc.).

**Map<K, V>** — separate hierarchy because it stores key-value PAIRS, not single elements. Doesn't extend Collection.

You write your code against these interfaces. The concrete implementations get picked based on performance and behaviour needs.


List implementations: ArrayList vs LinkedList

Two main implementations of List. They have very different performance profiles.

**ArrayList** — backed by a resizable array. The default choice.

Java
List<String> names = new ArrayList<>();
names.add("Alice");
names.add("Bob");
names.get(0);   // O(1) — direct array access

| Operation | Cost |
|---|---|
| get(int index) | O(1) |
| add(E e) (at end) | Amortised O(1) |
| add(int index, E e) (middle) | O(n) — must shift later elements |
| remove(int index) | O(n) — must shift |
| contains(Object o) | O(n) — linear scan |

The underlying array doubles in size when it fills up. The default initial capacity is 10. If you know the rough size in advance, give it as a constructor argument: new ArrayList<>(1000). Saves several resize operations.

**LinkedList** — doubly-linked list of nodes.

Java
List<String> names = new LinkedList<>();

| Operation | Cost |
|---|---|
| get(int index) | O(n) — must walk the chain |
| add(E e) (at end) | O(1) |
| add(0, E e) (at start) | O(1) |
| addFirst(E e) / addLast(E e) | O(1) |
| remove(0) / removeLast() | O(1) |
| contains(Object o) | O(n) |

LinkedList sounds good — O(1) inserts at either end! But in practice:
- The "O(1) insert middle" advantage requires you to already have a reference to the node, which getting requires O(n) anyway.
- The cache performance is terrible (each node is a separate heap allocation), so iteration is slower than ArrayList in practice.

**Modern rule of thumb:** use ArrayList. The only time LinkedList wins is when you're doing very heavy add-to-front and remove-from-front operations as a deque — and even then, ArrayDeque is faster.

**ArrayDeque** — the actual best choice for queue/stack/deque behaviour. Backed by a circular array, no node allocations per operation.

Java
Deque<String> stack = new ArrayDeque<>();
stack.push("a");
stack.push("b");
stack.pop();   // "b"

Set implementations: HashSet, LinkedHashSet, TreeSet

Sets enforce no duplicates. The implementations differ in how they decide "is this a duplicate" and what order they iterate in.

**HashSet** — backed by a HashMap (the values are ignored). O(1) for add, remove, contains. No iteration order guarantee — when you loop over a HashSet, the order is essentially random and may change with each version of Java.

Java
Set<String> seen = new HashSet<>();
seen.add("alice");
seen.add("alice");   // ignored — duplicate
seen.size();         // 1

**LinkedHashSet** — same as HashSet plus preserves insertion order. Tiny overhead (extra linked list of entries). Use when you want O(1) operations AND predictable iteration.

Java
Set<String> seen = new LinkedHashSet<>();
seen.add("alice");
seen.add("bob");
seen.add("carol");
// iteration is alice, bob, carol — in insertion order

**TreeSet** — backed by a red-black tree. Maintains elements in SORTED order. O(log n) for operations.

Java
Set<Integer> sorted = new TreeSet<>();
sorted.add(5);
sorted.add(1);
sorted.add(3);
// iteration is 1, 3, 5

Elements must implement Comparable, or you pass a Comparator to the TreeSet constructor.

**Decision matrix:**
- Want fastest possible? → HashSet
- Want fast + predictable iteration order? → LinkedHashSet
- Want sorted iteration? → TreeSet


Map implementations: the deep dive on HashMap

Maps are by far the most-used collection in real Java code. HashMap deserves its own deep look.

How HashMap stores entries: hash, bucket, chain HashMap.put(key, value) 1. Compute hash code of key "alice".hashCode() = 92668751 2. Modulo bucket count to find index 92668751 % 16 = 15 3. Place entry in bucket 15 If collision: chain or convert to tree Bucket array (initial capacity 16) 0 1 2 ... 14 15 "alice" → User{age=30} next → "bob" → User{age=25} next → null When chain in one bucket exceeds 8 entries, HashMap (Java 8+) converts it to a balanced tree for O(log n) lookup instead of O(n).
HashMap is an array of buckets. To find a key, it hashes the key, mods by bucket count, and looks in that bucket. Collisions form a linked chain — or a tree, if the chain grows past 8 entries. The whole structure is what makes get() and put() O(1) on average.

**Structure.** A HashMap is an array of "buckets". The array starts at 16 (configurable). To find where to store a key-value pair:

  1. Call key.hashCode() — gets an int.
  2. Spread the hash via internal bit manipulation (helps distribute keys evenly).
  3. Take hash & (capacity - 1) — gives the bucket index. (Equivalent to hash % capacity for power-of-two capacities, but faster.)
  4. Place the entry in that bucket.

**Collisions.** Two different keys can land in the same bucket. HashMap handles this by chaining: each bucket is a linked list of entries with the same bucket index. To find a key, hash it, find the bucket, then walk the chain checking each entry's key.equals(yourKey).

**Tree conversion (Java 8+).** When a single bucket's chain grows past 8 entries, HashMap converts that chain to a balanced tree (red-black). Lookups in that bucket become O(log n) instead of O(n). This protects against pathological hash collisions (DoS attacks, badly written hashCode methods).

**Resizing.** When the map gets too full (load factor of 0.75 by default — i.e., when 12 out of 16 buckets are occupied), HashMap doubles the bucket array and reshuffles all entries into the new buckets. This is an O(n) operation, but amortises to O(1) per put.

**Performance summary:**

| Operation | Average | Worst case (without trees) | Worst case (with trees) |
|---|---|---|---|
| get | O(1) | O(n) | O(log n) |
| put | O(1) | O(n) | O(log n) |
| remove | O(1) | O(n) | O(log n) |
| containsKey | O(1) | O(n) | O(log n) |

In practice, with reasonable hashCode implementations, HashMap is basically O(1) for everything. This is why it's the default Map in Java.

**The eternal warning: if you put an object as a key in HashMap, that object's hashCode and equals MUST be consistent.** If either changes after insertion, the entry is "lost" — stored under the old hash, but lookup uses the new hash. Use immutable keys (Strings, records, value objects with all final fields).


Other map implementations

**LinkedHashMap** — HashMap that also preserves insertion order. Same performance, slightly more memory. Useful when you want HashMap speed plus predictable iteration.

A neat feature: LinkedHashMap can iterate in **access** order instead of insertion order. This makes it perfect for implementing an LRU cache:

Java
Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > 100;   // evict oldest when over 100 entries
    }
};

The third constructor argument (accessOrder = true) means every get moves the entry to the end. Combined with removeEldestEntry, you get an LRU cache in ~5 lines.

**TreeMap** — backed by a red-black tree, sorted by key. O(log n) operations. Useful when you need keys in order, or you need range queries (subMap, floorEntry, ceilingKey).

Java
TreeMap<String, Integer> scores = new TreeMap<>();
scores.put("alice", 85);
scores.put("bob", 70);
scores.put("carol", 90);

scores.firstKey();     // "alice"
scores.lastKey();      // "carol"
scores.headMap("c");   // {alice=85, bob=70}

**ConcurrentHashMap** — thread-safe. Multiple threads can read AND write concurrently without external synchronization. Much faster than Collections.synchronizedMap (which locks the entire map for every operation).

Use ConcurrentHashMap whenever a map is shared between threads. Use HashMap everywhere else.

**Hashtable** — legacy. Predates the Collections Framework. Synchronized everything. Replaced by ConcurrentHashMap in 1998 and nobody updated the docs. Don't use Hashtable in new code.

**EnumMap** — when keys are enum values. Backed by an array indexed by the enum's ordinal. Extremely fast, extremely small.

Java
enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }
EnumMap<Day, String> schedule = new EnumMap<>(Day.class);
schedule.put(Day.MON, "Standup");

Iteration and modification

A common bug: modifying a collection while iterating over it.

Java
List<String> names = new ArrayList<>(List.of("alice", "bob", "carol"));
for (String name : names) {
    if (name.startsWith("b")) {
        names.remove(name);   // ConcurrentModificationException at runtime
    }
}

Java's Iterator (which for-each uses under the hood) detects structural changes during iteration and throws ConcurrentModificationException. This is "fail-fast" behavior — it stops you immediately instead of silently leaving the collection in a strange state.

**Correct ways to remove during iteration:**

Java
// 1. Use the iterator's remove method
Iterator<String> it = names.iterator();
while (it.hasNext()) {
    if (it.next().startsWith("b")) {
        it.remove();
    }
}

// 2. Use removeIf (Java 8+)
names.removeIf(name -> name.startsWith("b"));

// 3. Build a new list
List<String> filtered = names.stream()
    .filter(n -> !n.startsWith("b"))
    .collect(Collectors.toList());

The removeIf form is usually the cleanest in modern code.

**Iteration order guarantees vary by implementation:**
- ArrayList, LinkedList: insertion order, by index.
- HashSet, HashMap: NO guarantee. Order may change between Java versions.
- LinkedHashSet, LinkedHashMap: insertion order (or access order, if configured).
- TreeSet, TreeMap: sorted order by element / key.


Immutable collections

Sometimes you want a collection that cannot be modified — for thread safety, for defensive copying, or just to express intent.

**List.of(...), Set.of(...), Map.of(...)** (Java 9+) — create truly immutable collections.

Java
List<String> roles = List.of("admin", "user", "guest");
roles.add("super");   // throws UnsupportedOperationException

Limits: Map.of is overloaded up to 10 entries. For more, use Map.ofEntries(...). Doesn't accept null elements (throws NPE).

**List.copyOf(coll), Set.copyOf(coll), Map.copyOf(map)** — return an immutable snapshot of any collection.

**Collections.unmodifiableList(...)** — returns a read-only WRAPPER around an existing collection. The wrapper rejects modifications, but if you modify the underlying list directly, the wrapper "sees" the change. Use the Java 9+ copyOf instead for true safety.

**When to return immutable collections:**
- From getter methods — prevent callers from mutating your internal state.
- From factory methods that build "result" collections.
- As constants: public static final List<String> VALID_STATUSES = List.of("active", "pending", "closed");

**When NOT to use immutable collections:**
- When the collection genuinely needs to be mutated (workspaces, working sets).
- For caches that need updates.
- For very large collections where the cost of recreating on every "modification" matters.


Decision tree: which collection to pick

The shortcut you'll use 90% of the time.

**"I need a sequence of items, accessed by position."** → ArrayList. Always start here. Switch to LinkedList only if profiling shows a real bottleneck on the rare front-insertions case.

**"I need a stack or queue."** → ArrayDeque. Faster than Stack (legacy), faster than LinkedList.

**"I need a fixed-size sorted list."** → ArrayList + Collections.sort, OR TreeSet if duplicates aren't wanted.

**"I need to check 'have I seen this before?'."** → HashSet. O(1) contains.

**"I need ordered uniqueness."** → LinkedHashSet (insertion order) or TreeSet (natural order).

**"I need key → value lookup."** → HashMap. Fast, default choice.

**"I need key → value lookup with predictable iteration order."** → LinkedHashMap.

**"I need keys sorted."** → TreeMap.

**"I need a thread-safe map."** → ConcurrentHashMap. Don't reach for Collections.synchronizedMap — it's slower.

**"My keys are enums."** → EnumMap. Compact and fast.

**"My values are enums and I need a Set of them."** → EnumSet. Backed by a single long for enums with <64 values. Astonishingly compact.

**"I need to keep elements in priority order."** → PriorityQueue. A heap, not a sorted list — peek is O(1), insert and remove are O(log n).

**"I need a thread-safe queue between producers and consumers."** → BlockingQueue implementations: ArrayBlockingQueue, LinkedBlockingQueue, SynchronousQueue. From java.util.concurrent.


Common mistakes

**1. Mutating keys after insertion.**

Java
Set<List<String>> set = new HashSet<>();
List<String> list = new ArrayList<>(List.of("a", "b"));
set.add(list);
list.add("c");           // mutates the key
set.contains(list);      // probably false now — list's hashCode changed

Don't put mutable objects in HashSet/HashMap as keys. Use immutable types.

**2. Using == on collections.**

Java
List<String> a = List.of("x");
List<String> b = List.of("x");
a == b;          // false — different references
a.equals(b);     // true — content equal

Same as Strings — use .equals() for content comparison.

**3. Iterating and modifying.** Covered above. Use removeIf or an explicit Iterator.

**4. Choosing the wrong implementation.** LinkedList for indexed access. Hashtable for new code. Vector for anything.

**5. Storing primitives.** You can't — collections hold objects, so List<int> is illegal. You have to use List<Integer> and pay the autoboxing cost. For high-performance numeric collections, consider Eclipse Collections or Koloboke (third-party libraries with primitive specializations).

**6. Returning the internal collection from a getter.**

Java
public class Team {
    private final List<Player> players = new ArrayList<>();
    public List<Player> getPlayers() { return players; }   // exposes internal state!
}

Callers can .add() or .clear() your internal list. Return List.copyOf(players) or Collections.unmodifiableList(players).

**7. Capacity not being size.** new ArrayList<>(1000) creates a list with capacity 1000 but SIZE 0. list.get(0) throws IndexOutOfBoundsException. To pre-fill with nulls: new ArrayList<>(Collections.nCopies(1000, null)).


⁂ Back to all modules