How Java HashMaps Work – Internal Mechanics Explained

A HashMap is one of the most commonly used data structures in Java, and it's known for its efficiency. Data in a HashMap is stored in the form of key-value pairs.

In this article, I will introduce you to HashMaps in Java. We will explore the common operations of HashMap and then delve into how it operates internally. You will gain an understanding of the hash function and how index calculation takes place. Finally, we will look at the time complexities of the operations and touch upon the behavior in a concurrent environment.

What is a `HashMap` in Java?

A HashMap implements the Map interface, which is part of the Java collection framework. It's based on the concept of Hashing.

Hashing is a technique that transforms an input of arbitrary size into a fixed-size output using a hash function. The generated output is called the hash code and is represented by an integer value in Java. Hash codes are used for efficient lookup and storage operations in a HashMap.

Common Operations

Like we discussed above, data in a HashMap is stored in the form of key-value pairs. The key is a unique identifier, and each key is associated with a value.

Below are some common operations supported by a HashMap. Let's understand what these methods do with some simple code examples:

Insertion

This method inserts a new key-value pair to the HashMap.
The insertion order of the key-value pairs is not maintained.
During insertion, if a key is already present, the existing value will be replaced with the new value that is passed.
You can insert only one null key into the HashMap, but you can have multiple null values.

The method signature for this operation is given below, followed by an example:

public V put(K key, V value)

Map<String, Integer> map = new HashMap<>();
map.put("apple", 1);
map.put("banana", 2);

In the code above, we have a HashMap example where we add a key of type String and value of type Integer.

Retrieval:

Fetches the value associated with a given key.
Returns null if the key does not exist in the HashMap.

The method signature for this operation is given below, followed by an example:

public V get(Object key)

Integer value = map.get("apple"); // returns 1

In the code above, we're retrieving the value associated with the key apple.

Other common operations include:

remove: Removes the key-value pair for the specified key. It returns null if the key is not found.
containsKey: Checks if a specific key is present in the HashMap.
containsValue: Checks if the specified value is present in the HashMap.

Internal Structure of a `HashMap`

Internally, a HashMap uses an array of buckets or bins. Each bucket is a linked list of type Node, which is used to represent the key-value pair of the HashMap.

static class Node<K, V> {
    final int hash;
    final K key;
    V value;
    Node<K, V> next;

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

Above, you can see the structure of the Node class which is used to store the key-value pairs of the HashMap.

The Node class has the following fields:

hash: Refers to the hashCode of the key.
key: Refers to the key of the key-value pair.
value: Refers to the value associated with the key.
next: Acts as a reference to the next node.

The HashMap is fundamentally based on a hash table implementation, and its performance depends on two key parameters: initial capacity and load factor. The original javadocs of the Hash table class define these two parameters as follows:

The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

Let's now try and understand how the basic operations, put and get, work in a HashMap.

Hash Function

During the insertion (put) of a key-value pair, the HashMap first calculates the hash code of the key. The hash function then computes an integer for the key. Classes can use the hashCode method of the Object class or override this method and provide their own implementation. (Read about the hash code contract here). The hash code is then XORed (eXclusive OR) with its upper 16 bits (h >>> 16) to achieve a more uniform distribution.

XOR is a bitwise operation that compares two bits, resulting in 1 if the bits are different and 0 if they are the same. In this context, performing a bitwise XOR operation between the hash code and its upper 16 bits (obtained using the unsigned right shift >>> operator) helps to mix the bits, leading to a more evenly distributed hash code.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Above, you can see the static hash method of the HashMap class.

The idea is to have a unique hash code for each key, but the hash function could produce the same hash code for different keys. This leads to a situation known as a collision. We will see how to handle collisions in the next section.

Index Calculation

Once the hash code for a key is generated, the HashMap calculates an index within the array of buckets to determine where the key-value pair will be stored. This is done using a bitwise AND operation, which is an efficient way to calculate the modulo when the array length is a power of two.

int index = (n - 1) & hash;

Here, we're calculating the index where n is the length of the bucket array.

Once the index is calculated, the key is then stored at that index in the bucket array. However, if multiple keys end up having the same index, it causes a collision. In such a scenario, the HashMap handles it in one of two ways:

Chaining/Linking: Each bucket in the array is a linked list of nodes. If a key already exists at a particular index and another key gets hashed to the same index, it gets appended to the list.
Treeify: If the number of nodes exceeds a certain threshold, the linked list is converted into a tree (This was introduced in Java 8).

static final int TREEIFY_THRESHOLD = 8;

This is the threshold that determines treeification.

Therefore, it is essential to have a good hash function that uniformly distributes the keys across the buckets and minimizes the chances of collisions.

The retrieval (get) and deletion (remove) operations work similarly to the insertion (put) operation. Here's how:

Retrieval (get): Computes the hash code using the hash function -> calculates the index using the hash code -> traverses the linked list or tree to find the node with the matching key.
Deletion (remove): Computes the hash code using the hash function -> calculates the index using the hash code -> removes the node from the linked list or tree.

Time Complexity

The basic operations of a HashMap, such as put, get, and remove, generally offer constant time performance of O(1), assuming that the keys are uniformly distributed. In cases where there is poor key distribution and many collisions occur, these operations might degrade to a linear time complexity of O(n).

Under treeification, where long chains of collisions are converted into balanced trees, lookup operations can improve to a more efficient logarithmic time complexity of O(log n).

Synchronization

The HashMap implementation is not synchronized. If multiple threads access a HashMap instance concurrently and iterate over the map, and if any one of the threads performs a structural modification (such as adding or removing a key-value mapping) on the map, it leads to a ConcurrentModificationException.

To prevent this, you can create a thread-safe instance using the Collections.synchronizedMap method.

Conclusion

In summary, understanding the internal workings of a HashMap is crucial for developers to make informed decisions. Knowing how a key is mapped, how collisions happen, and how they can be avoided helps you use the HashMap efficiently and effectively.

What is a HashMap in Java?