The simple way to implement multi-threaded applications is to use Java's built-in synchronization and locking primitives; e.g. the synchronized
keyword. The following example shows how we might use synchronized
to accumulate counts.
public class Counters {
    private final int[] counters;

    public Counters(int nosCounters) {
        counters = new int[nosCounters];
    }

    /**
     * Increments the counter at the given index.
     */
    public synchronized void count(int number) {
        if (number >= 0 && number < counters.length) {
            counters[number]++;
        }
    }

    /**
     * Obtains the current count of the number at the given index,
     * or if there is no number at that index, returns 0.
     */
    public synchronized int getCount(int number) {
        return (number >= 0 && number < counters.length) ? counters[number] : 0;
    }
}
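To see why this is correct, we might exercise the class from several threads at once (the driver class name below is illustrative, not part of the original). Because every call to count() holds the intrinsic lock, no increments are lost:

```java
import java.util.ArrayList;
import java.util.List;

public class CountersDemo {
    // The synchronized Counters class from above, reproduced so this example is runnable.
    static class Counters {
        private final int[] counters;
        Counters(int nosCounters) { counters = new int[nosCounters]; }
        synchronized void count(int number) {
            if (number >= 0 && number < counters.length) counters[number]++;
        }
        synchronized int getCount(int number) {
            return (number >= 0 && number < counters.length) ? counters[number] : 0;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final Counters counters = new Counters(4);
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 8; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    counters.count(0);   // contended, but never lost
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        // 8 threads x 10,000 increments, all serialized by the lock.
        System.out.println(counters.getCount(0)); // 80000
    }
}
```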
This implementation will work correctly. However, if you have a large number of threads making lots of simultaneous calls on the same Counters
object, the synchronization is liable to be a bottleneck. Specifically:
1. Each synchronized method call starts with the current thread acquiring the lock for the Counters instance.
2. The thread then checks the number value and updates the counter.
3. Finally, the thread releases the lock.
If one thread attempts to acquire the lock while another one holds it, the attempting thread will be blocked (stopped) at step 1 until the lock is released. If multiple threads are waiting, one of them will get it, and the others will continue to be blocked.
This can lead to a couple of problems:
If there is a lot of contention for the lock (i.e. lots of threads try to acquire it), then some threads can be blocked for a long time.
When a thread is blocked waiting for the lock, the operating system will typically try to switch execution to a different thread. This context switching incurs a relatively large performance cost.
When there are multiple threads blocked on the same lock, there is no guarantee that any particular one of them will be treated "fairly" (i.e. that each waiting thread will eventually get a turn to run). This can lead to thread starvation.
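We can make the blocking visible with a small sketch (the class and variable names here are illustrative): one thread holds a monitor while a second thread tries to enter it, and we observe the second thread in the BLOCKED state.

```java
import java.util.concurrent.CountDownLatch;

public class BlockedDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch holding = new CountDownLatch(1);

        // This thread acquires the lock and holds it for a while.
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                holding.countDown();
                try { Thread.sleep(1000); } catch (InterruptedException ignored) {}
            }
        });
        holder.start();
        holding.await(); // wait until the holder actually owns the lock

        // This thread tries to enter the same monitor and is blocked at entry.
        Thread waiter = new Thread(() -> {
            synchronized (lock) { /* acquire and immediately release */ }
        });
        waiter.start();
        Thread.sleep(200); // give the waiter time to reach the monitor

        System.out.println(waiter.getState()); // BLOCKED

        holder.join();
        waiter.join();
    }
}
```

Note that the 200 ms sleep is a crude way to let the waiter reach the monitor; it is fine for a demonstration but not something to rely on in production code.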
Let us start by rewriting the example above using AtomicInteger
counters:
import java.util.concurrent.atomic.AtomicInteger;

public class Counters {
    private final AtomicInteger[] counters;

    public Counters(int nosCounters) {
        counters = new AtomicInteger[nosCounters];
        for (int i = 0; i < nosCounters; i++) {
            counters[i] = new AtomicInteger();
        }
    }

    /**
     * Increments the counter at the given index.
     */
    public void count(int number) {
        if (number >= 0 && number < counters.length) {
            counters[number].incrementAndGet();
        }
    }

    /**
     * Obtains the current count of the number at the given index,
     * or if there is no number at that index, returns 0.
     */
    public int getCount(int number) {
        return (number >= 0 && number < counters.length)
                ? counters[number].get() : 0;
    }
}
We have replaced the int[]
with an AtomicInteger[]
, and initialized it with an instance in each element. We have also added calls to incrementAndGet()
and get()
in place of operations on int
values.
But the most important thing is that we can remove the synchronized
keyword because locking is no longer required. This works because the incrementAndGet()
and get()
operations are atomic and thread-safe. In this context, it means that:
Each counter in the array will only be observable in either the "before" state for an operation (like an "increment") or in the "after" state.
Assuming that the operation occurs at time T
, no thread will be able to see the "before" state after time T
.
Furthermore, while two threads might actually attempt to update the same AtomicInteger
instance at the same time, the implementations of the operations ensure that only one increment happens at a time on the given instance. This is done without locking, often resulting in better performance.
Atomic types typically rely on specialized hardware instructions in the instruction set of the target machine. For example, Intel-based instruction sets provide a CAS
(Compare and Swap) instruction that will perform a specific sequence of memory operations atomically.
These low-level instructions are used to implement higher-level operations in the APIs of the respective AtomicXxx
classes. For example, (again, in C-like pseudocode):
private volatile int num;

int increment() {
    while (TRUE) {
        int old = num;
        int next = old + 1;
        if (old == compare_and_swap(&num, old, next)) {
            return next;
        }
    }
}
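In real Java we cannot call a CAS instruction directly, but the same retry loop can be written against AtomicInteger's compareAndSet() method, which is what incrementAndGet() effectively does under the hood (the class below is a hand-rolled illustration, not the JDK implementation):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasIncrement {
    private final AtomicInteger num = new AtomicInteger();

    // Equivalent of incrementAndGet(), mirroring the pseudocode above.
    int increment() {
        while (true) {
            int old = num.get();
            int next = old + 1;
            // compareAndSet succeeds only if num still holds 'old';
            // i.e. no other thread changed it since we read it.
            if (num.compareAndSet(old, next)) {
                return next;
            }
            // Another thread won the race: loop, re-read, and retry.
        }
    }

    public static void main(String[] args) {
        CasIncrement c = new CasIncrement();
        System.out.println(c.increment()); // 1
        System.out.println(c.increment()); // 2
    }
}
```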
If there is no contention on the AtomicXxx
, the if
test will succeed and the loop will end immediately. If there is contention, then the if
will fail for all but one of the threads, and the losers will "spin" through a small number of further cycles of the loop. In practice, this spinning is orders of magnitude faster than suspending the thread and switching to another one. (The exception is at unrealistically high levels of contention, where synchronized performs better than the atomic classes: each failed CAS triggers a retry, which only adds more contention.)
Incidentally, CAS instructions are typically used by the JVM to implement uncontended locking. If the JVM can see that a lock is not currently locked, it will attempt to use a CAS to acquire the lock. If the CAS succeeds, then there is no need to do the expensive thread scheduling, context switching and so on. For more information on the techniques used, see Biased Locking in HotSpot.
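The flavor of CAS-based lock acquisition can be sketched with a simple spinlock built on AtomicBoolean. To be clear, this is a deliberately simplified illustration of the idea, not how HotSpot actually implements monitors, and it requires Java 9+ for Thread.onSpinWait():

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // CAS-based acquisition: in the uncontended case this succeeds on the
        // first attempt, with no thread suspension or context switch.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
    }

    void unlock() {
        locked.set(false);
    }

    public static void main(String[] args) throws InterruptedException {
        SpinLock lock = new SpinLock();
        int[] shared = {0}; // plain int, protected only by the spinlock

        Runnable task = () -> {
            for (int i = 0; i < 50_000; i++) {
                lock.lock();
                try {
                    shared[0]++;
                } finally {
                    lock.unlock();
                }
            }
        };

        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println(shared[0]); // 100000
    }
}
```

Unlike a real monitor, this lock never parks a waiting thread, so it only makes sense when critical sections are very short.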