Java Language Garbage collection


Example

The C++ approach - new and delete

In a language like C++, the application program is responsible for managing the memory used by dynamically allocated objects. When an object is created in the C++ heap using the new operator, there needs to be a corresponding use of the delete operator to dispose of the object:

  • If a program fails to delete an object and simply "forgets" about it, the associated memory is lost to the application. This situation is called a memory leak. If an application leaks too much memory, it is liable to use more and more memory, and eventually crash.

  • On the other hand, if an application attempts to delete the same object twice, or to use an object after it has been deleted, then the application is liable to crash due to memory corruption.

In a complicated C++ program, implementing memory management using new and delete can be time-consuming. Indeed, memory management is a common source of bugs.

The Java approach - garbage collection

Java takes a different approach. Instead of an explicit delete operator, Java provides an automatic mechanism known as garbage collection to reclaim the memory used by objects that are no longer needed. The Java runtime system takes responsibility for finding the objects to be disposed of. This task is performed by a component called a garbage collector, or GC for short.
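As a rough illustration of this approach, the sketch below allocates a series of buffers and simply drops each one, with no counterpart to delete anywhere. The class name NoDeleteDemo, the method allocateAndDrop and the buffer sizes are invented for this example. Whether and when the JVM actually reclaims the dropped buffers is up to the garbage collector.

```java
// Hypothetical demo class; names and sizes are illustrative only.
class NoDeleteDemo {

    // Allocates `count` byte arrays of `size` bytes each and returns the
    // total number of bytes allocated. Only the most recent array is
    // referenced by the local variable; earlier ones are no longer
    // referenced by this code, so the GC is free to reclaim them.
    static long allocateAndDrop(int count, int size) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            byte[] buffer = new byte[size]; // no matching "delete" is needed
            buffer[0] = 1;                  // touch the array so it is really used
            total += buffer.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // 256 buffers of 1 MiB each are allocated over the run,
        // but at most one is referenced by this code at any time.
        long total = allocateAndDrop(256, 1024 * 1024);
        System.out.println("Allocated " + total + " bytes in total");
    }
}
```

The equivalent C++ loop would need a delete[] for each buffer to avoid a leak. In Java, the program just stops referring to the object.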

At any time during the execution of a Java program, we can divide the set of all existing objects into two distinct subsets1:

  • Reachable objects are defined by the JLS as follows:

    A reachable object is any object that can be accessed in any potential continuing computation from any live thread.

    In practice, this means that there is a chain of references starting from an in-scope local variable or a static variable by which some code might be able to reach the object.

  • Unreachable objects are objects that cannot possibly be reached as above.

Any objects that are unreachable are eligible for garbage collection. This does not mean that they will be garbage collected. In fact:

  • An unreachable object does not get collected immediately on becoming unreachable2.
  • An unreachable object may not ever be garbage collected.

The Java Language Specification (JLS) gives a JVM implementation a lot of latitude to decide when to collect unreachable objects. It also (in practice) gives permission for a JVM implementation to be conservative in how it detects unreachable objects.

The one thing that the JLS guarantees is that no reachable objects will ever be garbage collected.

What happens when an object becomes unreachable

First of all, nothing specifically happens when an object becomes unreachable. Things only happen when the garbage collector runs and it detects that the object is unreachable. Furthermore, it is common for a GC run to not detect all unreachable objects.

When the GC detects an unreachable object, the following events can occur.

  1. If there are any Reference objects that refer to the object, those references will be cleared before the object is deleted.

  2. If the object is finalizable, then it will be finalized. This happens before the object is deleted.

  3. The object can be deleted, and the memory it occupies can be reclaimed.

Note that there is a clear sequence in which the above events can occur, but nothing requires the garbage collector to perform the final deletion of any specific object in any specific time-frame.
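The first of these events can be observed from application code using java.lang.ref.WeakReference and java.lang.ref.ReferenceQueue. The following is a minimal sketch rather than a definitive test: the class name ReferenceClearingDemo, the observe method and the 200 ms wait are assumptions made for this example, and System.gc() is only a request that the JVM is free to ignore.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class; names and timings are illustrative only.
class ReferenceClearingDemo {

    // A static field keeps the object strongly reachable until it is set to null.
    static Object strongRef;

    // Creates an object with a weak reference to it, optionally drops the
    // strong reference, requests a GC run, and reports what was observed.
    static String observe(boolean dropStrongReference) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        strongRef = new Object();
        WeakReference<Object> weak = new WeakReference<>(strongRef, queue);

        if (dropStrongReference) {
            strongRef = null; // the object is now only weakly reachable
        }

        System.gc(); // a hint only; the JVM may or may not run the collector

        Reference<?> enqueued = null;
        try {
            // Wait briefly for the GC to enqueue the cleared reference, if it does.
            enqueued = queue.remove(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "cleared=" + (weak.get() == null) + ", enqueued=" + (enqueued != null);
    }

    public static void main(String[] args) {
        // While the object is reachable, the reference is never cleared.
        System.out.println("reachable:   " + observe(false));
        // After dropping the strong reference, the outcome depends on the JVM.
        System.out.println("unreachable: " + observe(true));
    }
}
```

With the strong reference kept, the result is always cleared=false, enqueued=false, because a reachable object is never collected. With the strong reference dropped, many JVMs will report the reference as cleared and enqueued, but nothing obliges the collector to do so within any particular time.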

Examples of reachable and unreachable objects

Consider the following example classes:

// A node in a simple "open" linked-list.
public class Node {
    private static int counter = 0;

    public int nodeNumber = ++counter;
    public Node next;
}

public class ListTest {
    public static void main(String[] args) {
        test();                    // M1
        System.out.println("Done"); // M2
    }
    
    private static void test() {
        Node n1 = new Node();      // T1
        Node n2 = new Node();      // T2
        Node n3 = new Node();      // T3
        n1.next = n2;              // T4
        n2 = null;                 // T5
        n3 = null;                 // T6
    }
}

Let us examine what happens when test() is called. Statements T1, T2 and T3 create Node objects, and the objects are all reachable via the n1, n2 and n3 variables respectively. Statement T4 assigns the reference to the 2nd Node object to the next field of the first one. When that is done, the 2nd Node is reachable via two paths:

 n2 -> Node2
 n1 -> Node1, Node1.next -> Node2

In statement T5, we assign null to n2. This breaks the first of the reachability chains for Node2, but the second one remains unbroken, so Node2 is still reachable.

In statement T6, we assign null to n3. This breaks the only reachability chain for Node3, which makes Node3 unreachable. However, Node1 and Node2 are both still reachable via the n1 variable.

Finally, when the test() method returns, its local variables n1, n2 and n3 go out of scope, and therefore cannot be accessed by anything. This breaks the remaining reachability chains for Node1 and Node2, and all of the Node objects are now unreachable and eligible for garbage collection.
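The reachability chains in this walkthrough can be checked by following the next fields from each local variable. The sketch below replays statements T1 to T6 and counts the nodes on the chain starting at each variable. The class ReachabilityWalk and the helper chainLength are hypothetical names added for illustration, the Node class is repeated as a nested class so that the example is self-contained, and the helper assumes the list has no cycles.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the original example.
class ReachabilityWalk {

    // Same shape as the Node class above, nested to keep the sketch self-contained.
    static class Node {
        private static int counter = 0;

        int nodeNumber = ++counter;
        Node next;
    }

    // Counts the nodes on the chain that starts at `start` (0 if start is null).
    // Assumes the chain is acyclic; a cycle would make this loop forever.
    static int chainLength(Node start) {
        int length = 0;
        for (Node n = start; n != null; n = n.next) {
            length++;
        }
        return length;
    }

    // Replays T1..T6 and returns the chain lengths seen from n1, n2 and n3.
    static int[] replayTest() {
        Node n1 = new Node();      // T1
        Node n2 = new Node();      // T2
        Node n3 = new Node();      // T3
        n1.next = n2;              // T4
        n2 = null;                 // T5
        n3 = null;                 // T6
        return new int[] { chainLength(n1), chainLength(n2), chainLength(n3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replayTest())); // prints [2, 0, 0]
    }
}
```

After T6, two nodes (Node1 and Node2) are still on the chain from n1, while n2 and n3 lead nowhere. Node3 is on no chain at all, which matches the point at which it becomes unreachable.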


1 - This is a simplification that ignores finalization and Reference classes.

2 - Hypothetically, a Java implementation could do this, but the performance cost of doing this makes it impractical.