In a language like C++, the application program is responsible for managing the memory used by dynamically allocated objects. When an object is created in the C++ heap using the `new` operator, there needs to be a corresponding use of the `delete` operator to dispose of the object.
If a program forgets to `delete` an object and simply loses track of it, the associated memory is lost to the application. This situation is called a memory leak, and if an application leaks too much memory it is liable to use more and more memory and eventually crash.
On the other hand, if an application attempts to `delete` the same object twice, or uses an object after it has been deleted, it is liable to crash because of memory corruption.
In a complicated C++ program, implementing memory management with `new` and `delete` can be time-consuming. Indeed, memory management is a common source of bugs.
Java takes a different approach. Instead of an explicit `delete` operator, Java provides an automatic mechanism known as garbage collection to reclaim the memory used by objects that are no longer needed. The Java runtime system takes responsibility for finding the objects to be disposed of. This task is performed by a component called a garbage collector, or GC for short.
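As a minimal sketch of the contrast with `new`/`delete`, the hypothetical class below allocates many arrays and never frees any of them explicitly. Each array becomes unreachable at the end of its loop iteration, and the garbage collector is free to reclaim it whenever it chooses. The class name, method name, and sizes are illustrative assumptions.

```java
public class NoDeleteDemo {
    // Allocates 'count' arrays of 'bytesEach' bytes (bytesEach must be > 0)
    // and returns the total number of bytes allocated. Nothing is ever freed
    // explicitly: Java has no delete operator.
    static long allocateAndDiscard(int count, int bytesEach) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            byte[] buffer = new byte[bytesEach]; // allocated on the heap
            buffer[0] = (byte) i;                // use it briefly
            total += buffer.length;
            // When this iteration ends, 'buffer' goes out of scope and the
            // array is unreachable, so the GC may reclaim it.
        }
        return total;
    }

    public static void main(String[] args) {
        long total = allocateAndDiscard(1_000, 1_000_000);
        System.out.println("Allocated " + total + " bytes without deleting anything");
    }
}
```

When collections actually run is up to the JVM, but an `OutOfMemoryError` is thrown only when the automatic storage management system cannot make enough memory available, so the discarded arrays do not pile up the way leaked C++ objects would.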
At any time during the execution of a Java program, we can divide the set of all existing objects into two distinct subsets¹:

- Reachable objects. The JLS defines these as follows: "A reachable object is any object that can be accessed in any potential continuing computation from any live thread." In practice, this means that there is a chain of references, starting from an in-scope local variable or a `static` variable, by which some code might be able to reach the object.
- Unreachable objects: objects that cannot possibly be reached as above.
Any object that is unreachable is eligible for garbage collection. This does not mean that it will be garbage collected. In fact:

- The Java Language Specification gives a lot of latitude to a JVM implementation to decide when to collect unreachable objects. It also (in practice) gives permission for a JVM implementation to be conservative in how it detects unreachable objects.
- The one thing that the JLS guarantees is that no reachable object will ever be garbage collected.
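The guarantee can be observed with weak references, which let code watch an object without keeping it alive. This is a minimal sketch assuming Java 9 or later (for `Reference.reachabilityFence`); the class name, field, and method are hypothetical. One object is reachable through a `static` variable and the other through a live local variable, so neither weak reference may be cleared, whether or not `System.gc()` actually triggers a collection.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public class ReachableNeverCollected {
    // A static variable is a root: whatever it refers to stays reachable.
    static Object pinned;

    static boolean survivesGc() {
        pinned = new Object();
        Object local = new Object(); // reachable through a live local variable

        // Weak references observe the objects without keeping them alive.
        WeakReference<Object> viaStatic = new WeakReference<>(pinned);
        WeakReference<Object> viaLocal = new WeakReference<>(local);

        System.gc(); // only a hint; the JVM may or may not run a collection

        boolean bothPresent = viaStatic.get() != null && viaLocal.get() != null;

        // Ensures 'local' counts as reachable up to this point; otherwise an
        // optimizing compiler could treat it as dead after its last use.
        Reference.reachabilityFence(local);
        return bothPresent;
    }

    public static void main(String[] args) {
        System.out.println("reachable objects survived: " + survivesGc()); // true
    }
}
```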
Nothing in particular happens at the moment an object becomes unreachable. Things happen only when the garbage collector runs and detects that the object is unreachable. Furthermore, it is common for a GC run not to detect all unreachable objects².
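The delay between becoming unreachable and being noticed can be probed with a weak reference. This is a minimal sketch with a hypothetical class and method: it repeatedly requests a collection and reports how many requests passed before the reference was cleared. The number varies by JVM, collector, and heap settings, and `-1` (never cleared within the limit) is a legitimate outcome.

```java
public class CollectionIsDeferred {
    // Returns how many System.gc() requests passed before a weak reference to
    // an otherwise unreferenced object was cleared, or -1 if it was still set
    // after maxRequests requests.
    static int gcRequestsUntilCleared(int maxRequests) throws InterruptedException {
        // The new Object is reachable only through this WeakReference, so the
        // GC is free to clear the reference and reclaim the object.
        java.lang.ref.WeakReference<Object> weak =
                new java.lang.ref.WeakReference<>(new Object());
        for (int request = 1; request <= maxRequests; request++) {
            System.gc();            // a request, not a command
            if (weak.get() == null) {
                return request;     // some GC run has noticed the object
            }
            Thread.sleep(10);       // give a concurrent collector a moment
        }
        return -1;                  // permitted: the object was not collected
    }

    public static void main(String[] args) throws InterruptedException {
        int n = gcRequestsUntilCleared(5);
        System.out.println(n == -1 ? "not collected (yet)" : "cleared after " + n + " GC request(s)");
    }
}
```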
When the GC detects an unreachable object, the following events can occur:

1. If there are any `Reference` objects that refer to the object, those references will be cleared before the object is deleted.
2. If the object is finalizable, it will be finalized. This happens before the object is deleted.
3. The object can be deleted, and the memory it occupies can be reclaimed.
Note that there is a clear sequence in which these events can occur, but nothing requires the garbage collector to delete any specific object within any specific time frame.
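The first step of that sequence can be watched with a `ReferenceQueue`. This is a minimal sketch with a hypothetical class and method; it does not exercise finalization (`Object.finalize` has been deprecated since Java 9). If the weak reference shows up on the queue, it has already been cleared, so `get()` returns `null`. If no GC run gets round to the object before the timeout, `remove` returns `null`, which the JLS also permits.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class ClearedBeforeEnqueued {
    // Returns true if the weak reference was cleared and enqueued within the
    // timeout, false if no GC run processed it in time.
    static boolean observeClearing(long timeoutMillis) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // The referent is not stored anywhere else, so it is eligible for
        // collection at once. 'weak' itself must stay reachable to be enqueued.
        WeakReference<Object> weak = new WeakReference<>(new Object(), queue);

        System.gc(); // a hint only
        Reference<?> enqueued = queue.remove(timeoutMillis); // null on timeout
        if (enqueued == null) {
            return false;
        }
        // By the time a weak reference is on its queue it has been cleared.
        if (enqueued != weak || enqueued.get() != null) {
            throw new IllegalStateException("reference was not cleared before enqueueing");
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeClearing(1_000) ? "cleared and enqueued" : "not collected within 1 s");
    }
}
```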
Consider the following example classes:
```java
// A node in a simple "open" linked list.
// Package-private so that both classes can live in ListTest.java.
class Node {
    private static int counter = 0;
    public int nodeNumber = ++counter;
    public Node next;
}

public class ListTest {
    public static void main(String[] args) {
        test();                      // M1
        System.out.println("Done");  // M2
    }

    private static void test() {
        Node n1 = new Node();  // T1
        Node n2 = new Node();  // T2
        Node n3 = new Node();  // T3
        n1.next = n2;          // T4
        n2 = null;             // T5
        n3 = null;             // T6
    }
}
```
Let us examine what happens when `test()` is called. Statements T1, T2 and T3 create `Node` objects, which are reachable via the `n1`, `n2` and `n3` variables, respectively. Statement T4 assigns the reference to the second `Node` object to the `next` field of the first one. When that is done, the second `Node` is reachable via two paths:

- `n2 -> Node2`
- `n1 -> Node1, Node1.next -> Node2`
In statement T5, we assign `null` to `n2`. This breaks the first of the reachability chains for Node2, but the second one remains unbroken, so Node2 is still reachable.
In statement T6, we assign `null` to `n3`. This breaks the only reachability chain for Node3, which makes Node3 unreachable. However, Node1 and Node2 are both still reachable via the `n1` variable.
Finally, when the `test()` method returns, its local variables `n1`, `n2` and `n3` go out of scope and therefore cannot be accessed by anything. This breaks the remaining reachability chains for Node1 and Node2, and all of the `Node` objects are now unreachable and eligible for garbage collection.
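The walkthrough can be replayed with a toy "mark" step that follows each root variable's chain of `next` references. This is a minimal sketch, not how a real collector works: the class `ReachabilityTrace` and method `reachableFrom` are hypothetical, the roots are passed in explicitly instead of being discovered from threads and static fields, and the nested `Node` is a copy of the example's class so the file is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class ReachabilityTrace {
    // Same shape as the Node class in the example.
    static class Node {
        private static int counter = 0;
        final int nodeNumber = ++counter;
        Node next;
    }

    // Toy mark step: follow each root's chain of next references and record
    // the node numbers found. Null roots (cleared variables) contribute nothing.
    static List<Integer> reachableFrom(Node... roots) {
        List<Integer> marked = new ArrayList<>();
        for (Node root : roots) {
            for (Node n = root; n != null; n = n.next) {
                if (marked.contains(n.nodeNumber)) {
                    break; // already marked via another path, as is the rest of the chain
                }
                marked.add(n.nodeNumber);
            }
        }
        marked.sort(null); // natural order, for readable output
        return marked;
    }

    public static void main(String[] args) {
        Node n1 = new Node(), n2 = new Node(), n3 = new Node();          // T1-T3
        n1.next = n2;                                                    // T4
        System.out.println("after T4: " + reachableFrom(n1, n2, n3));   // [1, 2, 3]
        n2 = null;                                                       // T5
        System.out.println("after T5: " + reachableFrom(n1, n2, n3));   // [1, 2, 3]
        n3 = null;                                                       // T6
        System.out.println("after T6: " + reachableFrom(n1, n2, n3));   // [1, 2]
        System.out.println("after return: " + reachableFrom());         // []
    }
}
```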
1 - This is a simplification that ignores finalization and `Reference` classes.
2 - Hypothetically, a Java implementation could do this, but the performance cost of doing this makes it impractical.