Consider the following naive method for adding two positive numbers using recursion:
    public static int add(int a, int b) {
        if (a == 0) {
            return b;
        } else {
            return add(a - 1, b + 1);  // TAIL CALL
        }
    }
This is algorithmically correct, but it has a major problem. If you call add with a large a, it will crash with a StackOverflowError on any version of Java up to (at least) Java 9.
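The failure is easy to reproduce. The following is a minimal sketch, not part of the original example: the class name DeepRecursionDemo and the helper overflows are hypothetical, and the error is caught only so that the demonstration can report it. The depth at which the overflow occurs depends on the JVM and its stack size settings, so the large value used here is simply chosen to exceed any realistic default.

```java
class DeepRecursionDemo {
    // Same naive method as above.
    static int add(int a, int b) {
        if (a == 0) {
            return b;
        } else {
            return add(a - 1, b + 1);  // tail call, but still uses a new stack frame
        }
    }

    // Hypothetical helper: reports whether add(a, 0) overflowed the stack.
    // Catching StackOverflowError is done here for demonstration only.
    static boolean overflows(int a) {
        try {
            add(a, 0);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("add(1000, 0) overflows: " + overflows(1000));
        System.out.println("add(100000000, 0) overflows: " + overflows(100_000_000));
    }
}
```

With typical default settings, the first call completes normally and the second one overflows.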
In a typical functional programming language (and many other languages) the compiler optimizes tail recursion. The compiler would notice that the call to add (at the tagged line) is a tail call, and would effectively rewrite the recursion as a loop. This transformation is called tail-call elimination.
However, current-generation Java compilers do not perform tail-call elimination. (This is not a simple oversight. There are substantial technical reasons for this; see below.) Instead, each recursive call of add causes a new frame to be allocated on the thread's stack. For example, if you call add(1000, 1), it will take 1000 recursive calls to arrive at the answer 1001.
The problem is that the size of a Java thread's stack is fixed when the thread is created. (This includes the "main" thread in a single-threaded program.) If too many stack frames are allocated, the stack will overflow. The JVM will detect this and throw a StackOverflowError.
One approach to dealing with this is to simply use a bigger stack. There are JVM options that control the default size of a stack, and you can also specify the stack size as a Thread constructor parameter. Unfortunately, this only "puts off" the stack overflow. If you need to do a computation that requires an even larger stack, then the StackOverflowError comes back.
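As a rough sketch of the "bigger stack" approach, the code below runs the recursive add on a thread created with the four-argument Thread constructor, whose last parameter is the requested stack size in bytes. The class name BigStackDemo, the helper addOnBigStack, and the 256 MB figure are assumptions made for illustration. Note that the Thread documentation describes the stack size as a suggestion that a JVM is free to ignore, so this is not guaranteed to work on every platform. (On HotSpot, the default thread stack size can also be set with the -Xss command-line option.)

```java
class BigStackDemo {
    // Same naive method as above.
    static int add(int a, int b) {
        if (a == 0) {
            return b;
        } else {
            return add(a - 1, b + 1);
        }
    }

    // Hypothetical helper: runs add(a, b) on a thread with the requested stack size.
    static int addOnBigStack(int a, int b, long stackSizeBytes) {
        int[] result = new int[1];
        Thread worker = new Thread(
                null,                          // default thread group
                () -> result[0] = add(a, b),   // the deep recursion runs on the worker
                "big-stack-worker",
                stackSizeBytes);               // a hint; the JVM may round or ignore it
        worker.start();
        try {
            worker.join();                     // wait for the worker and publish its result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        long stackSize = 256L * 1024 * 1024;   // 256 MB, an arbitrary illustrative value
        System.out.println(addOnBigStack(100_000, 1, stackSize));
    }
}
```

This only moves the limit: a sufficiently large a will still overflow whatever stack size was requested.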
The real solution is to identify recursive algorithms where deep recursion is likely, and manually perform the tail-call optimization at the source code level. For example, our add method can be rewritten as follows:
    public static int add(int a, int b) {
        while (a != 0) {
            a = a - 1;
            b = b + 1;
        }
        return b;
    }
(Obviously, there are better ways to add two integers. The above is simply to illustrate the effect of manual tail-call elimination.)
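The same mechanical rewrite applies to other tail-recursive methods: the base case becomes the loop exit, and the arguments of the tail call become assignments to the parameters. The sketch below applies it to Euclid's greatest common divisor algorithm, which is not from the original text and is used here only because its new arguments depend on each other, so a temporary variable is needed to update them "simultaneously". (Euclid's algorithm never recurses deeply in practice; the class and method names are hypothetical.)

```java
class TailCallRewriteDemo {
    // Tail-recursive form: the recursive call is the last thing the method does.
    static int gcdRecursive(int a, int b) {
        if (b == 0) {
            return a;
        } else {
            return gcdRecursive(b, a % b);  // tail call
        }
    }

    // Manual tail-call elimination of the method above.
    static int gcdLoop(int a, int b) {
        while (b != 0) {
            int next = a % b;  // compute the new b before a is overwritten
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcdRecursive(48, 18));  // prints 6
        System.out.println(gcdLoop(48, 18));       // prints 6
    }
}
```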
There are a number of reasons why adding tail-call elimination to Java is not easy. For example, some code could rely on StackOverflowError to place a bound on the size of a computational problem.
As John Rose explains in "Tail calls in the VM":
"The effects of removing the caller’s stack frame are visible to some APIs, notably access control checks and stack tracing. It is as if the caller’s caller had directly called the callee. Any privileges possessed by the caller are discarded after control is transferred to the callee. However, the linkage and accessibility of the callee method are computed before the transfer of control, and take into account the tail-calling caller."
In other words, tail-call elimination could cause an access control method to mistakenly think that a security-sensitive API was being called by trusted code.
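The visibility of stack frames to ordinary code can be illustrated with a small sketch. The class StackVisibilityDemo and the method framesAtBase below are hypothetical; the method makes a chain of tail calls and, at the base case, counts how many of its own frames appear in a stack trace. It assumes a JVM that records every frame in the trace, which the Throwable documentation does not strictly guarantee. If tail calls were eliminated, the earlier frames would no longer be there to count.

```java
class StackVisibilityDemo {
    // Makes n tail calls, then counts the frames of this method on the stack.
    static long framesAtBase(int n) {
        if (n == 0) {
            StackTraceElement[] trace = new Throwable().getStackTrace();
            long count = 0;
            for (StackTraceElement frame : trace) {
                if (frame.getClassName().equals("StackVisibilityDemo")
                        && frame.getMethodName().equals("framesAtBase")) {
                    count++;
                }
            }
            return count;
        } else {
            return framesAtBase(n - 1);  // tail call; its caller's frame stays on the stack
        }
    }

    public static void main(String[] args) {
        // Without tail-call elimination: the initial call plus 5 recursive calls.
        System.out.println(framesAtBase(5));
    }
}
```

On a JVM that keeps every frame, this reports 6; with tail-call elimination it would report 1, which is the kind of observable difference the quotation describes.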