Java Language The String Concatenation Operator (+)


Example

The + symbol can mean three distinct operators in Java:

  • If there is no operand before the +, then it is the unary Plus operator.
  • If there are two operands, and they are both numeric, then it is the binary Addition operator.
  • If there are two operands, and at least one of them is a String, then it is the binary Concatenation operator.
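The three cases above can be sketched as follows. Because + is left-associative, the operand types are examined one pair at a time, so the position of the first String in a chain matters. The class and method names here (PlusMeanings, numbersFirst, and so on) are made up for illustration.

```java
class PlusMeanings {
    static String numbersFirst() {
        return 1 + 2 + "3";       // (1 + 2) + "3": addition, then concatenation
    }

    static String stringFirst() {
        return "1" + 2 + 3;       // ("1" + 2) + 3: concatenation both times
    }

    static int charPlusInt() {
        return 'a' + 1;           // char is a numeric type, so this is addition
    }

    public static void main(String[] args) {
        int x = 5;
        System.out.println(+x);              // unary plus: prints 5
        System.out.println(numbersFirst());  // prints 33
        System.out.println(stringFirst());   // prints 123
        System.out.println(charPlusInt());   // prints 98
    }
}
```

Note that a char operand does not trigger concatenation on its own; at least one operand must be a String.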

In the simple case, the Concatenation operator joins two strings to give a third string. For example:

String s1 = "a String";
String s2 = "This is " + s1;    // s2 contains "This is a String"

When one of the two operands is not a string, it is converted to a String as follows:

  • An operand whose type is a primitive type is converted as if by calling toString() on the boxed value.

  • An operand whose type is a reference type is converted by calling the operand's toString() method. If the operand is null, or if the toString() method returns null, then the string literal "null" is used instead.

For example:

int one = 1;
String s3 = "One is "  + one;         // s3 contains "One is 1"
String s4 = null + " is null";        // s4 contains "null is null"
String s5 = "{1} is " + new int[]{1}; // s5 contains something like
                                      // "{1} is [I@xxxxxxxx"

The explanation for the s5 example is that array types inherit toString() from java.lang.Object, whose implementation produces a string consisting of the type name ("[I" for int[]), an '@' character, and the object's identity hash code in hexadecimal.
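If the goal is to show an array's contents, one common approach is to convert it explicitly with java.util.Arrays.toString before concatenating. The following is a minimal sketch; the class name ArrayConcat and the method describe are hypothetical.

```java
import java.util.Arrays;

class ArrayConcat {
    // Converts the array explicitly so that its elements appear in the result.
    static String describe(int[] values) {
        return "values = " + Arrays.toString(values);
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        Object nothing = null;

        System.out.println("values = " + values);    // something like "values = [I@xxxxxxxx"
        System.out.println(describe(values));         // prints "values = [1, 2, 3]"
        System.out.println("nothing = " + nothing);   // prints "nothing = null"
    }
}
```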

The Concatenation operator is specified to create a new String object, except in the case where the expression is a Constant Expression. In the latter case, the expression is evaluated at compile time, and its runtime value is equivalent to a string literal. This means that there is no runtime overhead in splitting a long string literal like this:

String typing = "The quick brown fox " +
                "jumped over the " +
                "lazy dog";           // constant expression
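One way to observe the difference is to compare references with ==, since constant expressions of type String are interned in the same way as literals. This is only a sketch for illustration (real code should compare strings with equals()); the class and method names are made up.

```java
class ConstantConcat {
    // A constant variable: final and initialized with a constant expression.
    static final String PREFIX = "lazy ";

    static boolean constantIsSameObject() {
        String literal = "lazy dog";
        String constant = PREFIX + "dog";   // constant expression, evaluated at compile time
        return literal == constant;         // same interned String object
    }

    static boolean runtimeIsNewObject() {
        String literal = "lazy dog";
        String prefix = "lazy ";            // not declared final, so not a constant variable
        String runtime = prefix + "dog";    // evaluated at run time, creates a new String
        return literal != runtime && literal.equals(runtime);
    }

    public static void main(String[] args) {
        System.out.println(constantIsSameObject());  // prints true
        System.out.println(runtimeIsNewObject());    // prints true
    }
}
```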

Optimization and efficiency

As noted above, with the exception of constant expressions, each string concatenation expression creates a new String object. Consider this code:

public String stars(int count) {
    String res = "";
    for (int i = 0; i < count; i++) {
        res = res + "*";
    }
    return res;
}

In the method above, each iteration of the loop will create a new String that is one character longer than the one from the previous iteration. Each concatenation copies all of the characters in the operand strings to form the new String. Thus, stars(N) will:

  • create N new String objects, and throw away all but the last one,
  • copy N * (N + 1) / 2 characters, and
  • generate O(N^2) bytes of garbage.
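The character count above can be checked with a small sketch that replays the loop and adds up the length of each newly created string, on the assumption that every concatenation copies all of the characters of its result. The names CopyCount, charsCopied and formula are hypothetical.

```java
class CopyCount {
    // Replays the loop from stars(n) and sums the length of every String
    // it creates, which is the number of characters copied in total.
    static long charsCopied(int n) {
        long copied = 0;
        String res = "";
        for (int i = 0; i < n; i++) {
            res = res + "*";
            copied += res.length();
        }
        return copied;
    }

    // Closed form: 1 + 2 + ... + n
    static long formula(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + ": " + charsCopied(n) + " == " + formula(n));
        }
    }
}
```

For N = 1000 this already amounts to 500500 characters copied in order to build a 1000-character string.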

This is very expensive for large N. Indeed, any code that concatenates strings in a loop is liable to have this problem. A better way to write this would be as follows:

public String stars(int count) {
    // Create a string builder with capacity 'count' 
    StringBuilder sb = new StringBuilder(count);
    for (int i = 0; i < count; i++) {
        sb.append("*");
    }
    return sb.toString();
}

Ideally, you should set the capacity of the StringBuilder, but if this is not practical, the class will automatically grow the backing array that the builder uses to hold characters. (Note: the implementation expands the backing array exponentially. This strategy keeps the amount of character copying to O(N) rather than O(N^2).)
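The effect of exponential growth can be observed through StringBuilder.capacity(). The sketch below counts how often the capacity changes while appending to a builder created with the default capacity. The exact growth steps are an implementation detail and may differ between Java versions, so the code only counts them. The class name BuilderCapacity and the method growthSteps are made up for illustration.

```java
class BuilderCapacity {
    // Counts how many times the backing array is replaced by a larger one
    // while appending 'count' characters to a default-sized builder.
    static int growthSteps(int count) {
        StringBuilder sb = new StringBuilder();
        int capacity = sb.capacity();
        int steps = 0;
        for (int i = 0; i < count; i++) {
            sb.append('*');
            if (sb.capacity() != capacity) {
                capacity = sb.capacity();
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // With exponential growth, the number of steps grows roughly
        // logarithmically with the number of characters appended.
        System.out.println(growthSteps(1_000));
        System.out.println(growthSteps(1_000_000));
    }
}
```

Because the backing array is reallocated only a handful of times, appending a million characters does not cause a million array copies.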

Some people apply this pattern to all string concatenations. However, this is unnecessary because the JLS allows a Java compiler to optimize string concatenations within a single expression. For example:

String s1 = ...;
String s2 = ...;    
String test = "Hello " + s1 + ". Welcome to " + s2 + "\n";

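will typically be compiled by the bytecode compiler to something like this (javac releases before Java 9 emit StringBuilder calls along these lines; from Java 9 onward javac emits an invokedynamic call that has the same effect):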

StringBuilder tmp = new StringBuilder();
tmp.append("Hello ");
tmp.append(s1 == null ? "null" : s1);
tmp.append(". Welcome to ");
tmp.append(s2 == null ? "null" : s2);
tmp.append("\n");
String test = tmp.toString();

(The JIT compiler may optimize that further if it can deduce that s1 or s2 cannot be null.) But note that this optimization is only permitted within a single expression.

In short, if you are concerned about the efficiency of string concatenations:

  • Do hand-optimize if you are doing repeated concatenation in a loop (or similar).
  • Don't hand-optimize a single concatenation expression.
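The two rules can be illustrated side by side. In this sketch, which uses made-up names (ConcatAdvice, joinWithCommas, greeting), the loop uses an explicit StringBuilder while the single expression is left as plain concatenation.

```java
import java.util.List;

class ConcatAdvice {
    // Repeated concatenation in a loop: use one StringBuilder for the whole loop.
    static String joinWithCommas(List<String> items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(items.get(i));
        }
        return sb.toString();
    }

    // A single concatenation expression: leave it to the compiler.
    static String greeting(String name, int count) {
        return "Hello " + name + ", you have " + count + " items";
    }

    public static void main(String[] args) {
        System.out.println(joinWithCommas(List.of("a", "b", "c")));  // prints "a, b, c"
        System.out.println(greeting("Ann", 3));  // prints "Hello Ann, you have 3 items"
    }
}
```

For this particular joining task, the standard library method String.join(", ", items) gives the same result without an explicit loop.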