Strings (java.lang.String) are pieces of text stored in your program. Strings are not a primitive data type in Java; however, they are very common in Java programs.
In Java, Strings are immutable, meaning that they cannot be changed.
Since Java strings are immutable, all methods which manipulate a String will return a new String object. They do not change the original String. This includes the substring and replacement methods that C and C++ programmers would expect to mutate the target String object.
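A minimal sketch of this behavior, using only standard String methods (the class name ImmutableStrings is illustrative): each call hands back a new String, and the original keeps its value even when a method's result is discarded.

```java
public class ImmutableStrings {
    public static void main(String[] args) {
        String original = "Hello, World";

        // Each call returns a new String; the receiver is left untouched.
        String sub = original.substring(0, 5);
        String replaced = original.replace("World", "Java");
        String upper = original.toUpperCase();

        System.out.println(sub);      // Hello
        System.out.println(replaced); // Hello, Java
        System.out.println(upper);    // HELLO, WORLD
        System.out.println(original); // Hello, World (unchanged)

        // Ignoring the return value means the replacement is simply lost;
        // it does not modify 'original' in place.
        original.replace("Hello", "Goodbye");
        System.out.println(original); // still Hello, World
    }
}
```

A common mistake is to write original.replace(...) on its own line and expect original to change; the result must be assigned to a variable.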
Use a StringBuilder instead of String if you want to concatenate more than two String objects whose values cannot be determined at compile time. This technique is more efficient than creating new String objects and concatenating them, because StringBuilder is mutable.
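As a sketch of the technique, the hypothetical helper below joins a runtime list of values whose length is not known at compile time. One StringBuilder is appended to in the loop, and a single String is produced at the end.

```java
import java.util.Arrays;
import java.util.List;

public class BuilderConcat {
    // Hypothetical helper for illustration: joins the parts with ", ".
    // A single mutable StringBuilder is reused across iterations instead of
    // creating a new intermediate String on every pass.
    public static String joinWithCommas(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts.get(i));
        }
        return sb.toString(); // the only String built from the accumulated text
    }

    public static void main(String[] args) {
        System.out.println(joinWithCommas(Arrays.asList("red", "green", "blue")));
        // prints: red, green, blue
    }
}
```

For this particular task, Java 8 and later also offer String.join(", ", parts); the explicit loop is shown because it generalizes to cases where each appended piece needs custom logic.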
StringBuffer can also be used to concatenate String objects. However, this class is less performant because it is designed to be thread-safe, and acquires a mutex before each operation. Since you almost never need thread-safety when concatenating strings, it is best to use StringBuilder.
If you can express a string concatenation as a single expression, then it is better to use the + operator. The Java compiler will convert an expression containing + concatenations into an efficient sequence of operations, for example using StringBuilder or, since Java 9, an invokedynamic call to StringConcatFactory. The advice to use StringBuilder explicitly only applies when the concatenation spans multiple expressions or statements, such as appending inside a loop.
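A small illustration of the single-expression case (the describe method is hypothetical): because the whole result is one + expression, the compiler generates the concatenation code, and writing a StringBuilder by hand would add nothing.

```java
public class PlusConcat {
    // One expression: the compiler chooses how to concatenate the pieces,
    // so plain + is both clearer and efficient here.
    public static String describe(String name, int age) {
        return name + " is " + age + " years old";
    }

    public static void main(String[] args) {
        System.out.println(describe("Alice", 30)); // prints: Alice is 30 years old
    }
}
```

By contrast, the earlier StringBuilder advice targets code where the result is built up over several statements, so the compiler cannot treat it as one expression.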
Don't store sensitive information in strings. If someone is able to obtain a memory dump of your running application, then they will be able to find all of the existing String objects and read their contents. This includes String objects that are unreachable and are awaiting garbage collection. If this is a concern, you will need to wipe sensitive string data as soon as you are done with it. You cannot do this with String objects, since they are immutable. Therefore, it is advisable to use a char[] array to hold sensitive character data, and to wipe it (e.g. overwrite it with '\000' characters) when you are done.
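A minimal sketch of the char[] approach, assuming the secret is already in a char array (in a console program, java.io.Console.readPassword() returns one). The matches and wipe helpers are hypothetical names; wiping in a finally block ensures the characters are overwritten even if the check throws.

```java
import java.util.Arrays;

public class SensitiveChars {
    // Hypothetical check for illustration: compares two char arrays without
    // ever converting them to String objects. (Arrays.equals is not
    // constant-time, so this is not suitable for real credential checks.)
    public static boolean matches(char[] supplied, char[] expected) {
        return Arrays.equals(supplied, expected);
    }

    // Overwrites every element with '\000' so the secret does not linger.
    public static void wipe(char[] secret) {
        Arrays.fill(secret, '\000');
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        char[] expected = {'s', '3', 'c', 'r', 'e', 't'};
        try {
            System.out.println(matches(password, expected)); // prints: true
        } finally {
            // Always wipe, whether or not the check succeeded.
            wipe(password);
            wipe(expected);
        }
        System.out.println(password[0] == '\000'); // prints: true
    }
}
```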
All String instances are created on the heap, even instances that correspond to string literals. The special thing about string literals is that the JVM ensures that all literals that are equal (i.e. that consist of the same characters) are represented by a single String object (this behavior is specified in the JLS).
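The sharing of literal instances can be observed with ==, which compares references rather than contents. This sketch contrasts equal literals, a compile-time constant expression (which the JLS treats like a literal), an explicitly constructed String, and the result of String.intern().

```java
public class LiteralIdentity {
    public static void main(String[] args) {
        String a = "hello";
        String b = "hello";
        String d = "hel" + "lo";        // constant expression, folded at compile time
        String c = new String("hello"); // explicitly creates a distinct object

        System.out.println(a == b);          // true: equal literals share one object
        System.out.println(a == d);          // true: constant expressions act as literals
        System.out.println(a == c);          // false: new String(...) is a separate object
        System.out.println(a.equals(c));     // true: same characters
        System.out.println(a == c.intern()); // true: intern() returns the pooled instance
    }
}
```

This is also why string contents should be compared with equals() rather than ==: == only happens to work when both references point at the same pooled instance.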
This is implemented by JVM class loaders. When a class loader loads a class, it scans for string literals that are used in the class definition. Each time it sees one, it checks whether there is already an entry for that literal in the string pool (using the literal as the key). If there is, the reference to the String instance stored for that literal is used. Otherwise, a new String instance is created, and a reference to it is stored in the string pool under that literal. (Also see string interning.)
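The lookup described above can be modeled in a few lines. This is a toy sketch only: ToyStringPool and resolve are hypothetical names, and the JVM's real pool is an internal structure, not a java.util.HashMap. The real pool is reachable from Java code through String.intern().

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the literal-pool lookup. Hypothetical: not how the JVM
// actually stores its string pool.
public class ToyStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the single shared instance for the given literal text.
    public String resolve(String literalText) {
        String existing = pool.get(literalText);
        if (existing != null) {
            return existing; // an entry exists: reuse its instance
        }
        String created = new String(literalText); // first sighting: new instance
        pool.put(created, created);               // record it, keyed by its text
        return created;
    }

    public static void main(String[] args) {
        ToyStringPool p = new ToyStringPool();
        // Two distinct String objects with the same characters, standing in
        // for the same literal appearing in two different classes.
        String first = p.resolve(new String("greeting"));
        String second = p.resolve(new String("greeting"));
        System.out.println(first == second); // prints: true
    }
}
```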
The string pool is held in the Java heap, and is subject to normal garbage collection.
In releases of Java before Java 7, the string pool was held in a special part of the heap known as "PermGen". This part was only collected occasionally.
In Java 7, the string pool was moved out of "PermGen" and into the main heap.
Note that string literals are implicitly reachable from any method that uses them. This means that the corresponding String objects can only be garbage collected if the code itself is garbage collected.
Up until Java 8, String objects were implemented as a UTF-16 char array (2 bytes per char). Java 9 adopted the proposal to implement String as a byte array with an encoding flag field that records whether the string is encoded as bytes (LATIN-1) or chars (UTF-16) (JEP 254, "Compact Strings").