Java Language Splitting Strings


Example

To split a String on a delimiting character or a regular expression, use the String.split() method, which has the following signature:

public String[] split(String regex)

Note that the delimiting character or regular expression is removed from the resulting String array.

Example using delimiting character:

String lineFromCsvFile = "Mickey;Bolton;12345;121216";
String[] dataCells = lineFromCsvFile.split(";");
// Result is dataCells = { "Mickey", "Bolton", "12345", "121216"};

Example using regular expression:

String lineFromInput = "What    do you need    from me?";
String[] words = lineFromInput.split("\\s+"); // one or more whitespace characters
// Result is words = {"What", "do", "you", "need", "from", "me?"};

You can even directly split a String literal:

String[] firstNames = "Mickey, Frank, Alicia, Tom".split(", ");
// Result is firstNames = {"Mickey", "Frank", "Alicia", "Tom"};

Warning: Do not forget that the parameter is always treated as a regular expression.

"aaa.bbb".split("."); // This returns an empty array

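In the previous example, . is treated as the regular expression wildcard that matches any character. Every character therefore acts as a delimiter, every piece between two delimiters is an empty string, and because trailing empty strings are discarded, the result is an empty array.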
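The effect is easier to see when the three calls are run side by side. The following is a minimal sketch, not a definitive recipe; the class name DotSplitDemo and its helper methods are made up for illustration. It shows the unescaped call, the same call with a negative limit (which keeps the empty strings), and the escaped version that splits on a literal dot.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class DotSplitDemo {

    // Unescaped: "." is the regex wildcard, so every character is a delimiter.
    static String[] splitOnWildcard(String input) {
        return input.split(".");
    }

    // Escaped: "\\." matches only a literal dot.
    static String[] splitOnLiteralDot(String input) {
        return input.split("\\.");
    }

    public static void main(String[] args) {
        String input = "aaa.bbb";

        // All pieces are empty and trailing empty strings are dropped.
        System.out.println(splitOnWildcard(input).length);             // prints 0

        // A negative limit keeps the empty strings: 7 delimiters, 8 pieces.
        System.out.println(input.split(".", -1).length);               // prints 8

        System.out.println(Arrays.toString(splitOnLiteralDot(input))); // prints [aaa, bbb]
    }
}
```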


Splitting based on a delimiter which is a regex meta-character

The following characters are considered special (meta-characters) in a regex:

  < > - = ! ( ) [ ] { } \ ^ $ | ? * + . 

To split a string on one of the above characters, either escape it with \\ or use Pattern.quote() (from java.util.regex):

  • Using Pattern.quote():

     String s = "a|b|c";
     String regex = Pattern.quote("|");
     String[] arr = s.split(regex);
    
  • Escaping the special characters:

     String s = "a|b|c";
     String[] arr = s.split("\\|");
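When the delimiter is not known in advance, quoting it is usually the safer of the two options, because it works for any literal text. Here is a small sketch under that assumption; the class LiteralSplitDemo and the helper splitOnLiteral are hypothetical names, not part of the Java API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LiteralSplitDemo {

    // Splits input on the delimiter taken literally, whatever characters it contains.
    static String[] splitOnLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnLiteral("a|b|c", "|")));  // prints [a, b, c]
        System.out.println(Arrays.toString(splitOnLiteral("1*2*3", "*")));  // prints [1, 2, 3]
        System.out.println(Arrays.toString(splitOnLiteral("x$$y", "$$")));  // prints [x, y]
    }
}
```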
    

Split removes trailing empty values

By default, split(delimiter) removes trailing empty strings from the result array. To turn this behavior off, use the overloaded version split(delimiter, limit) with limit set to a negative value, for example:

String[] split = data.split("\\|", -1);

Internally, split(regex) returns the result of split(regex, 0).

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
If n is negative, then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
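
The three cases can be compared on a single input. This is only an illustrative sketch; the class SplitLimitDemo, the helper splitWithLimit and the sample string are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SplitLimitDemo {

    // Splits on a literal pipe using the given limit.
    static String[] splitWithLimit(String data, int limit) {
        return data.split("\\|", limit);
    }

    public static void main(String[] args) {
        String data = "a|b|c||"; // ends with two empty fields

        // Zero: trailing empty strings are discarded.
        System.out.println(Arrays.toString(splitWithLimit(data, 0)));  // prints [a, b, c]

        // Negative: trailing empty strings are kept.
        System.out.println(Arrays.toString(splitWithLimit(data, -1))); // prints [a, b, c, , ]

        // Positive n: at most n entries, the last one holds the rest of the input.
        System.out.println(Arrays.toString(splitWithLimit(data, 2)));  // prints [a, b|c||]
    }
}
```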


Splitting with a StringTokenizer

Besides the split() method, Strings can also be split using a StringTokenizer (in java.util).

StringTokenizer is more restrictive than String.split() and a bit harder to use. It is designed for pulling out tokens delimited by a fixed set of characters, given as a String; each character in that set acts as a separator. Because of this restriction it can be faster than String.split(), reportedly about twice as fast, although the actual difference depends on the input and the Java version.

The default delimiter set is " \t\n\r\f": space, tab, newline, carriage return and form feed. The following example prints each word on its own line.

String str = "the lazy fox jumped over the brown fence";
StringTokenizer tokenizer = new StringTokenizer(str);
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

This will print out:

the
lazy
fox
jumped
over
the
brown
fence

You can also supply a different set of delimiter characters.

String str = "jumped over";
// In this case the characters `u` and `e` are used as delimiters
StringTokenizer tokenizer = new StringTokenizer(str, "ue");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

This will print out:

j
mp
d ov
r
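
One practical difference from split() is worth checking before switching between the two: when constructed as above, StringTokenizer never returns empty tokens, so consecutive delimiters collapse into a single separator, whereas split() yields an empty string for the field between them. The sketch below illustrates this; the class TokenizerDemo and the helper tokenize are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are illustrative only.
class TokenizerDemo {

    // Collects all tokens into a list; each character of delimiters is a separator.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "a,,b;c";

        // The empty field between the two commas is skipped.
        System.out.println(tokenize(input, ",;"));                // prints [a, b, c]

        // split() keeps the empty field.
        System.out.println(Arrays.asList(input.split("[,;]")));   // prints [a, , b, c]
    }
}
```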