Guava Strings: Splitting a string into a list


Example

To split strings, Guava introduces the Splitter class.

Why not use Java's splitting capabilities?

As a rule, Guava does not duplicate functionality that is readily available in Java. Why then do we need an additional Splitter class? Do the split methods in Java's String class not provide us with all the string splitting mechanics we'll ever need?

The easiest way to answer that question is with a couple of examples. First off, we'll deal with the following gunslinging duo:

String gunslingers = "Wyatt Earp+Doc Holliday";

To try and split up the legendary lawman and his dentist friend, we might try the following:

String[] result = gunslingers.split("+"); // wrong

At runtime, however, we are confronted with the following exception:

Exception in thread "main" java.util.regex.PatternSyntaxException:
Dangling meta character '+' near index 0

After an involuntary facepalm, we're quick to remember that String's split method takes a regular expression as an argument, and that the + character is used as a quantifier in regular expressions. The solution is then to escape the + character, or enclose it in a character class.

String[] escaped = gunslingers.split("\\+");   // escape the +
String[] bracketed = gunslingers.split("[+]"); // or enclose it in a character class
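
A third option, not used in the text above, is to let the standard library do the escaping with Pattern.quote. The following is a minimal sketch; the class and method names (QuoteSplitSketch, splitLiteral) are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteSplitSketch {
    // Splits on a literal separator. Pattern.quote wraps the separator so
    // that regex metacharacters such as + are treated as plain text.
    static String[] splitLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String[] result = splitLiteral("Wyatt Earp+Doc Holliday", "+");
        System.out.println(Arrays.toString(result)); // [Wyatt Earp, Doc Holliday]
    }
}
```

This avoids having to remember which characters are special, at the cost of a slightly longer call.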

Having successfully resolved that issue, we move on to the three musketeers.

String musketeers = ",Porthos , Athos ,Aramis,";

The comma has no special meaning in regular expressions, so let's count the musketeers by applying the String.split() method and getting the length of the resulting array.

System.out.println(musketeers.split(",").length);

Which yields the following result in the console:

4

Four? Given the fact that the string contains a leading and a trailing comma, a result of five would have been within the realm of normal expectations, but four? As it turns out, the behavior of Java's split method is to preserve leading, but to discard trailing empty strings, so the actual contents of the array are ["", "Porthos ", " Athos ", "Aramis"].
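
To see the discarded trailing empty string, we can use the two-argument overload of String.split with a negative limit, which keeps trailing empty strings. This is only a sketch to illustrate the behavior described above; the class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitLimitSketch {
    // A negative limit tells String.split to keep trailing empty strings.
    static String[] splitKeepingTrailing(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        String musketeers = ",Porthos , Athos ,Aramis,";

        // Default behavior: the trailing empty string is dropped.
        System.out.println(musketeers.split(",").length); // 4

        // With a negative limit, the trailing empty string is kept.
        String[] all = splitKeepingTrailing(musketeers, ",");
        System.out.println(all.length);           // 5
        System.out.println(Arrays.toString(all)); // [, Porthos ,  Athos , Aramis, ]
    }
}
```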

Since we don't need any empty strings, leading or trailing, let's filter them out with a loop:

for (String musketeer : musketeers.split(",")) {
    if (!musketeer.isEmpty()) {
        System.out.println(musketeer);
    }
}

This gives us the following output:

Porthos 
 Athos
Aramis

As the output shows, the extra spaces before and after the comma separators have been preserved. To get around that, we can trim off the unneeded spaces, which finally yields the desired output:

for (String musketeer : musketeers.split(",")) {
    if (!musketeer.isEmpty()) {
        System.out.println(musketeer.trim());
    }
}

(Alternatively, we could also adapt the regular expression to include whitespace surrounding the comma separators. However, keep in mind that leading spaces before the first entry or trailing spaces after the last entry would still be preserved.)
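
The regular-expression alternative could look roughly like the sketch below. The pattern \s*,\s* and the names used here are assumptions chosen for illustration. The second call in main shows the caveat: spaces before the first entry and after the last entry are not next to a comma, so they survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhitespaceSeparatorSketch {
    // Treats a comma plus any surrounding whitespace as the separator,
    // then drops the empty strings that remain.
    static List<String> splitAndDropEmpty(String input) {
        List<String> parts = new ArrayList<>();
        for (String part : input.split("\\s*,\\s*")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitAndDropEmpty(",Porthos , Athos ,Aramis,"));
        // [Porthos, Athos, Aramis]

        // Outer whitespace is not adjacent to a comma, so it is preserved.
        System.out.println(Arrays.toString(" Porthos , Athos ".split("\\s*,\\s*")));
        // [ Porthos, Athos ]
    }
}
```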

After reading through the examples above, we can't help but conclude that splitting strings with Java is mildly annoying at best.

Splitting strings with Guava

The best way to demonstrate how Guava turns splitting strings into a relatively pain-free experience is to treat the same two strings again, but this time using Guava's Splitter class.

List<String> gunslingers = Splitter.on('+')
        .splitToList("Wyatt Earp+Doc Holliday");
List<String> musketeers = Splitter.on(",")
        .omitEmptyStrings()
        .trimResults()
        .splitToList(",Porthos , Athos ,Aramis,");

As you can see in the code above, Splitter exposes a fluent API, and lets you create instances through a series of static factory methods:

  • static Splitter on(char separator)
    Lets you specify the separator as a character.

  • static Splitter on(String separator)
    Lets you specify the separator as a string.

  • static Splitter on(CharMatcher separatorMatcher)
    Lets you specify the separator as a Guava CharMatcher.

  • static Splitter on(Pattern separatorPattern)
    Lets you specify the separator as a Java regular expression Pattern.

  • static Splitter onPattern(String separatorPattern)
    Lets you specify the separator as a regular expression string.

In addition to these separator-based factory methods, there's also a static Splitter fixedLength(int length) method to create Splitter instances that split strings into chunks of the specified length.
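
As a rough illustration of the factory methods not used so far, the sketch below tries the CharMatcher, pattern-string, and fixed-length variants. It assumes Guava is on the classpath; the class name, method names, and sample inputs are invented for this example.

```java
import com.google.common.base.CharMatcher;
import com.google.common.base.Splitter;
import java.util.List;

class SplitterFactorySketch {
    // Splits on any single character matched by the CharMatcher (here ; or ,).
    static List<String> onSemicolonOrComma(String input) {
        return Splitter.on(CharMatcher.anyOf(";,")).splitToList(input);
    }

    // Splits on runs of whitespace, given as a regular expression string.
    static List<String> onWhitespaceRuns(String input) {
        return Splitter.onPattern("\\s+").splitToList(input);
    }

    // Cuts the input into chunks of the given length; the last chunk may be shorter.
    static List<String> chunks(String input, int length) {
        return Splitter.fixedLength(length).splitToList(input);
    }

    public static void main(String[] args) {
        System.out.println(onSemicolonOrComma("Athos;Porthos,Aramis")); // [Athos, Porthos, Aramis]
        System.out.println(onWhitespaceRuns("Wyatt   Earp"));           // [Wyatt, Earp]
        System.out.println(chunks("abcdefg", 3));                       // [abc, def, g]
    }
}
```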

After the Splitter instance is created, a number of modifiers can be applied:

  • Splitter omitEmptyStrings()
    Instructs the Splitter to exclude empty strings from the results.

  • Splitter trimResults()
    Instructs the Splitter to trim results using the predefined whitespace CharMatcher.

  • Splitter trimResults(CharMatcher trimmer)
    Instructs the Splitter to trim results using the specified CharMatcher.
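
The modifiers can be combined, as in this small sketch that trims underscores instead of whitespace. It assumes Guava is on the classpath; the names and the sample input are hypothetical. Because trimming is applied before empty strings are omitted, the entry consisting only of underscores disappears from the result.

```java
import com.google.common.base.CharMatcher;
import com.google.common.base.Splitter;
import java.util.List;

class SplitterModifierSketch {
    // Splits on commas, strips leading and trailing underscores from each
    // piece, and drops pieces that are empty after trimming.
    static List<String> splitTrimmingUnderscores(String input) {
        return Splitter.on(',')
                .trimResults(CharMatcher.is('_'))
                .omitEmptyStrings()
                .splitToList(input);
    }

    public static void main(String[] args) {
        System.out.println(splitTrimmingUnderscores("_a_,__,_b")); // [a, b]
    }
}
```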

After creating (and optionally modifying) a Splitter, you can apply it to a character sequence by calling its split method, which returns an Iterable<String>, or its splitToList method, which returns an immutable List<String>.

You might wonder in which cases it would be beneficial to use the split method (which returns an Iterable) instead of the splitToList method (which returns the more commonly used List type). The short answer to that is: you probably want to use the split method only for processing very large strings. The slightly longer answer is that because the split method returns an Iterable, the split operations can be lazily evaluated (at iteration time), thus removing the need to keep the entire result of the split operation in memory.
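
One way to take advantage of the lazily evaluated Iterable is to stop iterating as soon as we have what we need. The sketch below collects only the first few entries. It assumes Guava is on the classpath, and the class and method names are made up for illustration.

```java
import com.google.common.base.Splitter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LazySplitSketch {
    // Returns at most n entries. Pieces are produced one at a time while
    // iterating, and the loop stops asking for more once n are collected.
    static List<String> firstN(CharSequence input, int n) {
        Iterator<String> pieces = Splitter.on(',')
                .trimResults()
                .omitEmptyStrings()
                .split(input)
                .iterator();
        List<String> result = new ArrayList<>();
        while (result.size() < n && pieces.hasNext()) {
            result.add(pieces.next());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(firstN(",Porthos , Athos ,Aramis,", 2)); // [Porthos, Athos]
    }
}
```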