The example "Checking a string for unwanted characters" describes how to test and reject strings that don't meet certain criteria. Rejecting input outright is not always possible, however, and sometimes you have to make do with what you receive. In these cases, a cautious developer will sanitize the provided strings by removing any characters that might trip up further processing.
To remove, trim, and replace unwanted characters, the weapon of choice will again be Guava's CharMatcher class.
The two CharMatcher methods of interest in this section are:
String retainFrom(CharSequence sequence)
Returns a string containing all the characters that match the CharMatcher instance.
String removeFrom(CharSequence sequence)
Returns a string containing all the characters that do not match the CharMatcher instance.
As an example, we'll use CharMatcher.digit(), a predefined CharMatcher instance that, unsurprisingly, only matches digits.
// requires: import com.google.common.base.CharMatcher;
String rock = "1, 2, 3 o'clock, 4 o'clock rock!";
CharMatcher.digit().retainFrom(rock); // "1234"
CharMatcher.digit().removeFrom(rock); // ", ,  o'clock,  o'clock rock!" (the space on each side of a removed digit remains)
CharMatcher.digit().negate().removeFrom(rock); // "1234"
The last line in this example illustrates that removeFrom is the inverse operation of retainFrom. Invoking retainFrom on a CharMatcher has the same effect as invoking removeFrom on a negated version of that CharMatcher.
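The inverse relationship between retaining and removing can also be sketched without Guava on the classpath. The snippet below is a plain-JDK stand-in, not Guava's implementation: the class FilterSketch and its helpers retain and remove are hypothetical names, an IntPredicate plays the role of the CharMatcher, and Character::isDigit is only an approximation of CharMatcher.digit().

```java
import java.util.function.IntPredicate;

public class FilterSketch {
    // Hypothetical stand-in for retainFrom: keeps the characters the predicate accepts.
    static String retain(IntPredicate matcher, CharSequence sequence) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (matcher.test(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    // Hypothetical stand-in for removeFrom: retaining with the negated predicate.
    static String remove(IntPredicate matcher, CharSequence sequence) {
        return retain(matcher.negate(), sequence);
    }

    public static void main(String[] args) {
        String rock = "1, 2, 3 o'clock, 4 o'clock rock!";
        IntPredicate digit = Character::isDigit; // assumption: approximates CharMatcher.digit()
        System.out.println(retain(digit, rock));          // 1234
        System.out.println(remove(digit.negate(), rock)); // 1234
    }
}
```

Defining remove in terms of retain makes the symmetry explicit: negating the matcher and swapping the method yields the same string.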
Removing leading and trailing characters is a very common operation, most frequently used to trim whitespace from strings. Guava's CharMatcher offers these trimming methods:
String trimLeadingFrom(CharSequence sequence)
Removes all leading characters that match the CharMatcher instance.
String trimTrailingFrom(CharSequence sequence)
Removes all trailing characters that match the CharMatcher instance.
String trimFrom(CharSequence sequence)
Removes all leading and trailing characters that match the CharMatcher instance.
When used with CharMatcher.whitespace(), these methods will effectively take care of all your whitespace trimming needs:
CharMatcher.whitespace().trimFrom(" Too much space "); // returns "Too much space"
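Trimming is not limited to whitespace; any matcher can decide which leading and trailing characters are dropped. The following plain-JDK sketch models the three trimming methods under that assumption. It is not Guava's code: TrimSketch, trimLeading, trimTrailing, and trim are hypothetical names, and an IntPredicate stands in for the CharMatcher.

```java
import java.util.function.IntPredicate;

public class TrimSketch {
    // Models trimLeadingFrom: skips matching characters at the start only.
    static String trimLeading(IntPredicate matcher, String s) {
        int start = 0;
        while (start < s.length() && matcher.test(s.charAt(start))) {
            start++;
        }
        return s.substring(start);
    }

    // Models trimTrailingFrom: drops matching characters at the end only.
    static String trimTrailing(IntPredicate matcher, String s) {
        int end = s.length();
        while (end > 0 && matcher.test(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    // Models trimFrom: trims both ends, leaving interior matches untouched.
    static String trim(IntPredicate matcher, String s) {
        return trimTrailing(matcher, trimLeading(matcher, s));
    }

    public static void main(String[] args) {
        IntPredicate zero = c -> c == '0';
        String padded = "0004200";
        System.out.println(trimLeading(zero, padded));  // 4200
        System.out.println(trimTrailing(zero, padded)); // 00042
        System.out.println(trim(zero, padded));         // 42
    }
}
```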
Often, applications will replace characters that are not allowed in a certain situation with a placeholder character. To replace characters in a string, CharMatcher's API provides the following methods:
String replaceFrom(CharSequence sequence, char replacement)
Replaces all occurrences of characters that match the CharMatcher instance with the provided replacement character.
String replaceFrom(CharSequence sequence, CharSequence replacement)
Replaces all occurrences of characters that match the CharMatcher instance with the provided replacement character sequence (string).
String collapseFrom(CharSequence sequence, char replacement)
Replaces groups of consecutive characters that match the CharMatcher instance with a single instance of the provided replacement character.
String trimAndCollapseFrom(CharSequence sequence, char replacement)
Behaves the same as collapseFrom, but matching groups at the start and the end are removed rather than replaced.
Let's look at an example that demonstrates how the behavior of these methods differs. Say that we're creating an application that lets the user specify output filenames. To sanitize the input provided by the user, we create a CharMatcher instance that is a combination of the predefined whitespace CharMatcher and a custom CharMatcher that specifies a set of characters that we would rather avoid in our filenames.
CharMatcher illegal = CharMatcher.whitespace().or(CharMatcher.anyOf("<>:|?*\"/\\"));
Now we invoke the replacement methods discussed above on a filename that is in dire need of cleanup:
String filename = "<A::12> first draft???";
System.out.println(illegal.replaceFrom(filename, '_'));
System.out.println(illegal.collapseFrom(filename, '_'));
System.out.println(illegal.trimAndCollapseFrom(filename, '_'));
We'll see the output below in our console.
_A__12__first_draft___
_A_12_first_draft_
A_12_first_draft
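To make the difference between replacing, collapsing, and trim-and-collapsing concrete, here is a minimal plain-JDK sketch of the three behaviors as described above. It is an illustration under stated assumptions rather than Guava's implementation: CollapseSketch and its methods are hypothetical names, the matcher is hard-coded as an IntPredicate, Character.isWhitespace only approximates CharMatcher.whitespace(), and the input string is a separate sample chosen for this sketch.

```java
import java.util.function.IntPredicate;

public class CollapseSketch {
    static final String ILLEGAL = "<>:|?*\"/\\";
    // Assumption: whitespace or one of the listed characters counts as a match.
    static final IntPredicate MATCHER =
            c -> Character.isWhitespace(c) || ILLEGAL.indexOf(c) >= 0;

    // Models replaceFrom: every matching character becomes the replacement.
    static String replace(String s, char replacement) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(MATCHER.test(c) ? replacement : c);
        }
        return out.toString();
    }

    // Models collapseFrom: each run of matching characters becomes one replacement.
    static String collapse(String s, char replacement) {
        StringBuilder out = new StringBuilder();
        boolean inRun = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (MATCHER.test(c)) {
                if (!inRun) {
                    out.append(replacement);
                }
                inRun = true;
            } else {
                out.append(c);
                inRun = false;
            }
        }
        return out.toString();
    }

    // Models trimAndCollapseFrom: drop matching runs at both ends, then collapse.
    static String trimAndCollapse(String s, char replacement) {
        int start = 0;
        int end = s.length();
        while (start < end && MATCHER.test(s.charAt(start))) {
            start++;
        }
        while (end > start && MATCHER.test(s.charAt(end - 1))) {
            end--;
        }
        return collapse(s.substring(start, end), replacement);
    }

    public static void main(String[] args) {
        String sample = "*my report: v2??";
        System.out.println(replace(sample, '_'));         // _my_report__v2__
        System.out.println(collapse(sample, '_'));        // _my_report_v2_
        System.out.println(trimAndCollapse(sample, '_')); // my_report_v2
    }
}
```

The sketch shows why trimAndCollapseFrom tends to give the cleanest filenames: runs in the middle shrink to a single placeholder, and runs at either end disappear entirely.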