If you need to extract a part of string from the input string, we can use capture groups of regex.
For this example, we'll start with a simple phone number regex:
\d{3}-\d{3}-\d{4}
If parentheses are added to the regex, each set of parentheses is considered a capturing group. In this case, we are using what are called numbered capture groups:
(\d{3})-(\d{3})-(\d{4})
^-----^ ^-----^ ^-----^
Group 1 Group 2 Group 3
Before we can use it in Java, we must not forget to follow the rules of Strings, escaping the backslashes, resulting in the following pattern:
"(\\d{3})-(\\d{3})-(\\d{4})"
We first need to compile the regex pattern to make a Pattern
and then we need a Matcher
to match our input string with the pattern:
Pattern phonePattern = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");
Matcher phoneMatcher = phonePattern.matcher("abcd800-555-1234wxyz");
Next, the Matcher needs to find the first subsequence that matches the regex:
phoneMatcher.find();
Now, using the group method, we can extract the data from the string:
String number = phoneMatcher.group(0); //"800-555-1234" (Group 0 is everything the regex matched)
String aCode = phoneMatcher.group(1); //"800"
String threeDigit = phoneMatcher.group(2); //"555"
String fourDigit = phoneMatcher.group(3); //"1234"
Note: Matcher.group()
can be used in place of Matcher.group(0)
.
Java 7 introduced named capture groups. Named capture groups function the same as numbered capture groups (but with a name instead of a number), although there are slight syntax changes. Using named capture groups improves readability.
We can alter the above code to use named groups:
(?<AreaCode>\d{3})-(\d{3})-(\d{4})
^----------------^ ^-----^ ^-----^
AreaCode Group 2 Group 3
To get the contents of "AreaCode", we can instead use:
String aCode = phoneMatcher.group("AreaCode"); //"800"