Julia Language Regexes Capture groups


The substrings captured by capture groups are accessible from RegexMatch objects using indexing notation.

For instance, the following regex parses North American phone numbers written in (555)-555-5555 format:

julia> phone = r"\((\d{3})\)-(\d{3})-(\d{4})"

Suppose we wish to extract the phone numbers from a text:

julia> text = """
       My phone number is (555)-505-1000.
       Her phone number is (555)-999-9999.
       """
"My phone number is (555)-505-1000.\nHer phone number is (555)-999-9999.\n"

Using the matchall function, we can get an array of the matched substrings:

julia> matchall(phone, text)
2-element Array{SubString{String},1}:
 "(555)-505-1000"
 "(555)-999-9999"
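
Note that matchall was removed in Julia 1.0. On newer versions, roughly the same result can be obtained by collecting the match field of each RegexMatch produced by eachmatch. The following is a minimal sketch assuming Julia 1.0 or later; the variable name numbers is only illustrative:

```julia
# Assumes Julia 1.0 or later, where matchall is no longer available.
phone = r"\((\d{3})\)-(\d{3})-(\d{4})"
text = """
My phone number is (555)-505-1000.
Her phone number is (555)-999-9999.
"""

# Collect the full matched substring of every match into a vector.
numbers = [m.match for m in eachmatch(phone, text)]
println(numbers)
```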

But suppose we want to access the area codes (the first three digits, enclosed in parentheses). Then we can use the eachmatch iterator:

julia> for m in eachmatch(phone, text)
           println("Matched $(m.match) with area code $(m[1])")
       end
Matched (555)-505-1000 with area code 555
Matched (555)-999-9999 with area code 555

Note here that we use m[1] because the area code is the first capture group in our regular expression. We can get all three components of the phone number as a tuple using a function:

julia> splitmatch(m) = m[1], m[2], m[3]
splitmatch (generic function with 1 method)

Then we can apply such a function to a particular RegexMatch:

julia> splitmatch(match(phone, text))
("555", "505", "1000")
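
One caveat when applying splitmatch to the result of match: match returns nothing when the regex does not occur in the text, and indexing nothing raises an error. A minimal sketch of a guard follows; firstphone is a hypothetical helper name introduced here for illustration, not part of Julia:

```julia
phone = r"\((\d{3})\)-(\d{3})-(\d{4})"
splitmatch(m) = m[1], m[2], m[3]

# Hypothetical helper: returns the three components of the first phone
# number in `text`, or nothing if the text contains no phone number.
function firstphone(text)
    m = match(phone, text)
    return m === nothing ? nothing : splitmatch(m)
end

println(firstphone("Call (555)-505-1000 today."))
println(firstphone("No number here."))
```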

Or we could map it across each match:

julia> map(splitmatch, eachmatch(phone, text))
2-element Array{Tuple{SubString{String},SubString{String},SubString{String}},1}:
 ("555", "505", "1000")
 ("555", "999", "9999")
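
As an alternative to writing splitmatch by hand, every RegexMatch also has a captures field holding all of its capture groups in order. The sketch below, assuming Julia 1.0 or later, builds the same tuples from that field; the names parts, area, exchange and line are illustrative only:

```julia
phone = r"\((\d{3})\)-(\d{3})-(\d{4})"
text = """
My phone number is (555)-505-1000.
Her phone number is (555)-999-9999.
"""

# m.captures is a vector of the capture groups of one match, in order;
# Tuple(...) converts that vector into a tuple.
parts = [Tuple(m.captures) for m in eachmatch(phone, text)]

# Destructure each tuple into its three components.
for (area, exchange, line) in parts
    println("area code $area, exchange $exchange, line $line")
end
```

This form does not hard-code the number of groups, so it keeps working if a group is added to or removed from the regex.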