The substrings captured by capture groups are accessible from RegexMatch
objects using indexing notation.
For instance, the following regex parses North American phone numbers written in (555)-555-5555
format:
julia> phone = r"\((\d{3})\)-(\d{3})-(\d{4})"
Suppose we wish to extract the phone numbers from some text:
julia> text = """
My phone number is (555)-505-1000.
Her phone number is (555)-999-9999.
"""
"My phone number is (555)-505-1000.\nHer phone number is (555)-999-9999.\n"
Using the matchall function, we can get an array of the matched substrings themselves:
julia> matchall(phone, text)
2-element Array{SubString{String},1}:
"(555)-505-1000"
"(555)-999-9999"
But suppose we want to access the area codes (the first three digits, enclosed in parentheses). Then we can use the eachmatch iterator:
julia> for m in eachmatch(phone, text)
           println("Matched $(m.match) with area code $(m[1])")
       end
Matched (555)-505-1000 with area code 555
Matched (555)-999-9999 with area code 555
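The same eachmatch iterator can also stand in for matchall on Julia 1.0 and later, where matchall is no longer available. The following is a minimal sketch of that substitution, reusing the phone regex and text from above; the variable name matches is only illustrative:

```julia
phone = r"\((\d{3})\)-(\d{3})-(\d{4})"
text = """
My phone number is (555)-505-1000.
Her phone number is (555)-999-9999.
"""

# Collect the full matched substring (m.match) of every match,
# which is what matchall(phone, text) returned on older versions.
matches = [m.match for m in eachmatch(phone, text)]
```

Each element is a SubString, so it compares equal to the corresponding plain String.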
Note that we use m[1] here because the area code is the first capture group in our regular expression. We can get all three components of the phone number as a tuple using a function:
julia> splitmatch(m) = m[1], m[2], m[3]
splitmatch (generic function with 1 method)
Then we can apply such a function to a particular RegexMatch:
julia> splitmatch(match(phone, text))
("555","505","1000")
Or we could map splitmatch across every match:
julia> map(splitmatch, eachmatch(phone, text))
2-element Array{Tuple{SubString{String},SubString{String},SubString{String}},1}:
("555","505","1000")
("555","999","9999")
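Two details are worth keeping in mind when writing a helper like splitmatch. First, match returns nothing when the regex does not match, so indexing the result directly would throw an error. Second, every RegexMatch stores all of its captured substrings in its captures field, so the groups need not be listed one by one. The sketch below combines both points; safesplit is a hypothetical name chosen for illustration, not a function provided by Julia:

```julia
phone = r"\((\d{3})\)-(\d{3})-(\d{4})"

# Hypothetical helper: return all capture groups as a tuple,
# or `nothing` if the regex does not match the string at all.
function safesplit(re::Regex, s::AbstractString)
    m = match(re, s)
    m === nothing && return nothing   # no match found
    return Tuple(m.captures)          # every capture group, in order
end

parts = safesplit(phone, "Call (555)-505-1000 today")
no_parts = safesplit(phone, "no phone number here")
```

Because the helper relies on m.captures rather than fixed indices, it keeps working if capture groups are added to or removed from the regex.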