Grouping is done with parentheses. Calling group() on a match object returns a string formed of the matching parenthesized subgroups.
    match.group()   # With no argument, group() returns the entire match
    # Out: '123'

    match.group(0)  # Specifying 0 gives the same result as passing no argument
    # Out: '123'
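The snippet above relies on an existing match object. A minimal self-contained sketch follows; the pattern and the input string are assumptions chosen only so that the match comes out as '123', and are not taken from the original example.

```python
import re

# Assumed pattern and input, picked purely for illustration.
match = re.search(r'\d+', 'abc123def')

whole = match.group()        # no argument: the entire match
whole_zero = match.group(0)  # 0 is equivalent to passing no argument
print(whole, whole_zero)
```

Both calls return the same string, since group 0 always refers to the whole match.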
Arguments can also be provided to group() to fetch a particular subgroup.
From the docs:
If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument.
groups(), on the other hand, returns a single tuple containing all the subgroups of the match.
    import re

    sentence = "This is a phone number 672-123-456-9910"
    pattern = r".*(phone).*?([\d-]+)"
    match = re.match(pattern, sentence)

    match.groups()     # All parenthesized subgroups, as a tuple
    # Out: ('phone', '672-123-456-9910')

    match.group()      # The entire match as a string
    # Out: 'This is a phone number 672-123-456-9910'

    match.group(0)     # The entire match as a string
    # Out: 'This is a phone number 672-123-456-9910'

    match.group(1)     # The first parenthesized subgroup
    # Out: 'phone'

    match.group(2)     # The second parenthesized subgroup
    # Out: '672-123-456-9910'

    match.group(1, 2)  # Multiple arguments give us a tuple
    # Out: ('phone', '672-123-456-9910')
    match = re.search(r'My name is (?P<name>[A-Za-z ]+)', 'My name is John Smith')

    match.group('name')
    # Out: 'John Smith'

    match.group(1)
    # Out: 'John Smith'
(?P<name>...) creates a capture group that can be referenced by name as well as by index.
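As a further sketch of named groups, the example below uses two of them. The group names `first` and `last` are hypothetical and chosen only for illustration; `groupdict()` is the standard match-object method that returns all named groups as a dictionary.

```python
import re

# 'first' and 'last' are illustrative group names, not part of the original example.
m = re.search(r'My name is (?P<first>[A-Za-z]+) (?P<last>[A-Za-z]+)',
              'My name is John Smith')

by_name = m.group('first', 'last')  # several names give a tuple
by_index = m.group(1, 2)            # named groups are still numbered
as_dict = m.groupdict()             # mapping of group name to matched text
print(by_name, by_index, as_dict)
```

Names and indices refer to the same groups, so the two tuples are equal.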
(?:...) creates a group, but the group isn't captured. This means you can use it for grouping (for example, to apply a quantifier to several tokens at once), but it won't pollute your "group space".
    re.match(r'(\d+)(\+(\d+))?', '11+22').groups()
    # Out: ('11', '+22', '22')

    re.match(r'(\d+)(?:\+(\d+))?', '11+22').groups()
    # Out: ('11', '22')
This example matches 11+22 or 11 in full, but not 11+ (there, only the leading 11 is matched). This is because the + sign and the second term are grouped, so they are optional only as a unit. The + sign itself, however, isn't captured.
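One way to check this behaviour is with re.fullmatch, which succeeds only if the whole string matches the pattern. This is a sketch using the non-capturing pattern from the example above; the variable names are illustrative.

```python
import re

pattern = r'(\d+)(?:\+(\d+))?'

full = re.fullmatch(pattern, '11+22')    # both terms present
single = re.fullmatch(pattern, '11')     # optional group skipped
dangling = re.fullmatch(pattern, '11+')  # '+' without a second term

print(full.groups())    # ('11', '22')
print(single.groups())  # ('11', None)
print(dangling)         # None
```

When the optional group does not take part in the match, its capture group is reported as None.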