Python Language Regular Expressions (Regex) Grouping


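Grouping is done with parentheses. Calling group() on a match object returns one or more subgroups of the match as strings; with no argument it returns the entire match.

# Here match is assumed to be a match object whose overall match is '123'
match.group()    # Group without argument returns the entire match found
# Out: '123'

match.group(0)   # Specifying 0 gives the same result as specifying no argument
# Out: '123'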
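
As a minimal runnable sketch of these two calls (the pattern and input string are assumptions, chosen only so that the overall match is '123'):

```python
import re

# Hypothetical pattern and input, chosen so the overall match is '123'.
match = re.search(r"(\d+)", "abc123def")

whole = match.group()        # no argument: the entire match
also_whole = match.group(0)  # 0 is equivalent to passing no argument

print(whole, also_whole)
# Out: 123 123
```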

Arguments can also be provided to group() to fetch a particular subgroup.

From the docs:

If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument.

Calling groups(), on the other hand, returns a tuple containing all of the subgroups of the match.

import re

sentence = "This is a phone number 672-123-456-9910"
pattern = r".*(phone).*?([\d-]+)"

match = re.match(pattern, sentence)

match.groups()      # All parenthesized subgroups, as a tuple
# Out: ('phone', '672-123-456-9910')

match.group()       # The entire match as a string
# Out: 'This is a phone number 672-123-456-9910'

match.group(0)      # The entire match as a string
# Out: 'This is a phone number 672-123-456-9910'

match.group(1)      # The first parenthesized subgroup.
# Out: 'phone'

match.group(2)      # The second parenthesized subgroup.
# Out: '672-123-456-9910'

match.group(1, 2)   # Multiple arguments give us a tuple.
# Out: ('phone', '672-123-456-9910')
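
To make the return type concrete: groups() returns a single tuple with one entry per capturing group, and a group that did not take part in the match shows up as None (or as the value passed as the default argument). A small sketch, with a pattern and input made up for illustration:

```python
import re

# Hypothetical pattern: the second group is optional and does not
# participate when the input has no "-digits" suffix.
match = re.match(r"(\d+)(-\d+)?", "672")

subgroups = match.groups()          # one tuple, one entry per group
with_default = match.groups("")     # replace None with a chosen default
pair = match.group(1, 2)            # multiple arguments also give a tuple

print(subgroups, with_default, pair)
# Out: ('672', None) ('672', '') ('672', None)
```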

Named groups

match = re.search(r'My name is (?P<name>[A-Za-z ]+)', 'My name is John Smith')
match.group('name')
# Out: 'John Smith'

match.group(1)
# Out: 'John Smith'

The (?P<name>...) syntax creates a capture group that can be referenced by name as well as by index.
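
A short sketch of name and index access side by side, using a hypothetical pattern with two named groups (the group names first and last are invented for this example). It also uses groupdict(), which returns all named groups as a dictionary:

```python
import re

# Hypothetical pattern with two named groups, 'first' and 'last'.
match = re.search(
    r"My name is (?P<first>[A-Za-z]+) (?P<last>[A-Za-z]+)",
    "My name is John Smith",
)

by_name = match.group("first")   # look the group up by name
by_index = match.group(1)        # the same group, by position
named = match.groupdict()        # every named group as a dict

print(by_name, by_index, named)
# Out: John John {'first': 'John', 'last': 'Smith'}
```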

Non-capturing groups

Using (?:...) creates a group, but the group isn't captured. This means you can use it for grouping, for example to make several tokens optional together, but it won't appear in groups() or shift the numbering of the capturing groups.

re.match(r'(\d+)(\+(\d+))?', '11+22').groups()
# Out: ('11', '+22', '22')

re.match(r'(\d+)(?:\+(\d+))?', '11+22').groups()
# Out: ('11', '22')

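This pattern matches 11+22 or 11, but never a dangling 11+. Because the + sign and the second term are grouped, they are either matched together or skipped together, so re.match on '11+' succeeds but matches only '11'. Because the group is non-capturing, the + sign itself isn't captured.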
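
One way to check this behavior is sketched below. It assumes Python 3.4 or later for re.fullmatch, which requires the whole string to match, whereas re.match only anchors at the start:

```python
import re

pattern = r"(\d+)(?:\+(\d+))?"

# re.match only anchors at the start, so '11+' still matches its '11' prefix
# and the optional group does not participate.
prefix = re.match(pattern, "11+")
print(prefix.group(), prefix.groups())
# Out: 11 ('11', None)

# re.fullmatch must consume the whole string, so the dangling '+' is rejected.
results = {
    text: re.fullmatch(pattern, text) is not None
    for text in ("11", "11+22", "11+")
}
print(results)
# Out: {'11': True, '11+22': True, '11+': False}
```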