tcl Matching


Example

The regexp command is used to match a regular expression against a string.

# This is a very simplistic e-mail matcher.
# e-mail addresses are extremely complicated to match properly.
# there is no guarantee that this regex will properly match e-mail addresses.
set mydata "send mail to john.doe.the.23rd@no.such.domain.com please"
regexp -expanded {
    \y           # word boundary
    [^@\s]+      # characters that are not an @ or a space character
    @            # a single @ sign
    [\w.-]+      # normal characters and dots and dash
    \.           # a dot character
    \w+          # normal characters.
    \y           # word boundary
    } $mydata emailaddr
puts $emailaddr
john.doe.the.23rd@no.such.domain.com

The regexp command will return a 1 (true) value if a match was made or 0 (false) if not.

set mydata "hello wrld, this is Tcl"
# faster would be to use: [string match *world* $mydata] 
if { [regexp {world} $mydata] } {
   puts "spelling correct"
} else {
   puts "typographical error"
}

To match all expressions in some data, use the -all switch and the -inline switch to return the data. Note that the default is to treat newlines like any other data.

# simplistic english ordinal word matcher.
set mydata {
    This is the first line.
    This is the second line.
    This is the third line.
    This is the fourth line.
    }
set mymatches [regexp -all -inline -expanded {
    \y                  # word boundary
    \w+                 # standard characters
    (?:st|nd|rd|th)     # ending in st, nd, rd or th
                        # The ?: operator is used here as we don't
                        # want to return the match specified inside
                        # the grouping () operator.
    \y                  # word boundary
    } $mydata]
puts $mymatches
first second third fourth
# if the ?: operator was not used, the data returned would be:
first st second nd third rd fourth th

Newline handling

# find real numbers at the end of a line (fake data).
set mydata {
    White 0.87 percent saturation.
    Specular reflection: 0.995
    Blue 0.56 percent saturation.
    Specular reflection: 0.421
    }
# the -line switch will enable newline matching.
# without -line, the $ would match the end of the data.
set mymatches [regexp -line -all -inline -expanded {
    \y                  # word boundary
    \d\.\d+             # a real number
    $                   # at the end of a line.
    } $mydata]
puts $mymatches
0.995 0.421

Unicode requires no special handling.

% set mydata {123ÂÃÄÈ456}
123ÂÃÄÈ456
% regexp {[[:alpha:]]+} $mydata match
1
% puts $match
ÂÃÄÈ
% regexp {\w+} $mydata match
1
% puts $match
123ÂÃÄÈ456

Documentation: regexp re_syntax