The regexp
command is used to match a regular expression against a string.
# This is a very simplistic e-mail matcher.
# e-mail addresses are extremely complicated to match properly.
# there is no guarantee that this regex will properly match e-mail addresses.
set mydata "send mail to [email protected] please"
regexp -expanded {
\y # word boundary
[^@\s]+ # characters that are not an @ or a space character
@ # a single @ sign
[\w.-]+ # normal characters and dots and dash
\. # a dot character
\w+ # normal characters.
\y # word boundary
} $mydata emailaddr
puts $emailaddr
[email protected]
The regexp
command will return a 1 (true) value if a match was made or 0 (false) if not.
set mydata "hello wrld, this is Tcl"
# faster would be to use: [string match *world* $mydata]
if { [regexp {world} $mydata] } {
puts "spelling correct"
} else {
puts "typographical error"
}
To match all expressions in some data, use the -all switch and the -inline switch to return the data. Note that the default is to treat newlines like any other data.
# simplistic english ordinal word matcher.
set mydata {
This is the first line.
This is the second line.
This is the third line.
This is the fourth line.
}
set mymatches [regexp -all -inline -expanded {
\y # word boundary
\w+ # standard characters
(?:st|nd|rd|th) # ending in st, nd, rd or th
# The ?: operator is used here as we don't
# want to return the match specified inside
# the grouping () operator.
\y # word boundary
} $mydata]
puts $mymatches
first second third fourth
# if the ?: operator was not used, the data returned would be:
first st second nd third rd fourth th
Newline handling
# find real numbers at the end of a line (fake data).
set mydata {
White 0.87 percent saturation.
Specular reflection: 0.995
Blue 0.56 percent saturation.
Specular reflection: 0.421
}
# the -line switch will enable newline matching.
# without -line, the $ would match the end of the data.
set mymatches [regexp -line -all -inline -expanded {
\y # word boundary
\d\.\d+ # a real number
$ # at the end of a line.
} $mydata]
puts $mymatches
0.995 0.421
Unicode requires no special handling.
% set mydata {123ÂÃÄÈ456}
123ÂÃÄÈ456
% regexp {[[:alpha:]]+} $mydata match
1
% puts $match
ÂÃÄÈ
% regexp {\w+} $mydata match
1
% puts $match
123ÂÃÄÈ456
Documentation: regexp re_syntax