[^0-9a-zA-Z]
This matches every character that is neither a digit nor a letter, that is, every non-alphanumeric character. If the underscore character _ should also be excluded from the match, the expression can be shortened to:
[^\w]
Or:
\W
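As a minimal sketch of the difference between the three patterns, the snippet below uses Python's re module. The sample string is an assumption chosen for illustration, and re.ASCII is passed so that \w is limited to [a-zA-Z0-9_], as in the explanation above.

```python
import re

# Hypothetical sample containing an underscore, a hyphen, a space and "!".
text = "a_b-c 1!"

# The explicit negated class treats the underscore as a match,
# because "_" is neither a digit nor a letter.
non_alnum = re.findall(r"[^0-9a-zA-Z]", text)

# [^\w] and \W additionally exclude the underscore from the matches.
# re.ASCII restricts \w to [a-zA-Z0-9_].
non_word_class = re.findall(r"[^\w]", text, flags=re.ASCII)
non_word = re.findall(r"\W", text, flags=re.ASCII)

print(non_alnum)       # ['_', '-', ' ', '!']
print(non_word_class)  # ['-', ' ', '!']
print(non_word)        # ['-', ' ', '!']
```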
In the following sentences:
Hi, what's up?
I can't wait for 2017!!!
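The following characters match:
In the first sentence: the comma, the apostrophe, the two spaces, the question mark, and the end-of-line character.
In the second sentence: the apostrophe, the four spaces, the three exclamation marks, and the end-of-line character.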
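The matches for the two sentences can be checked with a short Python sketch. It assumes the sentences are stored in one string with each line ending in "\n", so that the end-of-line character is part of the input.

```python
import re

# The two example sentences, each terminated by a newline.
text = "Hi, what's up?\nI can't wait for 2017!!!\n"

# Collect the \W matches line by line, keeping the line endings
# so that the newline itself shows up as a match.
per_line = [re.findall(r"\W", line) for line in text.splitlines(keepends=True)]

for line_matches in per_line:
    print(line_matches)
# [',', ' ', "'", ' ', '?', '\n']
# [' ', "'", ' ', ' ', ' ', '!', '!', '!', '\n']
```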
UNICODE NOTE
Note that some flavors with Unicode character property support may interpret \w and \W as [\p{L}\p{N}_] and [^\p{L}\p{N}_] respectively, which means that other Unicode letters and numeric characters are included as well (see the PCRE docs).
In .NET, \w = [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]; note that, unlike PCRE, it does not match \p{Nl} and \p{No} (see the .NET documentation for \w). Note also that, for some reason, Unicode 3.1 lowercase letters (like 𝐚𝒇𝓌𝔨𝕨𝗐𝛌𝛚) are not matched.
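The effect of Unicode-aware matching can also be observed in Python, used here only as a stand-in: its re module is neither PCRE nor .NET, and its exact definition of \w differs from both. For str patterns \w is Unicode-aware by default, and the re.ASCII flag restricts it to [a-zA-Z0-9_]. The sample characters are assumptions chosen for illustration.

```python
import re

# Hypothetical sample: a Latin letter with an accent (U+00E9),
# a Cyrillic letter (U+0416), an Arabic-Indic digit (U+0663), and "_".
sample = "\u00e9 \u0416 \u0663 _"

# Default for str patterns: \w is Unicode-aware.
unicode_hits = re.findall(r"\w", sample)

# With re.ASCII, \w only matches [a-zA-Z0-9_].
ascii_hits = re.findall(r"\w", sample, flags=re.ASCII)

print(unicode_hits)  # ['é', 'Ж', '٣', '_']
print(ascii_hits)    # ['_']
```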
Java's (?U)\w will match a mix of what \w matches in PCRE and .NET: