Non-alphanumerics matching (negated character class)


Example

[^0-9a-zA-Z]

This matches every character that is neither a digit nor a letter (that is, every character that is not an ASCII alphanumeric character). If the underscore _ should also be excluded from the matches, the expression can be shortened to:

[^\w]

Or:

\W
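As an illustration, the three forms can be compared in Python's re module, which is used here only as a convenient flavor. The sketch assumes the re.ASCII flag, so that \w means [a-zA-Z0-9_] as in the description above. The sample string is made up for the example.

```python
import re

# "Neither a digit nor a letter": the underscore IS matched.
NON_ALNUM = re.compile(r"[^0-9a-zA-Z]")
# Negated word class: the underscore is NOT matched (ASCII semantics).
NON_WORD_CLASS = re.compile(r"[^\w]", re.ASCII)
# Shorthand for [^\w].
NON_WORD = re.compile(r"\W", re.ASCII)

sample = "snake_case, v2!"

print(NON_ALNUM.findall(sample))       # ['_', ',', ' ', '!']
print(NON_WORD_CLASS.findall(sample))  # [',', ' ', '!']
print(NON_WORD.findall(sample))        # [',', ' ', '!']
```

The only difference between the first result and the other two is the underscore.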

In the following sentences:

  1. Hi, what's up?

  2. I can't wait for 2017!!!

The following characters match:

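  1. the comma (,), the spaces, the apostrophe ('), the question mark (?) and the end-of-line character.

  2. the apostrophe ('), the spaces, the exclamation marks (!) and the end-of-line character.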
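The matches listed above can be reproduced with a short Python sketch. It assumes each sentence ends with a "\n" newline standing in for the end-of-line character, and it uses re.ASCII so that \W behaves like [^a-zA-Z0-9_]. Because neither sentence contains an underscore, [^0-9a-zA-Z] would return the same lists. The helper name non_alnum is hypothetical.

```python
import re

def non_alnum(sentence: str) -> list[str]:
    """Hypothetical helper: every non-word character, with ASCII semantics."""
    return re.findall(r"\W", sentence, flags=re.ASCII)

print(non_alnum("Hi, what's up?\n"))
# [',', ' ', "'", ' ', '?', '\n']
print(non_alnum("I can't wait for 2017!!!\n"))
# [' ', "'", ' ', ' ', ' ', '!', '!', '!', '\n']
```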

UNICODE NOTE
Note that some flavors that support Unicode character properties may interpret \w and \W as [\p{L}\p{N}_] and [^\p{L}\p{N}_], which means that other Unicode letters and numeric characters are matched by \w (and therefore not matched by \W) as well (see the PCRE docs).
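Python's re module behaves similarly, although it is not PCRE, so this is only an analogy: for str patterns, \w is Unicode-aware by default, and the re.ASCII flag restricts it to [a-zA-Z0-9_]. The sample string below is an arbitrary example.

```python
import re

# "café_ñ 42", written with escapes so that é and ñ are single code points.
text = "caf\u00e9_\u00f1 42"

# Default (Unicode) semantics: é and ñ count as word characters.
print(re.findall(r"\W", text))            # [' ']
# ASCII semantics: é and ñ are non-word characters.
print(re.findall(r"\W", text, re.ASCII))  # ['é', 'ñ', ' ']
```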

In .NET, \w is equivalent to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]. Note that, unlike in PCRE, it does not match \p{Nl} or \p{No} (see the .NET documentation for \w).
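To make the category difference concrete, the following Python sketch classifies a few characters with unicodedata.category, once against the .NET category list quoted above and once against [\p{L}\p{N}_]. Both functions are hypothetical emulations written for this example. They do not call either regex engine and only approximate real engine behavior.

```python
import unicodedata

# General categories listed for .NET's \w.
DOTNET_WORD_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Nd", "Pc"}

def dotnet_like_word(ch: str) -> bool:
    """Hypothetical emulation of .NET's word class by general category."""
    return unicodedata.category(ch) in DOTNET_WORD_CATEGORIES

def pcre_ucp_like_word(ch: str) -> bool:
    """Hypothetical emulation of the Unicode word class [\\p{L}\\p{N}_]."""
    return unicodedata.category(ch)[0] in ("L", "N") or ch == "_"

# a (Ll), _ (Pc), combining acute accent (Mn),
# ROMAN NUMERAL TWELVE (Nl), VULGAR FRACTION ONE HALF (No)
for ch in ["a", "_", "\u0301", "\u216b", "\u00bd"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)}: "
          f".NET-like={dotnet_like_word(ch)}, "
          f"PCRE-like={pcre_ucp_like_word(ch)}")
```

The Nl and No characters are only accepted by the PCRE-style check, while the combining mark (Mn) is only accepted by the .NET-style one.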


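Note that Unicode 3.1 lowercase letters (like 𝐚𝒇𝓌𝔨𝕨𝗐𝛌𝛚) are not matched. These letters come from the Mathematical Alphanumeric Symbols block, which lies outside the Basic Multilingual Plane; since .NET regexes work on UTF-16 code units, each such letter is seen as a surrogate pair rather than as a single \p{Ll} character.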
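The surrogate-pair effect can be checked with a small Python sketch: 𝐚 (U+1D41A) is a single code point in category Ll, but in UTF-16, which is what a .NET char holds, it becomes two code units whose category is Cs (surrogate), a category that is not in the .NET \w list. The helper utf16_units is hypothetical and written only for this illustration.

```python
import unicodedata

def utf16_units(s: str) -> list[int]:
    """Hypothetical helper: split a string into its UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

ch = "\U0001D41A"  # MATHEMATICAL BOLD SMALL A
print(unicodedata.category(ch))  # Ll (the whole code point)

units = utf16_units(ch)
print([hex(u) for u in units])   # ['0xd835', '0xdc1a'] (a surrogate pair)
print([unicodedata.category(chr(u)) for u in units])  # ['Cs', 'Cs']
```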

Java's (?U)\w (the (?U) embedded flag enables Pattern.UNICODE_CHARACTER_CLASS) matches a mix of what \w matches in PCRE and .NET.