Match an email address


Example

Matching an email address within a string is a hard task, because the specification that defines it, RFC 2822, is complex, which makes it hard to implement as a regex. For more details on why matching an email address with a regex is not a good idea, refer to the antipattern example on when not to use a regex: for matching emails. The best advice from that page is to use a peer-reviewed and widely used library in your favorite language.

Validate an email address format

When you need to quickly validate an entry to make sure it looks like an email address, the best option is to keep it simple:

^\S{1,}@\S{2,}\.\S{2,}$

That regex checks that the address is a sequence of one or more non-space characters, followed by an @, followed by two sequences of non-space characters, each of length two or more, separated by a dot (.). It's not perfect: it accepts many addresses that are invalid according to the specification, and it rejects a few unusual but valid ones, such as quoted local parts containing spaces or a single-character label right after the @ (user@x.com). Most importantly, though, it does not reject the addresses people commonly use.
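As a rough illustration, the pattern above could be used from Python as follows. This is a minimal sketch: the function name looks_like_email is made up for the example, and fullmatch is used so that a trailing newline is not silently accepted.

```python
import re

# The permissive pattern from the text: non-spaces, an @, non-spaces, a dot, non-spaces.
SIMPLE_EMAIL = re.compile(r"^\S{1,}@\S{2,}\.\S{2,}$")


def looks_like_email(candidate: str) -> bool:
    """Return True if the string has the rough shape of an email address.

    Hypothetical helper for illustration; it checks the format only,
    not whether the address exists.
    """
    return SIMPLE_EMAIL.fullmatch(candidate) is not None


print(looks_like_email("user@example.com"))    # True
print(looks_like_email("no-at-sign.example"))  # False
print(looks_like_email("a b@example.com"))     # False: contains a space
print(looks_like_email("user@@example..com"))  # True: invalid, yet accepted
```

The last call shows the trade-off: the pattern is deliberately loose, so it lets some invalid input through.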

Check the address exists

The only reliable way to check that an email address is valid is to check that it exists. The SMTP VRFY command was designed for that purpose but, sadly, after being abused by spammers, it is now disabled on most mail servers.

So the only way left to check that an address is valid and exists is to actually send an email to it.
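A common way to do this is to send a message containing a one-time confirmation link. The sketch below only builds such a message with Python's standard library; the sender address, the confirmation URL and the function name build_confirmation are all placeholders invented for the example, and actually sending the message (for instance with smtplib.SMTP.send_message) is left out because it depends on your mail server.

```python
import secrets
from email.message import EmailMessage


def build_confirmation(address, base_url="https://example.com/confirm"):
    """Build a confirmation email for `address` (hypothetical helper).

    Returns the random token, which the application would store, and the
    message to send. The URL and sender below are placeholders.
    """
    token = secrets.token_urlsafe(16)  # unguessable one-time token
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"
    msg["To"] = address
    msg["Subject"] = "Please confirm your email address"
    msg.set_content(
        "Open this link to confirm your address:\n"
        f"{base_url}?token={token}\n"
    )
    return token, msg


token, message = build_confirmation("user@example.com")
print(message["To"])
```

The address is considered verified only once the recipient opens the link carrying the stored token.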

Huge Regex alternatives

That said, it's not impossible to validate an email address using a regex. The only issue is that the closer those regexes get to the specification, the bigger they become, and as a consequence they are extremely hard to read and maintain. Below, you'll find examples of such more accurate regexes that are used in some libraries.

⚠️ The following regexes are given for documentation and learning purposes; copy-pasting them into your code is a bad idea. Instead, use the library directly, so you can rely on upstream code and peer developers to keep your email parsing code up to date and maintained.

Perl Address matching module

The best examples of such regexes are found in some languages' standard or widely used libraries. For example, there's one from the Mail::RFC822::Address module for Perl that tries to be as accurate as possible according to the RFC. For the curious, a version of that regex, generated from the grammar, is available at this URL. If you're tempted to copy-paste it, here's a quote from the regex's author:

"I do not maintain the regular expression [linked]. There may be bugs in it that have already been fixed in the Perl module."

.NET Address matching module

Another, shorter variant is the one used by the .NET standard library in the EmailAddressAttribute class:

^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$

But even though it's shorter, it's still too big to be readable and easily maintainable.

Ruby Address matching module

In Ruby, the rfc822 module matches an address using a composition of smaller regexes. This is a neat idea: when a bug is found, it is easier to pinpoint the part of the regex that needs to change and fix it.
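To illustrate the composition idea (not the Ruby module itself), here is a minimal Python sketch that builds an address pattern from small named parts. The grammar is deliberately simplified and is an assumption made for the example: it ignores quoted strings, comments and domain literals, so it is far from the full RFC.

```python
import re

# Simplified building blocks; NOT the full RFC 822 grammar.
# Each part can be read, tested and fixed on its own.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"                   # atoms separated by single dots
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"  # no leading or trailing hyphen
DOMAIN = rf"{LABEL}(?:\.{LABEL})+"                   # at least two labels
ADDRESS = re.compile(rf"{DOT_ATOM}@{DOMAIN}")

print(bool(ADDRESS.fullmatch("john.doe@example.com")))   # True
print(bool(ADDRESS.fullmatch("john..doe@example.com")))  # False: empty atom
print(bool(ADDRESS.fullmatch("user@-bad.example")))      # False: label starts with "-"
```

If, say, the rule for domain labels turns out to be wrong, only LABEL has to change, and it can be checked in isolation.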

Python Address matching module

As a counterexample, the Python email parsing module does not use a regex, but instead implements address matching with a parser.
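For instance, the standard library's email.utils.parseaddr function splits an address header value into a display name and an address. This is only a small usage sketch; note that parseaddr is lenient, so it extracts the parts rather than strictly validating them.

```python
from email.utils import parseaddr

# Split a header-style value into (display name, address).
display_name, address = parseaddr("Jane Doe <jane.doe@example.com>")

print(display_name)  # Jane Doe
print(address)       # jane.doe@example.com
```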