Matching an email address within a string is a hard task, because the specification that defines it, RFC 2822, is complex, which makes it hard to implement as a regex. For more details on why matching an email with a regex is not a good idea, please refer to the antipattern example on when not to use a regex: for matching emails. The best advice from that page is to use a peer-reviewed and widely used library in your favorite language to implement this.
When you need to rapidly validate an entry to make sure it looks like an email, the best option is to keep it simple:
^\S{1,}@\S{2,}\.\S{2,}$
That regex checks that the address is a sequence of one or more non-space characters, followed by an @, followed by two sequences of two or more non-space characters separated by a dot (.).
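It's not perfect, and might validate invalid addresses (according to the format), but most importantly, it rejects very few valid ones. The exceptions are unusual forms such as a domain with a single-character label (user@x.org) or a quoted local part that contains a space.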
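As a rough illustration, here is how that quick check might look in Python. The function name `looks_like_email` and the sample addresses are assumptions made for this sketch; only the pattern itself comes from the text above.

```python
import re

# The simple pattern from the text: something, "@", something, ".", something.
SIMPLE_EMAIL = re.compile(r"^\S{1,}@\S{2,}\.\S{2,}$")


def looks_like_email(value: str) -> bool:
    """Return True if value passes the quick plausibility check."""
    return SIMPLE_EMAIL.match(value) is not None


print(looks_like_email("jane.doe@example.com"))     # True
print(looks_like_email("jane.doe@example"))         # False: no dot after the @
print(looks_like_email("jane doe@example.com"))     # False: contains a space
print(looks_like_email("not@@valid..example.com"))  # True: invalid, yet accepted
```

The last case shows the trade-off: the check is lenient on purpose, so it should be treated as a sanity check on user input rather than as validation.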
The only reliable way to check that an email address is valid is to check that it exists.
The SMTP VRFY command was designed for that purpose, but sadly, after being abused by spammers, it is now disabled or ignored by most mail servers.
So the only way left to check that an address is valid and exists is to actually send an email to it.
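One common way to do this is a confirmation message containing a one-time token. The sketch below only builds such a message with Python's standard library; the function name `build_confirmation`, the sender address, and the confirmation URL are hypothetical, and storing the token and actually sending the message (for example with `smtplib`) are left out.

```python
import secrets
from email.message import EmailMessage


def build_confirmation(address: str, base_url: str) -> tuple[str, EmailMessage]:
    """Build a confirmation email carrying a random one-time token.

    The caller is expected to store the token and to mark the address
    as verified only once the link has been opened.
    """
    token = secrets.token_urlsafe(32)

    msg = EmailMessage()
    msg["From"] = "no-reply@example.com"  # hypothetical sender address
    msg["To"] = address
    msg["Subject"] = "Please confirm your email address"
    msg.set_content(
        f"Open this link to confirm your address:\n{base_url}?token={token}\n"
    )
    return token, msg


token, message = build_confirmation("jane.doe@example.com", "https://example.com/confirm")
print(message["To"])
```

If the user opens the link, the address both exists and is read by that user, which no regex can establish.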
That said, it is not impossible to validate an email address using a regex. The only issue is that the closer those regexes get to the specification, the bigger they become, and as a consequence they are impossibly hard to read and maintain. Below, you'll find examples of such more accurate regexes that are used in some libraries.
⚠️ The following regexes are given for documentation and learning purposes; copy-pasting them into your code is a bad idea. Instead, use the library itself, so you can rely on upstream code and peer developers to keep your email parsing code up to date and maintained.
The best examples of such regexes are in some languages' standard libraries. For example, there's one from the Mail::RFC822::Address module in the Perl library that tries to be as accurate as possible according to the RFC. For your curiosity, you can find a version of that regex online, generated from the grammar, and if you're tempted to copy-paste it, here's a quote from the regex's author:
"I do not maintain the regular expression [linked]. There may be bugs in it that have already been fixed in the Perl module."
Another, shorter variant is the one used by the .NET standard library in the EmailAddressAttribute class:
^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$
But even though it's shorter, it's still too big to be readable and easily maintainable.
In Ruby, a composition of regexes is used in the rfc822 module to match an address. This is a neat idea: if bugs are found, it is easier to pinpoint the part of the regex to change and fix it.
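To give a feel for the composition idea, here is a minimal Python sketch that builds one pattern out of small named parts. It is not the Ruby module's code, and the building blocks (`ATOM`, `DOT_ATOM`, `LABEL`, `DOMAIN`) are deliberately simplified assumptions that cover far less than the RFC grammar.

```python
import re

# Simplified, illustrative building blocks (NOT the full RFC grammar).
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"            # one run of allowed characters
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"                    # atoms separated by single dots
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"   # no leading or trailing hyphen
DOMAIN = rf"{LABEL}(?:\.{LABEL})+"                    # at least two labels

# The full pattern is only the composition of the parts above.
ADDRESS = re.compile(rf"^{DOT_ATOM}@{DOMAIN}$")

# Each part can be exercised on its own, which helps when hunting a bug.
print(bool(re.fullmatch(DOT_ATOM, "jane.doe")))         # True
print(bool(re.fullmatch(DOT_ATOM, "jane..doe")))        # False
print(bool(ADDRESS.match("jane.doe@example.com")))      # True
print(bool(ADDRESS.match("jane.doe@-example.com")))     # False
```

Because every piece has a name and can be tested in isolation, a fix to `LABEL`, for instance, does not require re-reading the whole expression.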
As a counterexample, the Python email parsing module does not use a regex, but instead implements it with a parser.
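For instance, the standard library function `email.utils.parseaddr` splits a header-style address into its display name and address parts. The sample input below is an assumption made for illustration.

```python
from email.utils import parseaddr

# parseaddr returns a (display name, address) tuple.
name, address = parseaddr("Jane Doe <jane.doe@example.com>")

print(name)     # Jane Doe
print(address)  # jane.doe@example.com
```

Note that this parses the address rather than validating it, so a plausibility check or a confirmation email may still be needed afterwards.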