JavaScript: Using RegExp.exec() with a parenthesized regex to extract matches from a string


Sometimes you don't want to simply replace or remove part of a string. Sometimes you want to extract the matches and process them. Here is an example of how to manipulate matches.

What is a match? When a substring compatible with the entire regex is found in the string, the exec command produces a match. A match is an array whose first element is the whole substring that matched, followed by one element for each pair of parentheses in the regex, in order.
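As a minimal sketch of that array layout, the snippet below runs exec once on a small string. The pattern and the input text are made up for illustration only and are not part of the link example that follows.

```javascript
// Illustrative pattern with two pairs of parentheses (no g flag here).
var re = /(\d+)-(\d+)/;
var m = re.exec('range: 10-25 units');

console.log(m[0]);    // whole matched substring: "10-25"
console.log(m[1]);    // first parenthesis: "10"
console.log(m[2]);    // second parenthesis: "25"
console.log(m.index); // position of the match in the input: 7

// When nothing matches, exec returns null instead of an array.
var none = re.exec('no digits here');
console.log(none);    // null
```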

Imagine an HTML string:

  <p>Look at this great link : <a href="">Stackoverflow</a> http://anotherlinkoutsideatag</p>
  Copyright <a href="">Stackoverflow</a>

You want to extract all the links that are inside an a tag. At first, here is the regex you might write:

var re = /<a[^>]*href="https?:\/\/.*"[^>]*>[^<]*<\/a>/g;

But now, imagine you want both the href and the anchor text of each link, and you want them together. You could run a second regex on each match, OR you can use parentheses:

var re = /<a[^>]*href="(https?:\/\/.*)"[^>]*>([^<]*)<\/a>/g;
var str = '<html>\n    <head></head>\n    <body>\n        <h1>Example</h1>\n        <p>Look at this great link : <a href="">Stackoverflow</a> http://anotherlinkoutsideatag</p>\n\n        Copyright <a href="">Stackoverflow</a>\n    </body>\n</html>';
var m;
var links = [];

while ((m = re.exec(str)) !== null) {
    // Guard against zero-length matches, which would otherwise loop forever
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    console.log(m[0]); // The whole matched substring
    console.log(m[1]); // The href subpart
    console.log(m[2]); // The anchor subpart

    links.push({
      match : m[0],   // the entire match
      href : m[1],    // the first parenthesis => (https?:\/\/.*)
      anchor : m[2]   // the second one => ([^<]*)
    });
}
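The `m.index === re.lastIndex` test in the loop is there for regexes that can match an empty string: such a match does not advance `lastIndex`, so without the manual increment the loop would keep finding the same empty match. The sketch below uses a deliberately different, made-up pattern (`/\d*/g`) to show the guard at work; the link regex above never matches an empty string, so for it the guard is only a safety net.

```javascript
var re = /\d*/g;   // illustrative pattern that can match the empty string
var str = 'a1';
var found = [];
var m;

while ((m = re.exec(str)) !== null) {
  // A zero-length match leaves lastIndex where it was; step past it by hand.
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
  found.push(m[0]);
}

console.log(found); // [ '', '1', '' ]
```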

At the end of the loop, you have an array of links, each with its anchor and href, and you can use it to write Markdown, for example:

links.forEach(function(link) {
  console.log('[%s](%s)', link.anchor, link.href);
});
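For a version that can be pasted and run on its own, here is a sketch that wraps the loop in a function and builds the Markdown lines as strings. The function name `extractLinks` and the `example.com` URLs are assumptions made for this illustration; they are placeholders so that the pattern has something to match.

```javascript
// Hypothetical helper: collect every <a> link as {match, href, anchor}.
function extractLinks(html) {
  // A fresh regex per call, so lastIndex always starts at 0.
  var re = /<a[^>]*href="(https?:\/\/.*)"[^>]*>([^<]*)<\/a>/g;
  var links = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    links.push({ match: m[0], href: m[1], anchor: m[2] });
  }
  return links;
}

// Placeholder input; the URLs are invented for the example.
var sample =
  '<p>Look at this great link : <a href="http://example.com/a">First</a> http://anotherlinkoutsideatag</p>\n' +
  'Copyright <a href="https://example.com/b">Second</a>';

var markdown = extractLinks(sample).map(function (link) {
  return '[' + link.anchor + '](' + link.href + ')';
});

console.log(markdown.join('\n'));
// [First](http://example.com/a)
// [Second](https://example.com/b)
```

The bare URL outside the a tag is ignored, because the pattern only matches text that starts with `<a` and ends with `</a>`.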

To go further:

  • Nested parentheses
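As a small taste of nested parentheses: groups are numbered by the position of their opening parenthesis, so an outer group comes before the groups it contains. The pattern and input below are illustrative assumptions, not taken from the link example.

```javascript
// Group 1 wraps the whole URL prefix; groups 2 and 3 are nested inside it.
var re = /((https?):\/\/([^\/"]+))/;
var m = re.exec('see https://example.com/page');

console.log(m[1]); // outer group: "https://example.com"
console.log(m[2]); // first nested group (scheme): "https"
console.log(m[3]); // second nested group (host): "example.com"
```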