Sometimes you don't want to simply replace or remove parts of a string; you want to extract and process the matches. Here is an example of how to work with matches.
What is a match? When the exec method finds a substring that matches the entire regex, it returns a match. A match is an array whose first element is the whole matched substring, followed by the text captured by each pair of parentheses (capturing group) in the regex. If nothing matches, exec returns null.
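To make the shape of a match concrete, here is a minimal sketch on a made-up string (the regex and input are illustrative, not part of the HTML example below). Index 0 holds the whole match, indexes 1 and 2 hold the two parenthesized groups, and the index property tells you where the match starts.

```javascript
// Hypothetical regex with two capturing groups: a word, then a number.
var re = /(\w+)=(\d+)/;
var m = re.exec('width=42; height=7');

console.log(m[0]);    // "width=42" (the whole match)
console.log(m[1]);    // "width"    (first parenthesized group)
console.log(m[2]);    // "42"       (second parenthesized group)
console.log(m.index); // 0          (where the match starts in the string)

// When nothing matches, exec() returns null instead of an array.
var none = re.exec('no match here');
console.log(none);    // null
```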
Imagine an HTML string:
<html>
<head></head>
<body>
<h1>Example</h1>
<p>Look at this great link: <a href="https://stackoverflow.com">Stackoverflow</a> http://anotherlinkoutsideatag</p>
Copyright <a href="https://stackoverflow.com">Stackoverflow</a>
</body>
</html>
You want to extract all the links inside <a> tags. A first version of the regex could be:
var re = /<a[^>]*href="https?:\/\/[^"]*"[^>]*>[^<]*<\/a>/g;
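The part of the pattern between the href quotes matters. A greedy .* can run past the closing quote of the URL whenever the tag contains other quoted attributes, while a negated class such as [^"]* cannot cross a double quote. The sketch below compares both on a hypothetical tag with an extra title attribute.

```javascript
// Hypothetical tag with a second quoted attribute after href.
var tag = '<a href="https://example.com" title="Example">Example</a>';

// Greedy .* backtracks only as far as the LAST double quote it can reach.
var greedy = /href="https?:\/\/.*"/.exec(tag);
console.log(greedy[0]); // href="https://example.com" title="Example"

// [^"]* stops at the first double quote, so only the URL is consumed.
var strict = /href="https?:\/\/[^"]*"/.exec(tag);
console.log(strict[0]); // href="https://example.com"
```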
Now imagine you want both the href and the anchor text of each link, and you want them together.
You could run a second regex on each match, or you can use parentheses (capturing groups):
var re = /<a[^>]*href="(https?:\/\/[^"]*)"[^>]*>([^<]*)<\/a>/g;
var str = '<html>\n <head></head>\n <body>\n <h1>Example</h1>\n <p>Look at this great link: <a href="https://stackoverflow.com">Stackoverflow</a> http://anotherlinkoutsideatag</p>\n\n Copyright <a href="https://stackoverflow.com">Stackoverflow</a>\n </body>\n</html>';
var m;
var links = [];
while ((m = re.exec(str)) !== null) {
  // Avoid an infinite loop if the regex ever produces a zero-length match
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
  console.log(m[0]); // The whole matched substring
  console.log(m[1]); // The href part
  console.log(m[2]); // The anchor part
  links.push({
    match: m[0],  // the entire match
    href: m[1],   // the first parenthesis => (https?:\/\/[^"]*)
    anchor: m[2]  // the second one => ([^<]*)
  });
}
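The lastIndex check in the loop only matters for regexes that can match the empty string. With the g flag, exec() continues from re.lastIndex, and after an empty match lastIndex does not move, so the same empty match would be returned forever. A minimal sketch with a hypothetical /x*/g regex shows the guard stepping past each empty match:

```javascript
// /x*/g can match the empty string, so exec() may not advance lastIndex.
var re = /x*/g;
var str = 'axxb';
var m;
var found = [];
while ((m = re.exec(str)) !== null) {
  // Without this check, an empty match leaves lastIndex unchanged
  // and the loop never ends.
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
  found.push(m.index + ':' + JSON.stringify(m[0]));
}
console.log(found.join(' ')); // 0:"" 1:"xx" 3:"" 4:""
```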
At the end of the loop, you have an array of links, each with its anchor and href, which you can use, for example, to write Markdown:
links.forEach(function (link) {
  console.log('[%s](%s)', link.anchor, link.href);
});
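In environments that support ES2020, String.prototype.matchAll is an alternative to the manual exec loop: it requires a global regex, returns an iterator of the same match arrays, and handles lastIndex for you. A minimal sketch, assuming a short hypothetical input string, that builds the same { match, href, anchor } objects and turns them into Markdown links:

```javascript
var re = /<a[^>]*href="(https?:\/\/[^"]*)"[^>]*>([^<]*)<\/a>/g;
// Hypothetical input: the second tag has an attribute before href.
var str = '<p>See <a href="https://stackoverflow.com">Stackoverflow</a> and ' +
          '<a class="ext" href="https://example.com">Example</a>.</p>';

// Each item from matchAll() is the same kind of array exec() returns.
var links = Array.from(str.matchAll(re), function (m) {
  return { match: m[0], href: m[1], anchor: m[2] };
});

var markdown = links.map(function (link) {
  return '[' + link.anchor + '](' + link.href + ')';
});
console.log(markdown.join('\n'));
// [Stackoverflow](https://stackoverflow.com)
// [Example](https://example.com)
```

Building strings with map() instead of logging inside forEach() keeps the result available for further processing.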
To go further: