Regular Expressions Parsing HTML (or XML, or JSON, or C code, or…)


If you want to extract something from a web page (or from any other markup, data format, or programming language), a regex is the wrong tool for the job. Use a proper parser from your language's libraries instead.
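As a rough illustration of the library approach, here is a minimal sketch in Python that pulls link targets out of an HTML fragment using the standard library's `html.parser` module. The names `LinkCollector`, `extract_links`, and `SAMPLE`, along with the example.com URLs, are made up for this example. Treat it as one possible way to do this, not the only one.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us,
        # and attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values of all <a> tags in html_text, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Illustrative input: mixed case, mixed quoting, a '>' inside an attribute
# value, and a commented-out link. A naive regex tends to trip on these.
SAMPLE = """
<p>See <a class="ext" href="https://example.com/a">one</a> and
<A HREF='https://example.com/b' title="x > y">two</A>.</p>
<!-- <a href="https://example.com/commented-out">hidden</a> -->
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

The parser takes care of case differences, quoting styles, and comments, so the extraction logic is reduced to "is this an `a` tag, and does it have an `href`".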

If you want to read HTML, XML, or JSON, use a library that parses it properly and hands it to you as usable objects in your language of choice. You'll end up with more readable and more maintainable code, and you won't end up