R Language: Web scraping and parsing



Scraping refers to using a program to retrieve the source code of a webpage, typically HTML. Once the code is obtained, it must be parsed into a useful form, such as a vector or data frame, for further use in R.

Base R does not have many of the tools required for these processes, so scraping and parsing are typically done with packages. Some packages are most useful for scraping (RSelenium, httr, curl, RCurl), some for parsing (XML, xml2), and some for both (rvest).
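As a rough illustration of the parsing step, the sketch below uses rvest on a small HTML string that stands in for a downloaded page, so it runs without network access. The markup, the `li.item a` selector, and the `fruit` data frame are made up for this example; with a real site you would pass its URL to `read_html()` instead of a literal string.

```r
library(rvest)

# Stand-in for a downloaded page (hypothetical markup, for illustration only).
page_source <- '<html><body>
  <h1>Fruit prices</h1>
  <ul>
    <li class="item"><a href="/apple">Apple</a></li>
    <li class="item"><a href="/pear">Pear</a></li>
  </ul>
</body></html>'

# Parse the HTML text into a document object.
page <- read_html(page_source)

# Select the link nodes with a CSS selector.
links <- html_nodes(page, "li.item a")

# Extract the link text and the href attribute into a data frame.
fruit <- data.frame(
  name = html_text(links),
  href = html_attr(links, "href"),
  stringsAsFactors = FALSE
)

print(fruit)
```

Here rvest covers both steps: `read_html()` retrieves (or, as above, reads) the page, and the `html_*()` functions parse it.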

A related process is scraping a web API, which unlike a webpage returns data intended to be machine-readable. Many of the same packages are used for both.
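A minimal sketch of the API case, assuming the endpoint returns JSON. The jsonlite package is not mentioned above and is an assumption here, as are the helper name `fetch_json` and the sample response body. The helper shows how a request might be made with httr; the example itself parses a literal string so that it runs offline.

```r
library(jsonlite)

# Hypothetical helper: request a JSON endpoint with httr and parse the body.
# Defined for illustration; it is not called below.
fetch_json <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # raise an error on HTTP 4xx/5xx responses
  fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Stand-in for an API response body (made-up data).
api_body <- '[{"name":"Apple","price":1.2},{"name":"Pear","price":0.8}]'

# A JSON array of objects is simplified to a data frame by default.
prices <- fromJSON(api_body)

print(prices)
```

Because the response is already structured, no selector-based extraction is needed; the parsing step reduces to converting JSON into R objects.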


Some websites object to being scraped, whether due to increased server load or concerns about data ownership. If a website forbids scraping in its Terms of Use, scraping it violates those terms and may be illegal, depending on the jurisdiction.

Related Examples

Basic scraping with rvest