R Language Web scraping and parsing

30% OFF - 9th Anniversary discount on Entity Framework Extensions until December 15 with code: ZZZANNIVERSARY9


Scraping refers to using a computer to retrieve the code of a webpage. Once the code is obtained, it must be parsed into a useful form for further use in R.

Base R does not have many of the tools required for these processes, so scraping and parsing are typically done with packages. Some packages are most useful for scraping (RSelenium, httr, curl, RCurl), some for parsing (XML, xml2), and some for both (rvest).

A related process is scraping a web API, which unlike a webpage returns data intended to be machine-readable. Many of the same packages are used for both.


Some websites object to being scraped, whether due to increased server loads or concerns about data ownership. If a website forbids scraping in it Terms of Use, scraping it is illegal.

Got any R Language Question?