Python Language Scraping using Selenium WebDriver


Example

Some websites don’t like to be scraped. In these cases you may need to simulate a real user working with a browser. Selenium launches and controls a web browser.

from selenium import webdriver

browser = webdriver.Firefox()  # launch firefox browser

browser.get('http://stackoverflow.com/questions?sort=votes')  # load url

title = browser.find_element_by_css_selector('h1').text  # page title (first h1 element)

questions = browser.find_elements_by_css_selector('.question-summary')  # question list

for question in questions:  # iterate over questions
    question_title = question.find_element_by_css_selector('.summary h3 a').text
    question_excerpt = question.find_element_by_css_selector('.summary .excerpt').text
    question_vote = question.find_element_by_css_selector('.stats .vote .votes .vote-count-post').text
    
    print "%s\n%s\n%s votes\n-----------\n" % (question_title, question_excerpt, question_vote) 

Selenium can do much more. It can modify browser’s cookies, fill in forms, simulate mouse clicks, take screenshots of web pages, and run custom JavaScript.