Tutorial by Examples | RIP Tutorial

Locate a text after an element in BeautifulSoup

Imagine you have the following HTML: <div> <label>Name:</label> John Smith </div> and you need to locate the text "John Smith" after the label element. In this case, you can locate the label element by its text and then use the .next_sibling property: from ...
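
The excerpt above is cut off before its code, so here is a minimal sketch of the approach it describes, not the original listing. It assumes the built-in "html.parser" backend and uses the string argument of find() to match the label by its text.

```python
from bs4 import BeautifulSoup

data = """
<div>
    <label>Name:</label>
    John Smith
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# Find the <label> element whose text is exactly "Name:".
label = soup.find("label", string="Name:")

# The node right after the label is a text node; strip the surrounding whitespace.
name = label.next_sibling.strip()
print(name)  # John Smith
```

Note that .next_sibling returns whatever node follows, including whitespace-only text, so the result is stripped before use.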

beautifulsoup • Locating elements

Using CSS selectors to locate elements in BeautifulSoup

BeautifulSoup has limited support for CSS selectors, but it covers the most commonly used ones. Use the select() method to find multiple elements and select_one() to find a single element. Basic example: from bs4 import BeautifulSoup data = """ <ul> <li class="item"...
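
Since the excerpt's example is truncated, the following is a small sketch of select() and select_one() in use. The list contents (item1 to item3) are illustrative assumptions, not the original data.

```python
from bs4 import BeautifulSoup

# Illustrative markup; the item texts are assumed for this sketch.
data = """
<ul>
    <li class="item">item1</li>
    <li class="item">item2</li>
    <li class="item">item3</li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

# select() returns a list of every element matching the CSS selector.
items = [li.get_text() for li in soup.select("li.item")]

# select_one() returns only the first match (or None if nothing matches).
first = soup.select_one("li.item").get_text()

print(items)
print(first)
```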


Locating comments

To locate comments in BeautifulSoup, use the text argument (string in recent versions) with a check that the node's type is Comment: from bs4 import BeautifulSoup from bs4 import Comment data = """ <html> <body> <div> <!-- desired text -->...
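
A minimal sketch of the comment lookup described above, assuming a recent bs4 version where the argument is named string. The closing tags of the sample document are filled in here only so the snippet parses on its own.

```python
from bs4 import BeautifulSoup, Comment

data = """
<html>
    <body>
        <div>
        <!-- desired text -->
        </div>
    </body>
</html>
"""

soup = BeautifulSoup(data, "html.parser")

# The function receives each string node; keep only those that are comments.
comment = soup.find(string=lambda node: isinstance(node, Comment))

# A Comment behaves like a str, so it can be stripped directly.
print(comment.strip())
```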


Filter functions

BeautifulSoup allows you to filter results by passing a function to find_all and similar methods. This is useful for complex filters and for code reuse. Basic usage: define a function that takes an element as its only argument. The function should return True if the argument m...
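
As a sketch of the basic usage described above, the filter below keeps tags that have a class attribute but no id. The function name and the sample markup are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the original example.
data = """
<p class="a" id="x">one</p>
<p class="b">two</p>
<p>three</p>
"""

def has_class_but_no_id(tag):
    """Return True for tags that define 'class' but not 'id'."""
    return tag.has_attr("class") and not tag.has_attr("id")

soup = BeautifulSoup(data, "html.parser")

# find_all calls the function once per tag and keeps the tags it accepts.
matches = soup.find_all(has_class_but_no_id)
texts = [tag.get_text() for tag in matches]
print(texts)
```

Because the filter is an ordinary function, the same definition can be reused with find, find_all, and the other search methods.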


Accessing the internal tags and attributes of an initially selected tag

Let's assume you obtained the following HTML by selecting with soup.find('div', class_='base class'): from bs4 import BeautifulSoup soup = BeautifulSoup(SomePage, 'lxml') html = soup.find('div', class_='base class') print(html) <div class="base class"> <div>Sample text 1</div...
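
The excerpt's markup is truncated, so the sketch below uses a hypothetical page to show how nested tags and their attributes can be reached from the selected element. Everything after the first inner div, including the link and its URL, is an assumption, and "html.parser" is used instead of lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; only the first inner <div> appears in the excerpt.
some_page = """
<div class="base class">
    <div>Sample text 1</div>
    <div>Sample text 2</div>
    <div><a class="ordinary link" href="https://example.com">URL text</a></div>
</div>
"""

soup = BeautifulSoup(some_page, "html.parser")
base = soup.find("div", class_="base class")

# Search inside the selected tag exactly as you would search the whole soup.
texts = [div.get_text() for div in base.find_all("div")]

# Attribute access by tag name returns the first matching descendant.
link = base.a
href = link["href"]      # attributes are read like dictionary keys
classes = link["class"]  # 'class' is multi-valued, so this is a list

print(texts, href, classes)
```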


Collecting optional elements and/or their attributes from a series of pages

Let's consider a situation where you parse a number of pages and want to collect a value from an element that is optional for a particular page (it can be present on one page and absent on another). Moreover, the element itself is, for example, the most ordinary element on the page; in other words, no spec...
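
One way to handle such optional elements, sketched under assumptions: find() returns None when nothing matches, so each page can be checked before reading the attribute. The pages list and the choice of the a tag are hypothetical stand-ins, since the original example is truncated.

```python
from bs4 import BeautifulSoup

# Hypothetical pages: the first contains the optional link, the second does not.
pages = [
    '<div><p>Intro</p><a href="/about">About</a></div>',
    '<div><p>No link here</p></div>',
]

hrefs = []
for html in pages:
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a")  # None when the element is absent
    # Record None for pages without the element so results stay aligned with pages.
    hrefs.append(link.get("href") if link is not None else None)

print(hrefs)
```

Keeping a placeholder for missing values preserves a one-to-one mapping between pages and collected results.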
