Consider a situation where you parse a number of pages and want to collect a value from an element that is optional: it may be present on one page and absent on another.
Moreover, suppose the element itself is completely ordinary, so no specific attribute can uniquely locate it. You can, however, reliably select its parent element, and you know the wanted element's position within that nesting level.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(SomePage, 'lxml')
    html = soup.find('div', class_='base class')  # referred to below as html_1 and html_2
The wanted element is optional, so `html` can take one of two forms:
    html_1 = '''
    <div class="base class">          # №0
        <div>Sample text 1</div>      # №1
        <div>Sample text 2</div>      # №2
        <div>!Needed text!</div>      # №3
    </div>
    <div>Confusing div text</div>     # №4
    '''

    html_2 = '''
    <div class="base class">          # №0
        <div>Sample text 1</div>      # №1
        <div>Sample text 2</div>      # №2
    </div>
    <div>Confusing div text</div>     # №4
    '''
If you get `html_1`, you can collect `!Needed text!` from tag №3 this way:
    wanted_tag = html_1.div.find_next_sibling().find_next_sibling()  # this gives you the whole tag №3
It first gets the №1 `div`, then moves twice to the next `div` on the same nesting level to reach №3.
    wanted_text = wanted_tag.text  # extracting !Needed text!
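For anyone who wants to run the `html_1` case end to end, here is a minimal sketch. It makes a few assumptions that are not in the answer above: the page is an inline string called `page_1` (a hypothetical name) without the `# №` annotations, the built-in `html.parser` is used instead of `lxml` so nothing extra needs installing, and `'div'` is passed explicitly to `find_next_sibling()` so the hop never depends on how whitespace text nodes are treated.

```python
from bs4 import BeautifulSoup

# Hypothetical page mirroring html_1, without the "# №" annotations
page_1 = '''
<div class="base class">
    <div>Sample text 1</div>
    <div>Sample text 2</div>
    <div>!Needed text!</div>
</div>
<div>Confusing div text</div>
'''

soup = BeautifulSoup(page_1, 'html.parser')
html_1 = soup.find('div', class_='base class')

# Start at the first child div (№1), then hop twice along the same level to №3
wanted_tag = html_1.div.find_next_sibling('div').find_next_sibling('div')
wanted_text = wanted_tag.text
print(wanted_text)
```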
This approach proves its usefulness when you get `html_2`: the chained call won't raise an error, it will return `None`.
`find_next_sibling()` is crucial here because it restricts the search to the same nesting level. If you used `find_next()` instead, tag №4 would be collected, which you don't want:
    print(html_2.div.find_next().find_next())
    # <div>Confusing div text</div>
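The difference between the two methods on a page where the element is missing can be checked with a sketch like the one below. The names `page_2`, `by_sibling` and `by_next` are made up for illustration, `html.parser` is assumed in place of `lxml`, and `'div'` is passed explicitly to both methods. It also adds a guard that the snippets above leave out: `None` has no `.text`, so the result should be checked before reading it.

```python
from bs4 import BeautifulSoup

# Hypothetical page mirroring html_2: the wanted third child is absent
page_2 = '''
<div class="base class">
    <div>Sample text 1</div>
    <div>Sample text 2</div>
</div>
<div>Confusing div text</div>
'''

soup = BeautifulSoup(page_2, 'html.parser')
html_2 = soup.find('div', class_='base class')

# Sibling search stays inside the parent: there is no third child, so None
by_sibling = html_2.div.find_next_sibling('div').find_next_sibling('div')
print(by_sibling)

# Document-order search leaves the parent and lands on the unrelated div
by_next = html_2.div.find_next('div').find_next('div')
print(by_next)

# Guard before reading .text, because None has no such attribute
wanted_text = by_sibling.text if by_sibling is not None else None
```

Note that the chain itself is only safe up to the first `None`: calling another `find_next_sibling()` on a `None` result would raise `AttributeError`.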
You can also explore `find_previous()`, which works in exactly the opposite direction.
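As a rough illustration of the backward direction, the sketch below walks from the last child towards the first. It assumes the same inline markup as `html_1` (held in a hypothetical `page` string), uses `html.parser`, and also shows `find_previous_sibling()`, the singular counterpart of `find_previous_siblings()`. The same nesting-level distinction applies: the sibling variant stops at the parent's boundary, while `find_previous()` walks back through the document and reaches the parent itself.

```python
from bs4 import BeautifulSoup

# Hypothetical page mirroring html_1
page = '''
<div class="base class">
    <div>Sample text 1</div>
    <div>Sample text 2</div>
    <div>!Needed text!</div>
</div>
<div>Confusing div text</div>
'''

soup = BeautifulSoup(page, 'html.parser')
base = soup.find('div', class_='base class')
first = base.div                  # №1
last = base.find_all('div')[-1]   # №3

# Backward hop on the same nesting level: №3 -> №2
before_last = last.find_previous_sibling('div')
print(before_last.text)

# №1 has no earlier sibling, so the sibling search returns None
print(first.find_previous_sibling('div'))

# find_previous() ignores nesting and returns the enclosing parent (№0)
enclosing = first.find_previous('div')
print(enclosing['class'])
```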
All of the functions described have plural variants that return every matching tag, not just the first one:
    find_next_siblings()
    find_previous_siblings()
    find_all_next()
    find_all_previous()
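The plural variants offer another way to handle the optional element: collect all siblings once and index into the result, checking the length first. The following is only a sketch under the same assumptions as before (inline markup in a hypothetical `page` string, `html.parser`, explicit `'div'` argument), and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical page mirroring html_1
page = '''
<div class="base class">
    <div>Sample text 1</div>
    <div>Sample text 2</div>
    <div>!Needed text!</div>
</div>
<div>Confusing div text</div>
'''

soup = BeautifulSoup(page, 'html.parser')
base = soup.find('div', class_='base class')
first = base.div  # №1

# Same nesting level only: №2 and №3
siblings = first.find_next_siblings('div')
sibling_texts = [tag.text for tag in siblings]

# Anything later in the document: №2, №3 and the unrelated №4
following = first.find_all_next('div')
following_texts = [tag.text for tag in following]

# Index 1 is the second sibling after №1, i.e. №3; guard for pages without it
wanted_tag = siblings[1] if len(siblings) > 1 else None

print(sibling_texts)
print(following_texts)
```

On a page shaped like `html_2` the sibling list would hold a single tag, so `wanted_tag` would fall back to `None`.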