Let's assume you got the following HTML after selecting with `soup.find('div', class_='base class')`:
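```python
from bs4 import BeautifulSoup

# SomePage: the HTML source of the page you are parsing (a string or an open file)
soup = BeautifulSoup(SomePage, 'lxml')  # 'lxml' needs the lxml package installed
html = soup.find('div', class_='base class')
print(html)
```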
Output:

```html
<div class="base class">
  <div>Sample text 1</div>
  <div>Sample text 2</div>
  <div>
    <a class="ordinary link" href="https://example.com">URL text</a>
  </div>
</div>
```
If you want to access the `<a>` tag's `href`, you can do it this way:
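```python
a_tag = html.a        # first <a> anywhere inside html (same as html.find('a'))
link = a_tag['href']  # dict-style attribute access; raises KeyError if href is missing
print(link)
```

Output:

```
https://example.com
```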
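If the container might not hold an `<a>`, or the `<a>` might lack an `href`, the lookup above fails: `html.a` returns `None` (so indexing it raises `TypeError`), and `a_tag['href']` raises `KeyError`. A minimal defensive sketch, assuming you would rather get `None` back; the helper name `first_link_href` and the sample HTML are hypothetical, and `'html.parser'` is used only so nothing extra has to be installed:

```python
from bs4 import BeautifulSoup


def first_link_href(parent):
    """Hypothetical helper: href of the first <a> inside parent, or None."""
    a_tag = parent.a  # same as parent.find('a'); None if there is no <a>
    if a_tag is None:
        return None
    return a_tag.get('href')  # .get() returns None instead of raising KeyError


# Hypothetical sample containers
with_link = BeautifulSoup(
    '<div><div><a class="ordinary link" href="https://example.com">URL text</a></div></div>',
    'html.parser',
).div
without_href = BeautifulSoup('<div><a>URL text</a></div>', 'html.parser').div
without_a = BeautifulSoup('<div>Sample text 1</div>', 'html.parser').div

print(first_link_href(with_link))     # https://example.com
print(first_link_href(without_href))  # None
print(first_link_href(without_a))     # None
```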
Selecting the parent first and then its `<a>` is useful when you can't select the `<a>` tag directly because its attributes don't identify it uniquely (there are other "twin" `<a>` tags in the parsed page), but you can uniquely select a parent tag that contains the `<a>` you need.
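A small self-contained sketch of that situation, using a made-up page in which a sidebar holds a twin `<a>` with the same class and text: searching for the `<a>` directly returns whichever twin comes first in the document, while anchoring on the unique parent returns the intended one. The last lookup expresses the same idea as a CSS selector via `select_one`, which assumes bs4 4.7 or later (it relies on the `soupsieve` package). The page content and variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two "twin" <a> tags with the same class and text.
page = '''
<div class="sidebar">
  <a class="ordinary link" href="https://example.org/other">URL text</a>
</div>
<div class="base class">
  <div>Sample text 1</div>
  <div>Sample text 2</div>
  <div>
    <a class="ordinary link" href="https://example.com">URL text</a>
  </div>
</div>
'''

soup = BeautifulSoup(page, 'html.parser')

# Direct search is ambiguous: find() returns the first match in document order.
wrong_href = soup.find('a', class_='ordinary link')['href']

# Anchor on the unique parent first, then descend to its <a>.
parent = soup.find('div', class_='base class')
right_href = parent.a['href']

# Same idea as a CSS selector: <a> inside a div that has both classes "base" and "class".
css_href = soup.select_one('div.base.class a')['href']

print(wrong_href)  # https://example.org/other  (the twin we did not want)
print(right_href)  # https://example.com
print(css_href)    # https://example.com
```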