BeautifulSoup lets you filter results by passing a function to find_all
and similar methods. This is useful for complex filters and also makes filters easy to reuse.
Define a function that takes an element as its only argument. The function should return True
if the argument matches.
def has_href(tag):
    '''Returns True for tags with a non-empty href attribute'''
    return bool(tag.get("href"))

soup.find_all(has_href)  # find all elements with a non-empty href attribute
# equivalent using lambda:
soup.find_all(lambda tag: bool(tag.get("href")))
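To try the filter end to end, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration, and the built-in "html.parser" backend is an assumption (any installed parser should behave the same way here). It also shows that bool(tag.get("href")) skips tags whose href is empty, whereas tag.has_attr("href") does not.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<div>
  <a href="/home">Home</a>
  <a name="top">No href at all</a>
  <a href="">Empty href</a>
  <a href="https://example.com/docs">Docs</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def has_href(tag):
    """Return True for tags with a non-empty href attribute."""
    return bool(tag.get("href"))


# find_all calls has_href once per tag and keeps the tags it returns True for.
matches = soup.find_all(has_href)
names_and_hrefs = [(tag.name, tag["href"]) for tag in matches]
print(names_and_hrefs)

# Variant: has_attr only checks for presence, so the empty href matches too.
with_attribute = soup.find_all(lambda tag: tag.has_attr("href"))
print(len(with_attribute))
```

Which of the two checks is appropriate depends on whether an empty href should count as a match.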
Another example that finds tags with a href
value that do not start with
Since the function passed to find_all can only take one argument, it's sometimes useful to make 'function factories' that produce functions fit for use in find_all. This makes your tag-finding functions more flexible.
def present_in_href(check_string):
    # Returns a one-argument function suitable for find_all
    return lambda tag: tag.get("href") and check_string in tag.get("href")

soup.find_all(present_in_href("/partial/path"))
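The factory pattern above can be sketched as a runnable example. The sample HTML, the inner function name matcher, and the choice of "html.parser" are assumptions made for illustration. The factory is written with a nested def instead of a lambda so that it always returns a real boolean, but the behaviour is intended to match the version above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<ul>
  <li><a href="/partial/path/one">One</a></li>
  <li><a href="/other/two">Two</a></li>
  <li><a href="https://example.com/partial/path/three">Three</a></li>
  <li><a>No href</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def present_in_href(check_string):
    """Build a one-argument filter matching tags whose href contains check_string."""
    def matcher(tag):
        href = tag.get("href")
        # Tags without an href never match.
        return href is not None and check_string in href
    return matcher


# The same factory yields differently configured filters.
found = [tag.get_text() for tag in soup.find_all(present_in_href("/partial/path"))]
other = [tag.get_text() for tag in soup.find_all(present_in_href("/other"))]
print(found)
print(other)
```

Because check_string is captured by the inner function, find_all still only ever passes the tag, and the extra parameter is fixed when the factory is called.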