BeautifulSoup filter functions


BeautifulSoup lets you filter results by passing a function to find_all and similar methods such as find. This is useful for expressing complex filters and for reusing filtering logic across searches.

Basic usage

Define a function that takes a tag as its only argument and returns True if that tag should be included in the results.

def has_href(tag):
    '''Returns True for tags with a non-empty href attribute'''
    return bool(tag.get("href"))

soup.find_all(has_href)  # find all elements with a non-empty href attribute
# equivalent using a lambda:
soup.find_all(lambda tag: bool(tag.get("href")))
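A self-contained sketch of the example above, assuming bs4 is installed and using a small made-up HTML snippet with the built-in "html.parser". It also shows a slightly more complex filter (is_empty_link, a hypothetical name chosen for illustration) that combines several conditions, and that find accepts the same kind of function but returns only the first match. Note that bool(tag.get("href")) treats an empty href="" as missing.

```python
from bs4 import BeautifulSoup

# Made-up sample document for illustration
html = """
<div>
  <a href="/home">Home</a>
  <a name="top">Anchor without href</a>
  <link href="style.css" rel="stylesheet">
  <a href="/empty"></a>
  <a href="">Empty href</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def has_href(tag):
    '''Returns True for tags with a non-empty href attribute'''
    return bool(tag.get("href"))


def is_empty_link(tag):
    '''Hypothetical combined filter: <a> tags with an href but no visible text'''
    return tag.name == "a" and bool(tag.get("href")) and not tag.get_text(strip=True)


# find_all returns every matching tag, in document order
print([tag.get("href") for tag in soup.find_all(has_href)])
# prints ['/home', 'style.css', '/empty']

# find returns only the first matching tag (or None if nothing matches)
print(soup.find(has_href))

# The combined filter matches only the <a> with an href and no text
print(soup.find_all(is_empty_link))
```

Because the filter is an ordinary function, the same has_href can be passed to find_all, find, or reused in other searches without rewriting the condition.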

Another example that finds tags whose href value does not start with

Providing additional arguments to filter functions

Because find_all calls a filter function with a single argument (the tag), you cannot pass extra parameters to it directly. Instead, you can write a "function factory": a function that takes the extra parameters and returns a one-argument function suitable for find_all. This makes your tag-finding functions more flexible and reusable.

def present_in_href(check_string):
    '''Returns a filter function matching tags whose href contains check_string'''
    return lambda tag: bool(tag.get("href")) and check_string in tag.get("href")
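A runnable sketch of the factory in use, assuming bs4 is installed; the HTML snippet and the search strings "example.com" and "docs" are made up for illustration. Each call to present_in_href builds a new, independent filter, so one factory serves several searches. The last part shows functools.partial as an alternative technique (not from the original text): it binds the extra argument to an ordinary two-argument function, here a hypothetical href_contains.

```python
from functools import partial

from bs4 import BeautifulSoup

# Made-up sample document for illustration
html = """
<p>
  <a href="https://example.com/docs">Docs</a>
  <a href="https://example.org/blog">Blog</a>
  <a>No href</a>
  <a href="/docs/intro">Intro</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")


def present_in_href(check_string):
    '''Returns a filter function matching tags whose href contains check_string'''
    return lambda tag: bool(tag.get("href")) and check_string in tag.get("href")


# Each call produces a separate one-argument filter
example_links = soup.find_all(present_in_href("example.com"))
docs_links = soup.find_all(present_in_href("docs"))
print([a.get_text() for a in example_links])  # ['Docs']
print([a.get_text() for a in docs_links])     # ['Docs', 'Intro']


# Alternative (not from the original text): a hypothetical two-argument
# function whose first argument is bound with functools.partial
def href_contains(check_string, tag):
    return bool(tag.get("href")) and check_string in tag.get("href")


print(soup.find_all(partial(href_contains, "example.com")) == example_links)  # True
```

The factory keeps the extra parameter in a closure, while partial keeps it in the partial object; both hand find_all a callable that takes just the tag, so the choice is mostly a matter of style.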