nltk Tutorial => Filtering out stop words

Example

NLTK provides a predefined list of words that it treats as stop words. The list is accessed through the NLTK corpus module:

from nltk.corpus import stopwords

To view the stop words stored for English:

stop_words = set(stopwords.words("english"))
print(stop_words)

The following example uses the stop_words set to remove the stop words from a given text:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Requires the "stopwords" corpus and the Punkt tokenizer data (install them with nltk.download).
example_sent = "This is a sample sentence, showing off the stop words filtration."
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

# Option 1: filter with a list comprehension.
filtered_sentence = [w for w in word_tokens if w not in stop_words]

# Option 2: the equivalent explicit loop, which produces the same list.
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print(filtered_sentence)
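
Note that the comparison above is case-sensitive, and the English stop word list is stored in lowercase, so a capitalized token such as "This" is not removed. A minimal sketch of a case-insensitive variant follows. The helper name remove_stop_words and the small hand-written stop word set are assumptions made for illustration so the snippet runs without NLTK data; in practice you would likely pass set(stopwords.words("english")) instead.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


# Hypothetical hand-written subset so the sketch runs without NLTK data;
# in practice, pass set(stopwords.words("english")) instead.
demo_stop_words = {"this", "is", "a", "off", "the"}
demo_tokens = ["This", "is", "a", "sample", "sentence", ",", "showing",
               "off", "the", "stop", "words", "filtration", "."]

filtered = remove_stop_words(demo_tokens, demo_stop_words)
print(filtered)
```

Lowercasing only for the membership test keeps the original casing of the tokens that remain in the output.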
