It refers to splitting a body of text into sentences or words, producing sentence tokens or word tokens respectively. It is an essential part of NLP, as many modules work better (or only) with tokens. For example, pos_tag needs a list of word tokens as input, not a raw string, in order to tag each token with its part of speech.
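To make the two levels of splitting concrete, here is a minimal sketch that uses only Python's standard `re` module. The helper names `simple_sent_tokenize` and `simple_word_tokenize` are hypothetical and the regular expressions are a deliberately naive assumption (they will mis-split abbreviations such as "Dr."). In NLTK itself the corresponding functions are `sent_tokenize` and `word_tokenize`, which need the tokenizer models to be downloaded first.

```python
import re


def simple_sent_tokenize(text):
    """Naively split text into sentences after '.', '!' or '?' followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]


def simple_word_tokenize(sentence):
    """Split a sentence into word tokens, keeping each punctuation mark as its own token."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Tokenization is useful. It splits text into tokens!"

# Sentence tokens: one string per sentence.
sentences = simple_sent_tokenize(text)

# Word tokens: one list of strings per sentence.
words = [simple_word_tokenize(s) for s in sentences]

print(sentences)
print(words)
```

A list of word tokens like each entry of `words` is the kind of input a tagger such as `pos_tag` expects, rather than the original string.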