R Language Tutorial => Find matches in big data sets

Example

For big data sets, a plain call to grepl("fox", test_sentences) can be slow. Examples of big data sets are crawled websites or millions of tweets.

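A first speed-up is the perl = TRUE option, which switches to the PCRE regular expression engine. Faster still is fixed = TRUE, which treats the pattern as a literal string rather than a regular expression, so it only applies when no regex features are needed. A complete example would be: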

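# example data
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

# PCRE engine
grepl("fox", test_sentences, perl = TRUE)
#[1]  TRUE FALSE

# literal string match, no regular expression
grepl("fox", test_sentences, fixed = TRUE)
#[1]  TRUE FALSE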
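
To check how much the options help on a given machine, the variants can be timed against each other. The following is a minimal sketch, not a rigorous benchmark: the vector size of one million elements and the variable names are assumptions chosen for illustration, and the measured times will vary with hardware and R version.

```r
# Build a larger test vector by repeating the two example sentences.
# The size (1e6 elements) is an arbitrary choice for illustration.
big_sentences <- rep(c("The quick brown fox", "jumps over the lazy dog"),
                     times = 5e5)

# Time the three variants and keep their results for comparison.
t_default <- system.time(r_default <- grepl("fox", big_sentences))
t_perl    <- system.time(r_perl    <- grepl("fox", big_sentences, perl = TRUE))
t_fixed   <- system.time(r_fixed   <- grepl("fox", big_sentences, fixed = TRUE))

# For a plain literal pattern such as "fox", all variants must agree.
stopifnot(identical(r_default, r_perl), identical(r_default, r_fixed))

# Elapsed wall-clock seconds per variant.
elapsed <- c(default = t_default[["elapsed"]],
             perl    = t_perl[["elapsed"]],
             fixed   = t_fixed[["elapsed"]])
print(elapsed)
```

Note that fixed = TRUE cannot be combined with perl = TRUE or ignore.case = TRUE; R ignores those arguments with a warning when fixed = TRUE is set.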

In text mining, a corpus is often used. A corpus cannot be passed directly to grepl. Therefore, consider this function, which relies on tm_index from the tm package:

searchCorpus <- function(corpus, pattern) {
  return(tm_index(corpus, FUN = function(x) {
    grepl(pattern, x, ignore.case = TRUE, perl = TRUE)
  }))
}
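
A possible way to call searchCorpus is sketched below. It assumes the tm package is installed and builds a small in-memory corpus from the two example sentences; the names corpus, hits and matching_docs are hypothetical and only serve the illustration. Each document here holds a single string, so the function yields one logical value per document.

```r
library(tm)  # assumption: the tm package is installed

# Same function as above, repeated so the snippet runs on its own.
searchCorpus <- function(corpus, pattern) {
  return(tm_index(corpus, FUN = function(x) {
    grepl(pattern, x, ignore.case = TRUE, perl = TRUE)
  }))
}

# Small in-memory corpus: one document per sentence.
corpus <- VCorpus(VectorSource(c("The quick brown fox",
                                 "jumps over the lazy dog")))

# Logical index of matching documents; the match is case-insensitive.
hits <- searchCorpus(corpus, "FOX")

# Use the index to keep only the matching documents.
matching_docs <- corpus[hits]
```

The logical index can be used like any other logical vector, for example with which(hits) to get the positions of the matching documents.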
