# R Language Basic usage of split

## Example

`split` divides a vector or a data.frame into buckets according to a factor or grouping variable. The result is a list with one element per group, which can then be used for group-wise computation (with `for` loops or `lapply`/`sapply`).

The first example shows the usage of `split` on a vector.

Consider the following vector of letters:

``````
testdata <- c("e", "o", "r", "g", "a", "y", "w", "q", "i", "s", "b", "v", "x", "h", "u")
``````

The objective is to separate those letters into `vowels` and `consonants`, i.e. to split the vector according to letter type.

Let's first create a grouping vector:

``````
vowels <- c('a','e','i','o','u','y')
letter_type <- ifelse(testdata %in% vowels, "vowels", "consonants")
``````

Note that `letter_type` has the same length as our vector `testdata`. Now we can `split` this test data into the two groups, `vowels` and `consonants`:

``````
split(testdata, letter_type)
#$consonants
#[1] "r" "g" "w" "q" "s" "b" "v" "x" "h"

#$vowels
#[1] "e" "o" "a" "y" "i" "u"
``````

Hence, the result is a list whose names come from our grouping vector/factor `letter_type`.
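
As a small follow-up sketch, the list returned by `split` can be passed straight to `sapply` or `lapply` for group-wise work. The names `groups`, `group_sizes` and `sorted_groups` below are illustrative only and are not part of the original example.

```r
# Rebuild the example data so the block runs on its own
testdata <- c("e", "o", "r", "g", "a", "y", "w", "q", "i", "s", "b", "v", "x", "h", "u")
vowels <- c("a", "e", "i", "o", "u", "y")
letter_type <- ifelse(testdata %in% vowels, "vowels", "consonants")

groups <- split(testdata, letter_type)

# sapply simplifies the per-group results to a named vector
group_sizes <- sapply(groups, length)

# lapply keeps a list: here, each group sorted alphabetically
sorted_groups <- lapply(groups, sort)

print(group_sizes)
```

Here `group_sizes` holds 9 for `consonants` and 6 for `vowels`, which matches the two groups printed by `split`.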

`split` also has a method to deal with data.frames.

Consider for instance the `iris` data:

``````
data(iris)
``````

By using `split`, one can create a list containing one data.frame per iris species (variable: `Species`):

``````
> liris <- split(iris, iris$Species)
> names(liris)
[1] "setosa"     "versicolor" "virginica"
> head(liris$setosa)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
``````

(contains only data for the setosa group).
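
A quick way to check what `split` produced is to look at each element of the list. The sketch below (the names `all_data_frames`, `rows_per_species` and `only_setosa` are illustrative) counts rows per species and confirms that the `setosa` element holds no other species.

```r
liris <- split(iris, iris$Species)

# Every element of the list is itself a data.frame
all_data_frames <- all(sapply(liris, is.data.frame))

# Number of rows in each per-species data.frame
rows_per_species <- sapply(liris, nrow)

# The setosa element should contain setosa rows only
only_setosa <- all(liris$setosa$Species == "setosa")

print(rows_per_species)
```

With the built-in `iris` data, each species has 50 rows.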

One example operation would be to compute the correlation matrix per iris species; one would then use `lapply`:

``````
> (lcor <- lapply(liris, FUN=function(df) cor(df[,1:4])))

$setosa
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000   0.7425467    0.2671758   0.2780984
Sepal.Width     0.7425467   1.0000000    0.1777000   0.2327520
Petal.Length    0.2671758   0.1777000    1.0000000   0.3316300
Petal.Width     0.2780984   0.2327520    0.3316300   1.0000000

$versicolor
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000   0.5259107    0.7540490   0.5464611
Sepal.Width     0.5259107   1.0000000    0.5605221   0.6639987
Petal.Length    0.7540490   0.5605221    1.0000000   0.7866681
Petal.Width     0.5464611   0.6639987    0.7866681   1.0000000

$virginica
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000   0.4572278    0.8642247   0.2811077
Sepal.Width     0.4572278   1.0000000    0.4010446   0.5377280
Petal.Length    0.8642247   0.4010446    1.0000000   0.3221082
Petal.Width     0.2811077   0.5377280    0.3221082   1.0000000
``````
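
When each group yields a vector of the same length, `sapply` can be used instead of `lapply` to get a matrix rather than a list. A minimal sketch, assuming we want the mean of each numeric column per species (`species_means` is an illustrative name):

```r
liris <- split(iris, iris$Species)

# colMeans returns one named vector per species; sapply binds them as columns
species_means <- sapply(liris, function(df) colMeans(df[, 1:4]))

print(species_means)
```

The result has one row per measurement and one column per species.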

Then we can retrieve, per group, the most strongly correlated pair of variables: the correlation matrix is reshaped (melted), the diagonal is filtered out, and the record with the highest correlation is selected.

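``````
> library(reshape)
> (topcor <- lapply(lcor, FUN=function(cormat){
    # melt the matrix into long format: one row per pair of variables
    correlations <- melt(cormat)
    # name the columns explicitly rather than relying on melt()'s defaults
    names(correlations) <- c("X1", "X2", "correlation")
    # drop the diagonal (each variable paired with itself)
    filtered <- correlations[correlations$X1 != correlations$X2,]
    # keep the row with the highest correlation
    filtered[which.max(filtered$correlation),]
  }))

$setosa
           X1           X2 correlation
2 Sepal.Width Sepal.Length   0.7425467

$versicolor
            X1           X2 correlation
12 Petal.Width Petal.Length   0.7866681

$virginica
            X1           X2 correlation
3 Petal.Length Sepal.Length   0.8642247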
``````
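
The same selection can also be sketched in base R, without the `reshape` package. This is an alternative approach, not the method used above: it masks the diagonal and the lower triangle so that each pair is considered once, then locates the maximum with `which.max` and `arrayInd`. The helper name `top_pair` and the column names `var1`/`var2` are hypothetical.

```r
liris <- split(iris, iris$Species)
lcor <- lapply(liris, function(df) cor(df[, 1:4]))

top_pair <- function(cormat) {
  # Keep only the upper triangle: drops the diagonal and duplicate pairs
  cormat[!upper.tri(cormat)] <- NA
  # which.max skips NA; arrayInd turns the linear index into (row, column)
  pos <- arrayInd(which.max(cormat), dim(cormat))
  data.frame(var1 = rownames(cormat)[pos[1, 1]],
             var2 = colnames(cormat)[pos[1, 2]],
             correlation = cormat[pos[1, 1], pos[1, 2]])
}

top_pairs <- lapply(lcor, top_pair)
print(top_pairs$setosa)
```

Because only the upper triangle is kept, `var1` and `var2` appear in the opposite order to `X1` and `X2` above, but they describe the same pairs.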

Note that once computations have been performed at such a group-wise level, one may be interested in stacking the results, which can be done with:

``````
> (result <- do.call("rbind", topcor))

                     X1           X2 correlation
setosa      Sepal.Width Sepal.Length   0.7425467
versicolor  Petal.Width Petal.Length   0.7866681
virginica  Petal.Length Sepal.Length   0.8642247
``````
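
The whole split / apply / stack pattern can be summarised in one short, self-contained sketch. The summary columns `n` and `mean_petal_length` are illustrative choices, not part of the example above.

```r
# split: one data.frame per species
# apply: one single-row summary data.frame per group
per_species <- lapply(split(iris, iris$Species), function(df) {
  data.frame(n = nrow(df), mean_petal_length = mean(df$Petal.Length))
})

# stack: with single-row pieces, rbind uses the list names as row names
summary_table <- do.call("rbind", per_species)
print(summary_table)
```

As with `result` above, the row names of `summary_table` are the species names, so the group each row belongs to stays visible after stacking.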