Saturday, April 29, 2017
# Subsetting

## Introduction

Given an R object, we may want to analyse one or more parts of the data it contains separately. The process of obtaining these parts of the data from a given object is called *subsetting*.
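Here is a minimal sketch of the three main base R subsetting operators, `[`, `[[` and `$`. The vector `v` and the list `l` are invented for this illustration.

```r
# A named numeric vector and a named list, invented for illustration
v <- c(a = 10, b = 20, c = 30)
l <- list(a = 1:3, b = "text")

v[c(1, 3)]       # `[` by position: returns a sub-vector and keeps the names
v[c("a", "b")]   # `[` by name
v[v > 15]        # `[` with a logical vector: keeps elements where it is TRUE
l["a"]           # `[` on a list returns a (shorter) list
l[["a"]]         # `[[` returns the element itself, here an integer vector
l$b              # `$` is shorthand for `[[` with a literal name
```

In short, `[` always returns an object of the same kind as the one being subsetted. `[[` and `$` extract a single element.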

## Remarks

Missing values:

Missing values (`NA`s) used as indices with `[` return `NA`. An `NA` index picks an unknown element, so the corresponding element of the result is `NA`.

The default type of `NA` is "logical" (see `typeof(NA)`). Like any "logical" vector used in subsetting, it is therefore recycled to match the length of the subsetted object. So `x[NA]` is equivalent to `x[as.logical(NA)]`, which in turn is equivalent to `x[rep_len(as.logical(NA), length(x))]`. Consequently, it returns a missing value (`NA`) for each element of `x`. For example:

```r
x <- 1:3
x[NA]
## [1] NA NA NA
```
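You can check the chain of equivalences described above directly. This small sketch compares the three forms with `identical()` on the same `x`:

```r
x <- 1:3

# A bare NA is logical, so it is recycled like any logical index
typeof(NA)                                             # "logical"
by_na       <- x[NA]
by_logical  <- x[as.logical(NA)]
by_recycled <- x[rep_len(as.logical(NA), length(x))]
identical(by_na, by_logical) && identical(by_logical, by_recycled)   # TRUE

# The recycled index has one NA per element of x, hence one NA per result element
length(by_na) == length(x)                             # TRUE
```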

By contrast, indexing with a "numeric" or "integer" `NA` picks a single `NA` element for each `NA` in the index:

```r
x[as.integer(NA)]
## [1] NA

x[c(NA, 1, NA, NA)]
## [1] NA  1 NA NA
```
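Base R also provides typed missing-value constants such as `NA_integer_` and `NA_real_`, which make the intent explicit. The sketch below reuses `x <- 1:3`. It contrasts these constants with a full-length logical index that contains an `NA` among `TRUE`/`FALSE` values:

```r
x <- 1:3

# Typed NA constants behave like as.integer(NA): one NA per NA in the index
x[NA_integer_]            # a single NA
x[c(NA_real_, 2)]         # NA followed by x[2]

# In a logical index, each NA yields an NA at that position; FALSE drops the element
x[c(TRUE, NA, FALSE)]     # 1 and NA
```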

Subsetting out of bounds:

With a single index, the `[` operator accepts indices greater than `length(x)`. It returns `NA` for atomic vectors or `NULL` for generic vectors (lists). In contrast, `[[` signals an error for out-of-bounds indices. So does `[` when it is given one index per dimension of a matrix or array, that is, an object with a `dim` attribute of length 2 or more:

```r
(1:3)[10]
## [1] NA
(1:3)[[10]]
## Error in (1:3)[[10]] : subscript out of bounds
as.matrix(1:3)[10]
## [1] NA
as.matrix(1:3)[, 10]
## Error in as.matrix(1:3)[, 10] : subscript out of bounds
list(1, 2, 3)[10]
## [[1]]
## NULL
list(1, 2, 3)[[10]]
## Error in list(1, 2, 3)[[10]] : subscript out of bounds
```
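An out-of-bounds `[[` signals an ordinary R error, so base R's `tryCatch()` can catch it. Here is a minimal sketch. `get_element` is a hypothetical helper written for this example, not a base R function, and it falls back to a default value:

```r
# Hypothetical helper: return x[[i]], or `default` if extraction fails.
# Note: this catches *any* error from x[[i]], not only "subscript out of bounds".
get_element <- function(x, i, default = NULL) {
  tryCatch(x[[i]], error = function(e) default)
}

get_element(list(1, 2, 3), 2)                  # 2
get_element(list(1, 2, 3), 10)                 # NULL (the default)
get_element(1:3, 10, default = NA_integer_)    # NA

# The message carried by the underlying error condition
tryCatch((1:3)[[10]], error = conditionMessage)
```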

The behaviour of `[` is the same with "character" indices that do not match any element of the object's "names" attribute:

```r
c(a = 1, b = 2)["c"]
## <NA>
##   NA
list(a = 1, b = 2)["c"]
## <NA>
## NULL
```
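For `[[` and `$`, unmatched names behave differently for atomic vectors and lists. An atomic vector signals an error, while a list quietly returns `NULL`. Here is a short sketch using the same two objects:

```r
v <- c(a = 1, b = 2)
l <- list(a = 1, b = 2)

# Atomic vector: an unmatched name in `[[` is an error
tryCatch(v[["c"]], error = conditionMessage)   # message: "subscript out of bounds"

# List: an unmatched name in `[[` or `$` returns NULL without an error
l[["c"]]   # NULL
l$c        # NULL
```

Because of this, a `NULL` result from a list lookup does not by itself tell you whether the name was missing or the element was stored as `NULL`. Checking `"c" %in% names(l)` first removes that ambiguity.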

Help topics:

See `?Extract` for further information.