R Language Subsetting


Given an R object, we may require separate analysis for one or more parts of the data contained in it. The process of obtaining these parts of the data from a given object is called subsetting.
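As a minimal illustration of what "obtaining parts of the data" looks like in practice, the sketch below uses a made-up named vector `scores` and list `record` (both hypothetical, chosen only for this example) with the three basic extraction operators:

```r
# Hypothetical objects, used only for illustration
scores <- c(ann = 90, bob = 72, cy = 85)
record <- list(id = 7L, tags = c("a", "b"))

first_two <- scores[1:2]           # `[` keeps a sub-vector, names included
high      <- scores[scores > 80]   # a logical index keeps elements where TRUE
tags      <- record[["tags"]]      # `[[` extracts a single element
id        <- record$id             # `$` extracts a list element by name
```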


Missing values:

Missing values (NAs) used as indices with [ return NA, since an NA index picks an unknown element and so yields NA in the corresponding position.

The "default" type of NA is "logical" (see typeof(NA)), which means that, like any "logical" vector used in subsetting, it is recycled to match the length of the subsetted object. So x[NA] is equivalent to x[as.logical(NA)], which is in turn equivalent to x[rep_len(as.logical(NA), length(x))]; consequently, it returns a missing value (NA) for each element of x. As an example:

x <- 1:3
x[NA]
## [1] NA NA NA
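The recycling argument above can be checked directly. The following is a small sketch (the variable name `recycled` is just for illustration) that builds the recycled index by hand and compares the two results:

```r
x <- 1:3

# NA is logical by default, so as an index it is recycled to length(x)
na_type  <- typeof(NA)
recycled <- x[rep_len(NA, length(x))]

# Both forms give an integer vector of three NAs
same <- identical(x[NA], recycled)
```

Here `na_type` is "logical" and `same` is TRUE, which is why a bare NA index never returns a single element for a vector longer than one.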

In contrast, indexing with a "numeric"/"integer" NA picks a single NA element (one for each NA in the index):

x[as.integer(NA)]
## [1] NA

x[c(NA, 1, NA, NA)]
## [1] NA  1 NA NA
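One way to see the difference is through the typed NA constants that base R provides (NA_integer_ and NA_real_). The sketch below only compares result lengths, and also shows why the mixed index in the last example behaves numerically:

```r
x <- 1:3

len_logical <- length(x[NA])           # logical NA is recycled over x
len_integer <- length(x[NA_integer_])  # integer NA selects one unknown element
len_double  <- length(x[NA_real_])     # same for a double NA

# c(NA, 1, NA, NA) is coerced to double, so every NA in it is numeric
index_type <- typeof(c(NA, 1, NA, NA))
```

With this `x`, the lengths are 3, 1 and 1, and `index_type` is "double", so each NA in that index contributes exactly one NA to the result.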

Subsetting out of bounds:

The [ operator, with one argument passed, allows indices that are > length(x) and returns NA for atomic vectors or a NULL element for generic vectors (lists). In contrast, with [[, and when [ is passed more than one argument (i.e. out-of-bounds subsetting of objects with length(dim(x)) >= 2, such as matrices), an error is raised:

(1:3)[10]
## [1] NA
(1:3)[[10]]
## Error in (1:3)[[10]] : subscript out of bounds
as.matrix(1:3)[10]
## [1] NA
as.matrix(1:3)[, 10]
## Error in as.matrix(1:3)[, 10] : subscript out of bounds
list(1, 2, 3)[10]
## [[1]]
## NULL
list(1, 2, 3)[[10]]
## Error in list(1, 2, 3)[[10]] : subscript out of bounds
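Because some of these calls stop with an error while others quietly return NA or NULL, it can help to probe them side by side. The following sketch assumes a hypothetical helper, `try_subset`, which is not part of base R and simply converts an error into its message:

```r
# Hypothetical helper: returns the value of the expression,
# or the error message as a character string if evaluation fails
try_subset <- function(expr) {
  tryCatch(expr, error = function(e) conditionMessage(e))
}

vec_single <- try_subset((1:3)[10])             # NA, no error
vec_double <- try_subset((1:3)[[10]])           # error message
lst_single <- try_subset(list(1, 2, 3)[10])     # list holding one NULL
mat_column <- try_subset(as.matrix(1:3)[, 10])  # error message
```

The helper relies on lazy evaluation: `expr` is only evaluated inside `tryCatch`, so the out-of-bounds error is caught rather than raised at the call site.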

The [ operator behaves the same way when subsetting with "character" vectors whose values are not matched in the "names" attribute of the object:

c(a = 1, b = 2)["c"]
## <NA> 
##   NA 
list(a = 1, b = 2)["c"]
## $<NA>
## NULL
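Since an unmatched name produces an NA-named result rather than an error with [, one possible way to guard against it is to filter the requested names first. The sketch below uses made-up objects `v`, `l` and `wanted`, and also notes how [[ reacts to an unmatched name:

```r
v <- c(a = 1, b = 2)
l <- list(a = 1, b = 2)

wanted <- c("a", "c")   # "c" is not a name in v

# Keep only the names that actually exist before subsetting
present <- intersect(wanted, names(v))
safe    <- v[present]   # just the element named "a"

# `[[` with an unmatched name depends on the object type
list_miss <- l[["c"]]   # NULL for a list
vec_fails <- inherits(try(v[["c"]], silent = TRUE), "try-error")  # errors for an atomic vector
```

Filtering with `intersect` is only one option; checking `wanted %in% names(v)` first gives the same information when the missing names themselves need to be reported.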

Help topics:

See ?Extract for further information.