Given an R object, we may need to analyse one or more parts of the data it contains separately. The process of obtaining these parts from a given object is called subsetting.
Missing values:
Missing values (NAs) used as indices with [ return NA, since an NA index picks an unknown element and so the corresponding element of the result is NA.
The "default" type of NA is "logical" (see typeof(NA)), which means that, like any "logical" vector used in subsetting, it is recycled to match the length of the object being subset. So x[NA] is equivalent to x[as.logical(NA)], which is equivalent to x[rep_len(as.logical(NA), length(x))]; consequently, it returns a missing value (NA) for each element of x. As an example:
x <- 1:3
x[NA]
## [1] NA NA NA
In contrast, indexing with a "numeric"/"integer" NA picks a single NA element (one for each NA in the index):
x[as.integer(NA)]
## [1] NA
x[c(NA, 1, NA, NA)]
## [1] NA 1 NA NA
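The recycling rule described above can be checked directly. The following is a minimal sketch using only base R; the vector y and its values are made up for illustration.

```r
x <- 1:3

# A bare NA is logical, so it is recycled to the length of x
typeof(NA)
identical(x[NA], x[rep_len(NA, length(x))])

# Typed NA constants are not recycled: each one selects a single unknown element
length(x[NA_integer_])
length(x[NA_real_])

# A logical condition that evaluates to NA propagates NA into the result;
# which() keeps only the positions that are TRUE and so drops the NA
y <- c(1, NA, 3)   # illustrative data
y[y > 1]
y[which(y > 1)]
```

Wrapping the condition in which() is one common way to avoid unexpected NA elements when filtering data that contains missing values.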
Subsetting out of bounds:
The [
operator, with one argument passed, allows indices that are > length(x)
and returns NA
for atomic vectors or NULL
for generic vectors. In contrast, with [[
and when [
is passed more arguments (i.e. subsetting out of bounds objects with length(dim(x)) > 2
) an error is returned:
(1:3)[10]
## [1] NA
(1:3)[[10]]
## Error in (1:3)[[10]] : subscript out of bounds
as.matrix(1:3)[10]
## [1] NA
as.matrix(1:3)[, 10]
## Error in as.matrix(1:3)[, 10] : subscript out of bounds
list(1, 2, 3)[10]
## [[1]]
## NULL
list(1, 2, 3)[[10]]
## Error in list(1, 2, 3)[[10]] : subscript out of bounds
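Because [[ signals an error for an out-of-bounds index, code that cannot guarantee a valid index may want to check it first. The following is only a sketch of one way to do that; get_or_default is a hypothetical helper written for this example, not a base R function.

```r
# Hypothetical helper: return `default` instead of signalling an error
# when `i` is not a single valid position in `x`
get_or_default <- function(x, i, default = NULL) {
  valid <- is.numeric(i) && length(i) == 1L && !is.na(i) &&
    i >= 1 && i <= length(x)
  if (valid) x[[i]] else default
}

get_or_default(list(1, 2, 3), 2)        # element at position 2
get_or_default(list(1, 2, 3), 10)       # out of bounds: falls back to NULL
get_or_default(1:3, 10, NA_integer_)    # out of bounds: falls back to NA
```

An alternative is to wrap the call in tryCatch(), which handles the error after the fact instead of validating the index up front.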
The [ operator behaves the same way when subsetting with a "character" vector whose elements are not matched in the "names" attribute of the object:
c(a = 1, b = 2)["c"]
## <NA>
## NA
list(a = 1, b = 2)["c"]
## $<NA>
## NULL
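Since an unmatched name silently yields NA or NULL with [, it can help to test for the name explicitly. The sketch below uses only base R; the objects v and l are illustrative. It also shows that [[ treats unmatched names differently for atomic vectors and lists.

```r
v <- c(a = 1, b = 2)
l <- list(a = 1, b = 2)

# Check whether a name exists before subsetting
"c" %in% names(v)
"a" %in% names(v)

# With [, the unmatched name becomes NA in the names of the result
is.na(names(v["c"]))

# With [[, a list returns NULL for an unmatched name ...
is.null(l[["c"]])

# ... while an atomic vector signals a "subscript out of bounds" error
res <- tryCatch(v[["c"]], error = function(e) conditionMessage(e))
res
```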
Help topics:
See ?Extract for further information.