# R Language Omitting or replacing missing values

## Recoding missing values

Missing data is often not coded as `NA` in datasets. In SPSS, for example, missing values are often represented by the value `99`.

``````
num.vec <- c(1, 2, 3, 99, 5)
num.vec
## [1]  1  2  3 99  5
``````

It is possible to assign `NA` directly using subsetting:

``````
num.vec[num.vec == 99] <- NA
``````

However, the preferred method is to use `is.na<-` as below. The help file (`?is.na`) states:

> `is.na<-` may provide a safer way to set missingness. It behaves differently for factors, for example.

``````
is.na(num.vec) <- num.vec == 99
``````

Both methods give the same result:

``````
num.vec
## [1]  1  2  3 NA  5
``````
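Real datasets often use more than one missing-value code, and the codes usually sit in data frame columns rather than in a single vector. The following is a minimal sketch of the same `is.na<-` idea applied column by column; the data frame `survey`, its columns, and the codes `99` and `-1` are made up for illustration.

```r
# Hypothetical survey data in which 99 and -1 both mean "missing"
survey <- data.frame(age   = c(23, 99, 41, -1),
                     score = c(7, 8, 99, 6))
missing.codes <- c(99, -1)

# Apply is.na<- to every column; %in% never returns NA, so the
# logical index stays clean even if a column already contains NA
survey[] <- lapply(survey, function(column) {
  is.na(column) <- column %in% missing.codes
  column
})

survey$age    # 23 NA 41 NA
survey$score  # 7 8 NA 6
```

This is only appropriate for columns where the codes cannot occur as legitimate values; otherwise restrict the recoding to the affected columns.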

## Removing missing values

Missing values can be removed from a vector in several ways:

``````
num.vec[!is.na(num.vec)]
num.vec[complete.cases(num.vec)]
na.omit(num.vec)
## [1] 1 2 3 5
``````
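The three approaches are not fully interchangeable: `na.omit` returns the cleaned values together with an `na.action` attribute that records which positions were dropped. The sketch below inspects that attribute and then applies `complete.cases` to a small, made-up data frame (`people` is a hypothetical name), where it keeps only the rows without any missing value.

```r
num.vec <- c(1, 2, 3, NA, 5)

cleaned <- na.omit(num.vec)
attr(cleaned, "na.action")   # records that position 4 was dropped
plain <- as.vector(cleaned)  # plain numeric vector without the attribute

# Hypothetical data frame with a missing value in two different rows
people <- data.frame(name = c("Ann", NA, "Cy"),
                     age  = c(31, 45, NA))

# Keep only the rows in which every column is observed
complete.rows <- people[complete.cases(people), ]
complete.rows
```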

## Excluding missing values from calculations

When summary functions such as `mean` are applied to a vector with missing values, the result is itself a missing value:

``````
mean(num.vec) # returns: NA
``````

The `na.rm` parameter tells the function to exclude the `NA` values from the calculation:

``````
mean(num.vec, na.rm = TRUE) # returns: 2.75

# an alternative to using 'na.rm = TRUE':
mean(num.vec[!is.na(num.vec)]) # returns: 2.75
``````
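`na.rm` is not specific to `mean`; several other base R summary functions accept it as well. A brief sketch using the same vector, which also counts the missing values so that it is clear how many observations were left out of each result:

```r
num.vec <- c(1, 2, 3, NA, 5)

total   <- sum(num.vec, na.rm = TRUE)     # 11
largest <- max(num.vec, na.rm = TRUE)     # 5
middle  <- median(num.vec, na.rm = TRUE)  # 2.5

# Number of values that were excluded from the calculations above
n.missing <- sum(is.na(num.vec))          # 1
```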

Some R functions, like `lm`, have an `na.action` parameter. The default value is `na.omit`, but this default can be changed for the whole session with `options(na.action = 'na.exclude')`.
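The difference between the two actions shows up in a fitted model's residuals and fitted values: `na.omit` drops incomplete rows, while `na.exclude` also leaves them out of the fit but pads the results with `NA` so that they line up with the original rows. A minimal sketch, assuming a copy of the built-in `anscombe` data with one value set to `NA` by hand (`dat`, `fit.omit`, `fit.excl`, and `old.opts` are illustrative names):

```r
dat <- anscombe        # built-in data: 11 rows, no missing values
dat$y1[3] <- NA        # introduce one missing value by hand

# Set the factory default explicitly, keeping the previous setting
old.opts <- options(na.action = "na.omit")
fit.omit <- lm(y2 ~ y1, data = dat)

# Change the session-wide default and fit the same model again
options(na.action = "na.exclude")
fit.excl <- lm(y2 ~ y1, data = dat)

options(old.opts)      # restore the previous setting

length(residuals(fit.omit))  # 10: row 3 is gone
length(residuals(fit.excl))  # 11: row 3 is kept as NA
```

Both fits use the same ten complete rows, so the estimated coefficients are the same; only the shape of the returned residuals and fitted values differs.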

If the default behavior should stay as it is, but a specific call needs a different `na.action`, pass the `na.action` parameter in that function call, e.g.:

``````
lm(y2 ~ y1, data = anscombe, na.action = 'na.exclude')
``````
