Recoding of variables in simple regression


In my statistics book, I came across recoding variables in simple regression. I know how to recode a variable so that it takes the values 0 and 1. However, the book says that before recoding, I need to run this command:

dataset$biology <- NA

I don't understand what NA means in this particular command. I know that it refers to missing values, but what exactly does it do here?


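NA stands for Not Available, which is R's marker for a missing value. The command creates the biology column (or overwrites it if it already exists) and fills every row with NA. I suppose the intention is to set up an empty column before recoding, so that any row the later recoding steps do not touch stays missing. It may also be meant as preallocation, although for a single column assignment that is unlikely to make the code noticeably faster.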

dataset <- data.frame(x = 1:10)
dataset$biology <- NA
dataset
#>     x biology
#> 1   1      NA
#> 2   2      NA
#> 3   3      NA
#> 4   4      NA
#> 5   5      NA
#> 6   6      NA
#> 7   7      NA
#> 8   8      NA
#> 9   9      NA
#> 10 10      NA

Created on 2019-03-31 by the reprex package (v0.2.1.9000)
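
To show why the column is initialised first, here is a minimal sketch of how the recoding step might follow. The `subject` column and its values are made up for illustration, since the book's data is not shown in the question, and `which()` is used so that a missing `subject` is skipped rather than matched.

```r
# Hypothetical data: 'subject' is an illustrative column, not taken from the book
dataset <- data.frame(
  subject = c("biology", "physics", NA, "biology", "chemistry"),
  stringsAsFactors = FALSE
)

# Step 1: create the new column, with every row set to missing
dataset$biology <- NA

# Step 2: recode only the rows that match each rule
# which() drops rows where the comparison itself is NA
dataset$biology[which(dataset$subject == "biology")] <- 1
dataset$biology[which(dataset$subject != "biology")] <- 0

# Row 3 had a missing subject, so neither rule touched it and it is still NA
dataset$biology
```

Any row that no recoding rule covers keeps the NA from step 1, which makes unhandled cases easy to spot with `is.na(dataset$biology)`.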
