Recoding of variables in simple regression


In my statistics book, I came across recoding variables in simple regression. I know how to recode variables so that they take the values 0 and 1. However, the book says that before recoding, this command has to be executed:

dataset$biology <- NA

I don't understand what NA means in this particular command. I know it refers to missing values, but what exactly does it do here?


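NA stands for "Not Available" and is R's marker for a missing value. The command creates a new column called biology in dataset (or overwrites it if it already exists) and fills every row with NA, i.e. with missing values. I suppose the intention behind this is to preallocate the column, which can make the code run faster.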
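A few lines at the R console can show what kind of value NA is. This is a minimal sketch using only base R; the variable names x and y are illustrative. Note that the plain NA constant is logical, and that it is a different thing from NULL.

```r
# NA is R's built-in constant for a missing value; by default it is logical
x <- NA
print(class(x))      # "logical"
print(is.na(x))      # TRUE

# Typed missing values also exist, e.g. NA_real_ for a numeric column
y <- NA_real_
print(class(y))      # "numeric"

# NA is not NULL: NULL has length 0, while NA is a length-1 value
print(length(NULL))  # 0
print(length(NA))    # 1
```

Because the plain NA is logical, a column filled with it will be coerced to numeric as soon as numbers such as 0 and 1 are assigned into it.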

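# a small example data frame with 10 rows
dataset <- data.frame(x = 1:10)
# add a column 'biology' where every row is missing
dataset$biology <- NA
# print the data frame (the assignment above prints nothing)
dataset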
#>     x biology
#> 1   1      NA
#> 2   2      NA
#> 3   3      NA
#> 4   4      NA
#> 5   5      NA
#> 6   6      NA
#> 7   7      NA
#> 8   8      NA
#> 9   9      NA
#> 10 10      NA
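For context, here is a minimal sketch of how a column pre-filled with NA is typically used in the recoding step. The data frame, the variable score, and the cut-off of 50 are all hypothetical, since the book's actual variables are not shown in the question. The point is that any row not matched by a recoding condition keeps its NA, so it stays missing instead of silently getting a 0 or 1.

```r
# Hypothetical data: 'score' is an illustrative variable, not from the book
dataset <- data.frame(score = c(35, 72, NA, 58, 41))

# Step 1: create the new column; every row starts out missing
dataset$biology <- NA

# Step 2: recode into 0/1 by condition (cut-off of 50 is arbitrary);
# which() returns only the positions where the comparison is TRUE,
# so rows where 'score' is NA are skipped
dataset$biology[which(dataset$score < 50)] <- 0
dataset$biology[which(dataset$score >= 50)] <- 1

# Row 3 had no score, so neither condition matched and it stays NA
print(dataset)
# The logical NA column was coerced to numeric by the assignments
print(class(dataset$biology))
```

Using which() here avoids having NA comparison results inside the subscript, which keeps the assignment behaviour easy to reason about.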

