How to convert multiple factor variables to numeric in R?

I have a data frame and most of the variables are factors. I want to convert them to numeric.
I have applied this code:

df[] <- lapply(df, function(x) as.numeric(as.character(x)))
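One possible variant converts only the factor columns. It is sketched below as a hypothetical helper, `factors_to_numeric()`, which is not from the thread. It leaves numeric and integer columns such as WEIGHT and NUMBER alone instead of round-tripping them through text; `as.character()` keeps only 15 significant digits for doubles. For factors it uses `as.numeric(levels(x))[x]`, the idiom given in R FAQ 7.10. That idiom parses each level label once rather than every element, which may help on very large data, and it gives the same values as going through `as.character()`.

```r
# Hypothetical helper (not from the thread): convert only factor columns
factors_to_numeric <- function(df) {
  df[] <- lapply(df, function(x) {
    if (is.factor(x)) {
      # Parse each level label once, then index by the factor's integer
      # codes (the idiom from R FAQ 7.10); NA codes give NA
      as.numeric(levels(x))[x]
    } else {
      # Leave num/int columns such as WEIGHT and NUMBER unchanged
      x
    }
  })
  df
}

# Small example with one numeric and one factor column
DF <- data.frame(Weight = c(8.75, 10, 7.65),
                 Age = c("1", "10", "2"),
                 stringsAsFactors = TRUE)
DF_num <- factors_to_numeric(DF)
str(DF_num)
```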

But it changed some of the values.

Here is the structure of the data:

'data.frame':	447195 obs. of  22 variables:
 $ WEIGHT    : num  10 10 8.75 8.75 8.75 8.75 8.75 8.75 8.75 10 ...
 $ URBAN_RURA: int  2 2 2 2 2 2 2 2 2 2 ...
 $ SEX       : Factor w/ 2 levels "1","2": 1 2 1 2 2 1 1 1 2 1 ...
 $ AGE       : Factor w/ 99 levels "0","1","10","11",..: 75 67 41 25 4 68 24 2 47 49 ...
 $ BIRTHPROVI: Factor w/ 34 levels "11","12","13",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ BIRTHDISTR: Factor w/ 38 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ PROV5     : Factor w/ 33 levels "11","12","13",..: 1 1 1 1 1 1 33 33 1 1 ...
 $ DISTRICT5 : Factor w/ 37 levels "1","10","11",..: 1 1 1 1 1 1 NA NA 1 1 ...
 $ SPEAK_INDO: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 NA NA 1 1 ...
 $ EDUCATION : Factor w/ 10 levels "0","1","2","3",..: 2 2 5 2 2 2 NA NA 2 3 ...
 $ LATIN_LITE: Factor w/ 2 levels "1","2": 1 2 1 1 1 1 NA NA 2 1 ...
 $ NUMBER    : int  1 1 30 30 30 30 30 30 30 31 ...
 $ DATEBORN  : Factor w/ 31 levels "1","10","11",..: 1 1 2 20 12 17 14 4 4 18 ...
 $ MONTHBORN : Factor w/ 12 levels "1","10","11",..: 10 10 2 10 9 3 5 5 5 7 ...
 $ YEARBORN  : Factor w/ 99 levels "1912","1913",..: 22 29 54 68 87 91 96 98 49 47 ...
 $ PROVINCE  : Factor w/ 1 level "11": 1 1 1 1 1 1 1 1 1 1 ...
 $ DISTRICT  : Factor w/ 23 levels "01","02","03",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ SUB_DISTRI: Factor w/ 60 levels "010","011","012",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ VILLAGE   : Factor w/ 122 levels "001","002","003",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ INDUSTRY  : Factor w/ 19 levels "1","10","11",..: NA NA 1 NA NA NA NA NA NA 14 ...
 $ PERSNUM   : Factor w/ 320 levels "1","10","100",..: 1 112 1 112 223 255 266 277 288 1 ...
 $ RELAT     : Factor w/ 10 levels "0","1","2","3",..: 2 3 2 3 4 4 4 4 8 2 ...
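Here is one common source of confusion, which may or may not be what happened here. In `str()` output, the numbers after a factor's level list are its internal integer codes, not its values. For AGE, `75 67 41 ...` are positions in the level list `"0","1","10","11",...`, not ages. Calling `as.numeric()` directly on a factor returns those codes, while going through `as.character()` first returns the labels. A small illustration with made-up values:

```r
# A factor stores integer codes plus a vector of level labels.
# Levels are sorted as text, so "10" comes before "2".
age <- factor(c("2", "10", "1", "10"))

levels(age)                    # "1" "10" "2"
as.integer(age)                # 3 2 1 2   (the codes that str() shows)
as.numeric(age)                # 3 2 1 2   (codes again, not the ages)
as.numeric(as.character(age))  # 2 10 1 10 (the actual values)
```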

Thanks for helping me.

Which columns are not working as expected? When I make a small data set, your code changes the values exactly as I expect.

> DF <- data.frame(Weight = c(8.75,10,7.65,11.23),
+                  Age = c("1","10","2","11"),
+                  SUB = c("010", "011", "013", "014"),stringsAsFactors = TRUE)
> 
> DF[] <- lapply(DF, function(x) as.numeric(as.character(x)))
> DF
  Weight Age SUB
1   8.75   1  10
2  10.00  10  11
3   7.65   2  13
4  11.23  11  14
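One change here is expected but easy to miss. Zero-padded codes such as SUB_DISTRI ("010"), DISTRICT ("01") or VILLAGE ("001") lose their leading zeros, as SUB does above. If the padded text form is needed later, for example to build a combined area code, it can be rebuilt. The sketch below assumes the code width is known; the width of 3 is an assumption for this example:

```r
# Zero-padded codes lose their leading zeros when made numeric
sub <- factor(c("010", "011", "013", "014"))
sub_num <- as.numeric(as.character(sub))
sub_num                                # 10 11 13 14

# Rebuild the padded text, ASSUMING every code is 3 digits wide
sub_chr <- sprintf("%03d", as.integer(sub_num))
sub_chr                                # "010" "011" "013" "014"
```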

There are three data frames, each with more than 23.6 million rows and around 40 columns. When I converted the factors to numeric, the missing values (NA) caused problems: I got a warning.

If you know that there are NAs in the data, the warning is not a problem. Warnings are different from errors. A warning alerts you to something you might be concerned about, and if you understand where it comes from, it is not a problem. An error is always a problem. In the example below, I put NAs in the data to produce the warning. Strictly, only the text "NA" in Age (a factor level, not a true missing value) triggers the warning; the genuine NAs in Weight and SUB convert without complaint. You can see that all the non-NA values are correct.

> DF <- data.frame(Weight = c(8.75,10,NA,11.23),
+                  Age = c("1","10","NA","11"),
+                  SUB = c("010", NA, "013", "014"),stringsAsFactors = TRUE)
> DF[] <- lapply(DF, function(x) as.numeric(as.character(x)))
Warning message:
In FUN(X[[i]], ...) : NAs introduced by coercion
> DF
  Weight Age SUB
1   8.75   1  10
2  10.00  10  NA
3     NA  NA  13
4  11.23  11  14
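Before relying on the converted data, it may be worth checking exactly which values turned into NA. The sketch below uses a hypothetical helper, `unparseable_values()`, that lists the distinct non-missing labels `as.numeric()` cannot parse. On the example data it should report only the text "NA" in Age. Anything else it finds (for example "N/A" or "1,5") would likewise become NA during the conversion.

```r
# Hypothetical helper: distinct non-missing labels that as.numeric()
# turns into NA
unparseable_values <- function(x) {
  chr <- as.character(x)
  num <- suppressWarnings(as.numeric(chr))
  unique(chr[!is.na(chr) & is.na(num)])
}

DF <- data.frame(Weight = c(8.75, 10, NA, 11.23),
                 Age = c("1", "10", "NA", "11"),
                 SUB = c("010", NA, "013", "014"),
                 stringsAsFactors = TRUE)

# Inspect every column before converting
problems <- lapply(DF, unparseable_values)
problems   # Age: "NA" (text, not a real missing value); Weight, SUB: none
```

If the list shows only labels you intend to treat as missing, the warning is explained. The conversion can then be wrapped in `suppressWarnings()` if the message is unwanted.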

@FJCC Thank you for your explanation.
