How to convert all factor columns in a dataset to numeric

Hello, I recently started working in RStudio. I have a big dataset, and I found that some of its columns are factors by running these commands:
sapply(data, mode)
sapply(data, class)
Because the dataset includes factor variables, I could not run some analyses. Is it possible to convert all of them to numeric with a single command? I found this link and tried the commands there, but unfortunately I could not get them to run.
link: https://stackoverflow.com/questions/26391921/how-to-convert-entire-dataframe-to-numeric-while-preserving-decimals/53121940#53121940

So I would really appreciate any commands that would help with this.
Thanks

The easiest way is to read your data so that no column is converted into a factor in the first place. Use the read_* functions from the readr package. If you would rather stick to base functions, i.e. read.*, use the argument stringsAsFactors = FALSE.

By the way, it would be helpful if you provided a reprex, or at least showed a code snippet.

dd <- read.table("<path_to_your_data>", sep = "\t", stringsAsFactors = FALSE)
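
To see the effect of that argument without a file on disk, here is a minimal sketch that reads the same text twice. The data and the column names (id, group) are made up for illustration; only read.table() and its stringsAsFactors argument come from the advice above.

```r
# Hypothetical tab-separated data, inlined so the example is self-contained.
raw <- "id\tgroup\n1\ta\n2\tb\n3\ta"

# Same input, read with and without automatic conversion of strings to factors.
with_factors <- read.table(text = raw, sep = "\t", header = TRUE,
                           stringsAsFactors = TRUE)
without_factors <- read.table(text = raw, sep = "\t", header = TRUE,
                              stringsAsFactors = FALSE)

sapply(with_factors, class)     # group is read as a factor
sapply(without_factors, class)  # group stays a character column
```

The numeric id column is read as integer in both cases; only the text column changes.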

In general, I definitely concur with @felberr's advice to avoid factors when you read in data if you don't need them. But if you happen to have factors in your data that you need to convert, just remember they are a bit different from other data types in R. In particular, when you convert factors to numeric variables, you might run into some surprising results, especially if your factors look like numeric variables. For example, this works as expected:

library(tidyverse)

one_three <- factor(1:3) %>% print()
#> [1] 1 2 3
#> Levels: 1 2 3
as.numeric(one_three)
#> [1] 1 2 3

But what if you try the same thing with this factor vector:

four_six <- factor(4:6) %>% print()
#> [1] 4 5 6
#> Levels: 4 5 6
as.numeric(four_six)
#> [1] 1 2 3

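That's kind of confusing! What as.numeric() returned here is the factor's internal integer codes (1, 2, 3), not the values you see when the factor is printed.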

So, one way around this is to first convert the factor to a character vector, and then to numeric. Doing this will take the factor levels (4, 5, 6) and make them into a character vector ("4", "5", "6"). Then from there, you can convert those characters to numbers.

as.numeric(as.character(four_six))
#> [1] 4 5 6
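
If it helps to see why the two-step conversion is needed, this small base R sketch pulls the factor apart into its two pieces: the level labels and the integer codes that index into them. It uses only levels(), as.integer() and as.character(); the variable names codes and labels are just for illustration.

```r
four_six <- factor(4:6)

# The labels are stored once, as character strings.
labels <- levels(four_six)

# Each element holds an integer code pointing at one of those labels.
codes <- as.integer(four_six)

labels          # the strings "4", "5", "6"
codes           # the positions 1, 2, 3
labels[codes]   # same result as as.character(four_six)
```

Calling as.numeric() directly on the factor gives you codes, while going through as.character() gives you the labels, which can then be parsed as numbers.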

So, all this is a long way of saying that if you want to convert all factor variables in a data frame to numeric variables, this is a pretty good way of doing it:

df <- tibble(a = one_three, b = four_six, c = c("one", "two", "three")) %>% 
  print()
#> # A tibble: 3 x 3
#>   a     b     c    
#>   <fct> <fct> <chr>
#> 1 1     4     one  
#> 2 2     5     two  
#> 3 3     6     three

mutate_if(df, is.factor, ~ as.numeric(as.character(.x)))
#> # A tibble: 3 x 3
#>       a     b c    
#>   <dbl> <dbl> <chr>
#> 1     1     4 one  
#> 2     2     5 two  
#> 3     3     6 three
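
In dplyr 1.0.0 and later, the scoped verbs such as mutate_if() are superseded by across(). Assuming a dplyr version of at least 1.0.0, the same conversion could be sketched like this; the data frame is rebuilt here so the block runs on its own, and the name converted is only for illustration.

```r
library(dplyr)

df <- tibble(a = factor(1:3), b = factor(4:6), c = c("one", "two", "three"))

# Convert every factor column via character, leaving other columns untouched.
converted <- mutate(df, across(where(is.factor), ~ as.numeric(as.character(.x))))

converted
```

mutate_if() still works, so this is a matter of preference rather than a required change.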

The Factors chapter of R for Data Science is a pretty good place to start digging deeper into factors.

ADDING:

From the ?factor documentation:

The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

So if you have a very large data frame, doing the following might be a little quicker:

mutate_if(df, is.factor, ~ as.numeric(levels(.x))[.x])
#> # A tibble: 3 x 3
#>       a     b c    
#>   <dbl> <dbl> <chr>
#> 1     1     4 one  
#> 2     2     5 two  
#> 3     3     6 three

Created on 2018-11-05 by the reprex package (v0.2.1)
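
If you would rather not load the tidyverse at all, the same idea can be sketched in base R on a plain data.frame. This is one possible way to do it, not the only one; the helper name is_fct is made up for the example.

```r
df <- data.frame(a = factor(1:3), b = factor(4:6),
                 c = c("one", "two", "three"),
                 stringsAsFactors = FALSE)

# Logical vector marking which columns are factors.
is_fct <- vapply(df, is.factor, logical(1))

# Replace only those columns, using the conversion recommended in ?factor.
df[is_fct] <- lapply(df[is_fct], function(f) as.numeric(levels(f))[f])

sapply(df, class)
```

Note that a factor whose labels are not numbers (for example "one", "two") would come out as NA with a warning under any of these conversions, so it is worth checking which columns are factors before converting them all.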
