How do I convert the categorical values in columns 2, 3 and 4 to numeric? I tried transform() and didn't get what I wanted.
Let's say your data frame is called DF. You can see whether the columns are characters or factors with the command
str(DF)
To convert characters into numeric codes, numbered by the alphabetical order of the values, you can use
DF$proto <- as.numeric(as.factor(DF$proto))
If they are already factors, you can skip using the as.factor function.
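To make the alphabetical numbering concrete, here is a small sketch. The proto values below are hypothetical (they are not from the original post); only the column name comes from the question.

```r
# Hypothetical values for the proto column (an assumption for illustration)
DF <- data.frame(proto = c("tcp", "udp", "icmp", "tcp"),
                 stringsAsFactors = FALSE)

# as.factor() sorts the unique values to build the factor levels
levels(as.factor(DF$proto))
#> [1] "icmp" "tcp"  "udp"

# as.numeric() then returns each value's position among those levels
DF$proto <- as.numeric(as.factor(DF$proto))
DF$proto
#> [1] 2 3 1 2
```

Note that the codes only reflect sort order, so "icmp" gets 1 even though it appears third in the data.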
Does that get you what you want?
Perfect, but is there a way to do the conversion all at once rather than separately for each column, since I have 3 columns?
There are a few ways to do this. If you want to change particular columns, you can use the mutate_at function from dplyr and indicate the columns by their numeric position or by name.
DF <- data.frame(A = 1:3, B = c("D", "F", "W"), C = c("U", "I", "U"),
D = c("Q", "W", "E"), E = 3:5, stringsAsFactors = FALSE)
str(DF)
#> 'data.frame': 3 obs. of 5 variables:
#> $ A: int 1 2 3
#> $ B: chr "D" "F" "W"
#> $ C: chr "U" "I" "U"
#> $ D: chr "Q" "W" "E"
#> $ E: int 3 4 5
library(dplyr)
MakeNum <- function(x) as.numeric(as.factor(x))
DF <- mutate_at(DF, 2:4, MakeNum)
str(DF)
#> 'data.frame': 3 obs. of 5 variables:
#> $ A: int 1 2 3
#> $ B: num 1 2 3
#> $ C: num 2 1 2
#> $ D: num 2 3 1
#> $ E: int 3 4 5
Created on 2020-02-27 by the reprex package (v0.3.0)
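For comparison, the same conversion can probably be done without dplyr. This is only a sketch of one base R option using lapply(), reusing the DF and MakeNum definitions from the example above; it is not something the original answer proposed.

```r
# Same example data and helper as in the reprex above
DF <- data.frame(A = 1:3, B = c("D", "F", "W"), C = c("U", "I", "U"),
                 D = c("Q", "W", "E"), E = 3:5, stringsAsFactors = FALSE)
MakeNum <- function(x) as.numeric(as.factor(x))

# lapply() applies MakeNum to each selected column;
# assigning into DF[2:4] replaces just those columns in place
DF[2:4] <- lapply(DF[2:4], MakeNum)
str(DF)
```

Columns A and E are left untouched because only positions 2 to 4 are selected.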
You can also use mutate_if() to affect all columns that meet a certain condition, such as all character columns.
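A minimal sketch of the mutate_if() variant, assuming the same example DF and MakeNum helper as in the mutate_at example; is.character is used as the condition, so the integer columns A and E are skipped.

```r
library(dplyr)

# Same example data and helper as in the mutate_at example
DF <- data.frame(A = 1:3, B = c("D", "F", "W"), C = c("U", "I", "U"),
                 D = c("Q", "W", "E"), E = 3:5, stringsAsFactors = FALSE)
MakeNum <- function(x) as.numeric(as.factor(x))

# Apply MakeNum to every column for which is.character() returns TRUE
DF <- mutate_if(DF, is.character, MakeNum)
str(DF)
```

This avoids hard-coding the column positions, which helps if the column order changes later.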
Just to add the new kid on the block: you will also be able to do this with across() in the forthcoming release of dplyr. If you have the development version installed, you can run:
library(dplyr)
DF <- data.frame(A = 1:3, B = c("D", "F", "W"), C = c("U", "I", "U"),
D = c("Q", "W", "E"), E = 3:5, stringsAsFactors = FALSE)
str(DF)
#> 'data.frame': 3 obs. of 5 variables:
#> $ A: int 1 2 3
#> $ B: chr "D" "F" "W"
#> $ C: chr "U" "I" "U"
#> $ D: chr "Q" "W" "E"
#> $ E: int 3 4 5
MakeNum <- function(x) as.numeric(as.factor(x))
DF <- mutate(DF, across(2:4, MakeNum))
str(DF)
#> 'data.frame': 3 obs. of 5 variables:
#> $ A: int 1 2 3
#> $ B: num 1 2 3
#> $ C: num 2 1 2
#> $ D: num 2 3 1
#> $ E: int 3 4 5
Created on 2020-02-27 by the reprex package (v0.3.0.9001)
Edit: Now reflects that across() is only in the dev version of dplyr.
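As far as I know, across() has since been released in dplyr 1.0.0, together with the where() selection helper. Assuming dplyr 1.0.0 or later is installed, a sketch that selects columns by type instead of by position might look like this:

```r
library(dplyr)

# Same example data and helper as above
DF <- data.frame(A = 1:3, B = c("D", "F", "W"), C = c("U", "I", "U"),
                 D = c("Q", "W", "E"), E = 3:5, stringsAsFactors = FALSE)
MakeNum <- function(x) as.numeric(as.factor(x))

# where(is.character) selects every character column (assumes dplyr >= 1.0.0)
DF <- mutate(DF, across(where(is.character), MakeNum))
str(DF)
```

Selecting by name, for example across(c(B, C, D), MakeNum), should work the same way.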
I'm curious about across(): what package does it come from?
dplyr, see the link at the bottom of my response, above
Sorry, I didn't register the link somehow! The reason I asked was that I had assumed from the code that it was either base R or dplyr, but I got no documentation from ?across, so I must already be out of date. Things change so quickly!
It hasn't been released yet, I probably should have mentioned that.
No worries -- I'll have to remember to look to you for news about the latest