subset(), dummy()

I have a large dataset with some factor variables. I need all of my data to be numeric, so I am converting all of the factor variables into dummy variables. I'd like to be able to create a vector of all of the factor variables so I can drop them before I make certain global calculations. I know I can do it by subsetting by column name, but I am trying to figure out how to do it without having to manually type in all of the column names, as there will be hundreds of them. Thanks!
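One base R way to build that vector is to test every column with `vapply()` and keep the names of the columns that are factors or character vectors. Checking both is safer because R 4.0.0 and later no longer converts strings to factors in `data.frame()` by default. A minimal sketch on a small hypothetical data frame (the names `df`, `cat_cols` and `df_numeric` are illustrative only):

```r
# Hypothetical example data: one character, one factor and one numeric column
df <- data.frame(
  categorical = c("A", "B", "C"),
  gender      = factor(c("M", "F", "U")),
  continuous1 = c(2.3, 3.4, 6.6),
  stringsAsFactors = FALSE
)

# TRUE for every column that is a factor or a character vector
is_categorical <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))

# Character vector of categorical column names, built without typing them
cat_cols <- names(df)[is_categorical]
print(cat_cols)

# Drop those columns; drop = FALSE keeps a data frame even if one column is left
df_numeric <- df[, setdiff(names(df), cat_cols), drop = FALSE]
print(df_numeric)
```

Because `cat_cols` is computed from the data, the same two lines work unchanged whether there are two categorical columns or hundreds.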

#library(tidyverse)
library(fastDummies)

Sample data standing in for the larger data set:

df_1 <- data.frame(
  categorical = c("A","B","C","A","B","A","C","C","C","A","A","C","C","C","A","C","A","B","A","C"),
  indicator1 = c(1,0,1,0,1,0,0,0,0,0,0,0,0,1,1,0,1,0,0,1),
  indicator2 = c(1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0),
  indicator3 = c(1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0),
  indicator4 = c(0,1,0,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0),
  indicator5 = c(0,0,1,1,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,0),
  continuous1 = c(2.3,3.4,6.6,5.5,6,7,11,12.3,13,5,2.4,3.6,6.3,5.2,5,6.6,11.3,12,14,5),
  gender = c("M","M","F","F","F","F","F","F","M","U","U","F","M","M","F","F","F","U","M","F"))

print(df_1)
#summary(df_1)

The categorical variables need to be converted to dummy variables:

df_1 <- fastDummies::dummy_cols(df_1)
print(df_1)

Is there a better way to do this than the next line?

Drop the categorical variables that were just transformed into dummy variables:

df_1 <- subset(df_1, select = -c(categorical, gender))
print(df_1)
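If the only goal is to replace the categorical columns with their dummies, `dummy_cols()` can also drop the source columns in the same call through its `remove_selected_columns` argument, which makes the separate `subset()` step unnecessary. This is a sketch assuming an installed fastDummies version recent enough to provide that argument; the data below is a hypothetical cut-down version of `df_1`:

```r
library(fastDummies)

# Hypothetical cut-down version of df_1
df_1 <- data.frame(
  categorical = c("A", "B", "C", "A"),
  continuous1 = c(2.3, 3.4, 6.6, 5.5),
  gender      = c("M", "F", "F", "U"),
  stringsAsFactors = FALSE
)

# Create dummies for every character/factor column and, in the same call,
# remove the original categorical columns (assumes remove_selected_columns
# is available in the installed fastDummies)
df_1 <- dummy_cols(df_1, remove_selected_columns = TRUE)
print(df_1)
```

Only the columns that `dummy_cols()` actually encoded are removed (by default, all character and factor columns), so no column names need to be typed.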

Would the `select_if()` function from dplyr do what you want? You can use it to keep all numeric columns, which drops the factor columns without having to name them.

library(dplyr)

DF <- data.frame(Name = c("A","B", "A"), value = 1:3)
DF
#>   Name value
#> 1    A     1
#> 2    B     2
#> 3    A     3
str(DF)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ Name : Factor w/ 2 levels "A","B": 1 2 1
#>  $ value: int  1 2 3
DF <- select_if(DF, is.numeric)
DF
#>   value
#> 1     1
#> 2     2
#> 3     3

Created on 2020-02-17 by the reprex package (v0.3.0)
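As of dplyr 1.0.0, `select_if()` is superseded, and the same selection is written `select(DF, where(is.numeric))`. If you'd rather avoid the dependency, base R can do the same filtering. A minimal sketch reusing the `DF` from the reprex above (the names `DF_num` and `DF_num2` are illustrative):

```r
DF <- data.frame(Name = c("A", "B", "A"), value = 1:3)

# Keep only the numeric columns: vapply() returns one TRUE/FALSE per column
DF_num <- DF[vapply(DF, is.numeric, logical(1))]
print(DF_num)

# Equivalent one-liner: Filter() applies is.numeric to each column
DF_num2 <- Filter(is.numeric, DF)
print(DF_num2)
```

Both forms return a data frame with only the `value` column, matching the `select_if()` result.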


Thank you! That worked great. :-)
