Automatically recognizing the "correct" type of variable

swaheera · July 20, 2021, 6:50am

I am working with the R programming language. Suppose I have the following data:

#create data
var_1 = rnorm(1000,10,10)

var_2 <- c("1","0")
var_2 <- sample(var_2, 1000, replace=TRUE, prob=c(0.3, 0.7))


response<- c("2", "1","0")
response <- sample(response, 1000, replace=TRUE, prob=c(0.3, 0.4, 0.3))

my_data = data.frame(var_1, var_2, response)

my_data$var_2 = as.factor(my_data$var_2)
my_data$response = as.factor(my_data$response)

I wrote the following code, which makes a histogram for the "numerical" variable (var_1) and a density plot for the "factor" variable (var_2):

#load libraries
library(ggplot2)
library(gridExtra)


#first plot
p1 = ggplot(my_data) +
    geom_histogram(aes(x=var_1, fill=response), 
                   colour="grey50", alpha=0.5, position="identity")+ ggtitle("var_1 vs response")

#second plot (for some reason, this does not look correct?)

p2 = ggplot(my_data, aes(x = var_2, fill = response)) + geom_density(alpha = 0.5) + ggtitle("var_2 vs response")

grid.arrange(p1, p2, ncol=2)

My question: Suppose I had a dataset that had many "factor" variables and "numerical" variables. Are there any functions in R that can automatically detect whether the variable is "factor" or "numerical", and then draw the corresponding graph (filled using the color of the "response variable")?

Would it have been possible to produce these graphs automatically, without manually instructing R to make the correct type of graph for each variable "type"? (e.g. suppose there were 10 variables in a dataset, would it be possible to make 10 of these graphs?)

Thanks

nirgrahamuk · July 21, 2021, 11:10am

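library(glue)

# Returns one plot for the column of my_data named by the string `var`
myplotter <- function(var) {
  if (is.numeric(my_data[[var]])) {
    # numeric column: overlaid histograms, one per response level
    ggplot(my_data) +
      geom_histogram(aes(x = !!sym(var), fill = response),
        colour = "grey50", alpha = 0.5, position = "identity"
      ) +
      ggtitle(glue("{var} vs response"))
  } else {
    # non-numeric column: same geom as the second plot in the question
    ggplot(my_data, aes(x = !!sym(var), fill = response)) +
      geom_density(alpha = 0.5) +
      ggtitle(glue("{var} vs response"))
  }
}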

library(purrr)

results_1 <- map(c("var_1", "var_2"), myplotter)

marrangeGrob(results_1, nrow = 1, ncol = 2)
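
For a dataset with many columns, the names passed to map() do not have to be typed by hand. A minimal base R sketch of picking them up automatically is below; demo_data, predictors, numeric_vars and factor_vars are hypothetical names used only for illustration, and demo_data is a small stand-in with the same layout as my_data.

```r
# Small stand-in data frame (hypothetical; same column layout as my_data)
set.seed(1)
demo_data <- data.frame(
  var_1 = rnorm(20, 10, 10),
  var_2 = factor(sample(c("1", "0"), 20, replace = TRUE)),
  response = factor(sample(c("2", "1", "0"), 20, replace = TRUE))
)

# Every column except the response is a candidate for plotting
predictors <- setdiff(names(demo_data), "response")

# TRUE for numeric columns, FALSE for factors and other types
is_num <- vapply(demo_data[predictors], is.numeric, logical(1))

numeric_vars <- predictors[is_num]
factor_vars <- predictors[!is_num]
```

With the real data, something like map(setdiff(names(my_data), "response"), myplotter) should then give one plot per column, however many there are.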

ron · July 21, 2021, 12:07pm

This is because you are fitting a density to non-numeric data (var_2). I'm not sure quite what you want, but I'd have thought some variation on geom_histogram again, with var_2 as the x variable.
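
One way to read that suggestion is sketched below, assuming a bar chart of counts is what is wanted for the factor columns. It uses geom_bar() rather than geom_histogram(), since geom_bar() counts the rows at each level of a discrete x. plot_by_type, fill_var and demo_data are hypothetical names, not something from the posts above.

```r
library(ggplot2)

# Hypothetical helper: histogram for numeric columns, bar chart of counts otherwise
plot_by_type <- function(data, var, fill_var = "response") {
  # .data[[...]] looks up a column by its name given as a string
  p <- ggplot(data, aes(x = .data[[var]], fill = .data[[fill_var]]))
  if (is.numeric(data[[var]])) {
    p <- p + geom_histogram(bins = 30, colour = "grey50",
                            alpha = 0.5, position = "identity")
  } else {
    p <- p + geom_bar(position = "dodge")
  }
  p + labs(x = var, fill = fill_var, title = paste(var, "vs", fill_var))
}

# Small stand-in data frame (hypothetical; same column layout as my_data)
set.seed(1)
demo_data <- data.frame(
  var_1 = rnorm(50, 10, 10),
  var_2 = factor(sample(c("1", "0"), 50, replace = TRUE)),
  response = factor(sample(c("2", "1", "0"), 50, replace = TRUE))
)

# One plot per column other than the response
plots <- lapply(setdiff(names(demo_data), "response"),
                plot_by_type, data = demo_data)
```

The resulting list can be passed to marrangeGrob() or grid.arrange(grobs = plots, ncol = 2) from gridExtra in the same way as results_1 above.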
