Hello, hello,
while working on a personal project of mine, I ran into the following problem: I have a list_of_dataframes containing multiple dataframes whose columns have the same names and classes, except for one column (called m in my example below). Because of I/O issues (Excel blues...), m has a different class in different dataframes. My goal is to combine list_of_dataframes into a single dataframe using bind_rows(). To do that, I need to convert the m column of each dataframe to the same class (integer); otherwise bind_rows(list_of_dataframes) fails.
Question: what is the simplest/most readable way to convert the same variable in multiple dataframes to the same class? I show my solution below using dplyr and purrr: it's not bad, but I wonder if there's a simpler way. Note: the solution should work with a list_of_dataframes of arbitrary length, and of course with an arbitrary column name.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
# generate sample data
n <- 100
df1 <- data.frame(x = runif(n), y = rnorm(n), m = seq_len(n))
df2 <- data.frame(x = runif(n), y = rnorm(n), m = as.character(seq_len(n)))
df3 <- data.frame(x = runif(n), y = rnorm(n), m = as.factor(seq_len(n)))
list_of_dataframes <- list(df1 = df1, df2 = df2, df3 = df3)
# directly binding the dataframes would fail, because column m has a different class in
# different dataframes
# dataframes <- bind_rows(list_of_dataframes) # NOT RUN
# Thus, we need to convert all columns named m to the same class
convert_all_columns_to_integer <- function(list_of_dataframes, varname){
  varnames <- rep_len(varname, length(list_of_dataframes))
  convert_to_integer <- function(dataframe, variable){
    # go through character first: as.integer() on a factor returns the internal
    # level codes, not the labels
    dataframe[[variable]] <- as.integer(as.character(dataframe[[variable]]))
    return(dataframe)
  }
  list_of_dataframes <- map2(list_of_dataframes, varnames, convert_to_integer)
  return(list_of_dataframes)
}
list_of_dataframes <- convert_all_columns_to_integer(list_of_dataframes, "m")
# now binding will be successful
dataframes <- bind_rows(list_of_dataframes)
Created on 2018-08-23 by the reprex package (v0.2.0).
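For comparison, here is a shorter variant I could imagine. It is only a sketch under a few assumptions: it relies on map() passing extra named arguments on to the function, so the rep_len()/map2() step is not needed. The helper name to_integer_column and the small dfs list are made up for illustration. Converting through as.character() is my own choice, so that factor labels rather than internal level codes end up as integers.

```r
library(dplyr)
library(purrr)

# Small illustrative data (hypothetical; mirrors the structure of the example above)
dfs <- list(
  a = data.frame(x = 1:3, m = c(10L, 20L, 30L)),
  b = data.frame(x = 4:6, m = c("10", "20", "30"), stringsAsFactors = FALSE),
  c = data.frame(x = 7:9, m = factor(c("10", "20", "30")))
)

# Hypothetical helper: convert one named column to integer.
# Going through character avoids getting factor level codes instead of labels.
to_integer_column <- function(df, variable) {
  df[[variable]] <- as.integer(as.character(df[[variable]]))
  df
}

# Extra arguments given to map() are passed on to the function,
# so the column name only has to be supplied once
combined <- dfs %>%
  map(to_integer_column, variable = "m") %>%
  bind_rows()
```

The column name is an ordinary string argument, so the same call should work for any column and for a list of any length. Whether this is more readable than the map2() version is a matter of taste.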