Comparing Data Frames

Hi all,
I have several data frames, each holding a different subset of the same data set. They all have the same column names. Is there an easy way to compare summary statistics across all of them?
Thanks!

I would also at some point like to graph these comparisons.

You can make one big data frame with an additional id column showing which data frame the data came from, then calculate statistics grouping by the id column.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df1 <- data.frame(A = LETTERS[1:5], B = rnorm(5), stringsAsFactors = FALSE)
df2 <- data.frame(A = LETTERS[1:7], B = rnorm(7, 2, 0.5), stringsAsFactors = FALSE)
AllDat <- list(df1 = df1, df2 = df2)
COMB <- bind_rows(AllDat, .id = "DF")
COMB
#>     DF A          B
#> 1  df1 A -0.1209323
#> 2  df1 B -2.6429294
#> 3  df1 C  2.3441191
#> 4  df1 D  0.0531904
#> 5  df1 E  0.8420014
#> 6  df2 A  1.7757831
#> 7  df2 B  1.6594860
#> 8  df2 C  1.3826828
#> 9  df2 D  1.8847440
#> 10 df2 E  2.6229499
#> 11 df2 F  1.8920032
#> 12 df2 G  1.2835507
STATS <- COMB %>% group_by(DF) %>% summarize(Avg = mean(B))
STATS
#> # A tibble: 2 x 2
#>   DF       Avg
#>   <chr>  <dbl>
#> 1 df1   0.0951
#> 2 df2   1.79

Created on 2019-08-27 by the reprex package (v0.2.1)
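
Since the question asks about several summary statistics and about graphing, here is a minimal sketch that extends the code above. The column names N, SD, Med, Min and Max and the set.seed() call are my own additions, not part of the original reprex, and the base R boxplot() is just one simple way to draw the comparison.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random example data is repeatable
df1 <- data.frame(A = LETTERS[1:5], B = rnorm(5), stringsAsFactors = FALSE)
df2 <- data.frame(A = LETTERS[1:7], B = rnorm(7, 2, 0.5), stringsAsFactors = FALSE)

# Stack the data frames; the DF column records where each row came from
COMB <- bind_rows(list(df1 = df1, df2 = df2), .id = "DF")

# Several statistics per source data frame in a single summarize() call
STATS <- COMB %>%
  group_by(DF) %>%
  summarize(
    N   = n(),
    Avg = mean(B),
    SD  = sd(B),
    Med = median(B),
    Min = min(B),
    Max = max(B)
  )
STATS

# One box per source data frame, drawn with base graphics
boxplot(B ~ DF, data = COMB, xlab = "Data frame", ylab = "B")
```

Because every statistic is a column of STATS, the same table can feed other plots later, for example a bar chart of Avg by DF.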

Just to add a bit to that: if you have many data frames (or tibbles) and you don't want to list them manually, you could do the following.

# All objects
objects <- ls() 

# Find object classes
object_classes <- purrr::map(objects, function(x) class(get(x)))

# Find all data frame objects (the class of the object contains "data.frame")
data_frame_objects <- objects[purrr::map_lgl(object_classes, function(x) "data.frame" %in% x)]

# Put all the data frames in a list
AllDat <- purrr::map(data_frame_objects, get)

# Set the names of AllDat to the names of the data frame objects
names(AllDat) <- data_frame_objects

... continue with @FJCC's code
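
A shorter base R variant of the same idea, as a sketch: mget() fetches the objects into a named list and Filter() keeps only the data frames. One thing to watch for is that a plain ls() also picks up any other data frames in the workspace, such as COMB or STATS from an earlier run. The pattern "^df[0-9]+$" below is my assumption about how the data frames are named, so adjust it to your own naming scheme.

```r
library(dplyr)

df1 <- data.frame(A = LETTERS[1:5], B = rnorm(5), stringsAsFactors = FALSE)
df2 <- data.frame(A = LETTERS[1:7], B = rnorm(7, 2, 0.5), stringsAsFactors = FALSE)
df3 <- 1:10  # matches the name pattern but is not a data frame

# Assumption: the data frames of interest are named df1, df2, ...
candidates <- ls(pattern = "^df[0-9]+$")

# mget() returns a named list; Filter() drops anything that is not a data frame
AllDat <- Filter(is.data.frame, mget(candidates))
names(AllDat)

# The list is already named, so bind_rows() can fill the id column directly
COMB <- bind_rows(AllDat, .id = "DF")
```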
