Comparing Data Frames

jbeckerman · August 27, 2019, 8:45pm

Hi All,
I have an assortment of data frames, each holding a different subset of the same data set. All of them have the same column names. Is there an easy way to compare summary statistics across all of these data frames?
Thanks

jbeckerman · August 27, 2019, 8:45pm

I would also at some point like to graph these comparisons.

FJCC · August 27, 2019, 9:20pm

You can make one big data frame with an additional id column showing which data frame the data came from, then calculate statistics grouping by the id column.

library(dplyr)
df1 <- data.frame(A = LETTERS[1:5], B = rnorm(5), stringsAsFactors = FALSE)
df2 <- data.frame(A = LETTERS[1:7], B = rnorm(7, 2,0.5), stringsAsFactors = FALSE)
AllDat <- list(df1 = df1, df2 = df2)
COMB <- bind_rows(AllDat, .id = "DF")
COMB
#>     DF A          B
#> 1  df1 A -0.1209323
#> 2  df1 B -2.6429294
#> 3  df1 C  2.3441191
#> 4  df1 D  0.0531904
#> 5  df1 E  0.8420014
#> 6  df2 A  1.7757831
#> 7  df2 B  1.6594860
#> 8  df2 C  1.3826828
#> 9  df2 D  1.8847440
#> 10 df2 E  2.6229499
#> 11 df2 F  1.8920032
#> 12 df2 G  1.2835507
STATS <- COMB %>% group_by(DF) %>% summarize(Avg = mean(B))
STATS
#> # A tibble: 2 x 2
#>   DF       Avg
#>   <chr>  <dbl>
#> 1 df1   0.0951
#> 2 df2   1.79

Created on 2019-08-27 by the reprex package (v0.2.1)
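Because the question asks about several summary statistics, note that the same grouped summarize() call can return more than just the mean. The sketch below is a minimal variation on the code above, assuming the same two-column layout (A, B). The particular statistics chosen (count, mean, SD, median, min, max) and the column names N, SD, Median, Min and Max are just illustrative picks, and set.seed() is added only to make the example data repeatable.

```r
library(dplyr)

# Illustrative data with the same layout as df1/df2 above
set.seed(1)
df1 <- data.frame(A = LETTERS[1:5], B = rnorm(5), stringsAsFactors = FALSE)
df2 <- data.frame(A = LETTERS[1:7], B = rnorm(7, 2, 0.5), stringsAsFactors = FALSE)

# Stack the data frames; the DF column records which one each row came from
COMB <- bind_rows(list(df1 = df1, df2 = df2), .id = "DF")

# Several summary statistics per source data frame in one summarize() call
STATS <- COMB %>%
  group_by(DF) %>%
  summarize(
    N      = n(),       # number of rows in each data frame
    Avg    = mean(B),
    SD     = sd(B),
    Median = median(B),
    Min    = min(B),
    Max    = max(B)
  )
STATS
```

The result has one row per original data frame and one column per statistic, so the data frames can be compared side by side.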

valeri · August 28, 2019, 10:03am

Just to add a bit to that: if you have many data frames (or tibbles) and don't want to list them all manually, you could do the following:

# Names of all objects in the global environment
object_names <- ls()

# Class(es) of each object
object_classes <- purrr::map(object_names, function(x) class(get(x)))

# Keep the names of the data.frame objects (tibbles included, since their class also contains "data.frame")
data_frame_objects <- object_names[purrr::map_lgl(object_classes, function(x) "data.frame" %in% x)]

# Put all data frames in a list
AllDat <- purrr::map(data_frame_objects, get)

# Name the list elements after the data frame objects
names(AllDat) <- data_frame_objects
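The same named list can probably also be built in base R without purrr. This is a sketch of an alternative (it swaps in base mget() and Filter() rather than the purrr calls above), assuming all the data frames live in the global environment. The objects df1, df2 and not_a_df are hypothetical examples. mget() already returns a list named after the objects, so no separate names() step is needed.

```r
# Hypothetical example objects in the global environment
df1 <- data.frame(A = LETTERS[1:5], B = rnorm(5))
df2 <- data.frame(A = LETTERS[1:7], B = rnorm(7, 2, 0.5))
not_a_df <- 1:10

# mget() fetches every global object into a named list;
# Filter() keeps only the elements that are data frames (tibbles count too)
AllDat <- Filter(is.data.frame, mget(ls(envir = globalenv()), envir = globalenv()))
names(AllDat)
```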

Then continue with @FJCC's code, starting at COMB <- bind_rows(AllDat, .id = "DF").
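On the graphing part of the question: once everything is stacked into one data frame with a DF id column (as in @FJCC's COMB), that id column can simply be mapped to the x axis of a plot. Below is a minimal sketch that assumes the ggplot2 package, which nobody in the thread used, and the same illustrative df1/df2 layout. It draws one box per source data frame with the individual values overlaid.

```r
library(dplyr)
library(ggplot2)

# Illustrative data, combined the same way as in @FJCC's answer
set.seed(1)
df1 <- data.frame(A = LETTERS[1:5], B = rnorm(5))
df2 <- data.frame(A = LETTERS[1:7], B = rnorm(7, 2, 0.5))
COMB <- bind_rows(list(df1 = df1, df2 = df2), .id = "DF")

# One box per source data frame: the DF id column becomes the x axis
p <- ggplot(COMB, aes(x = DF, y = B)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.6) +  # overlay the individual values
  labs(x = "Source data frame", y = "B")
print(p)
```

Because the comparison lives in a single data frame, adding more data frames to the list adds more boxes without changing the plotting code.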
