Check additions/updates between dataframes

I have two data frames with the same structure, but the 2nd one may contain updated values and also new records.

How can I display which rows are new and which rows have been updated?

Here are 2 example data frames. The 2nd one has a new row added, and one of the column values has changed for another row:

a1 <- structure(list(
key = c("1", "2", "3"),
town = c("Crewe", "Sandbach", "Middlewich"),
area = c("Cheshire","Cheshire", "Cheshire"),
total_pop = c(100, 400, 120)),
row.names = c(NA, -3L),
class = "data.frame")

a2 <- structure(list(
key = c("1", "2", "3","4"),
town = c("Crewe", "Sandbach", "Middlewich","Nantwich"),
area = c("Cheshire","Cheshire", "Cheshire","Cheshire"),
total_pop = c(100, 400, 100,200)),
row.names = c(NA, -4L),
class = "data.frame")

cheers

When I want to see differences, I usually reach for waldo:

 waldo::compare(a1,a2)

Unfortunately this forum doesn't colourise the output the way waldo does. In the console the colours highlight the differences, which you won't see in the text below:

`attr(old, 'row.names')`: 1 2 3  
`attr(new, 'row.names')`: 1 2 3 4

`old$key`: "1" "2" "3"    
`new$key`: "1" "2" "3" "4"

`old$town`: "Crewe" "Sandbach" "Middlewich"           
`new$town`: "Crewe" "Sandbach" "Middlewich" "Nantwich"

`old$area`: "Cheshire" "Cheshire" "Cheshire"           
`new$area`: "Cheshire" "Cheshire" "Cheshire" "Cheshire"

`old$total_pop`: 100 400 120    
`new$total_pop`: 100 400 100 200

Thanks for that. It's nice, but the formatting leaves a lot to be desired.

Is there a way of exporting these differences rather than trying to work them out from the console?

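library(tidyverse)
# rows of a2 that do not appear verbatim in a1;
# key_in_first is TRUE when the key already exists in a1 (an update), FALSE for a new row
dplyr::setdiff(a2, a1) %>% mutate(key_in_first = key %in% pull(a1, key))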

In this case the result shows the rows of the 2nd data frame that differ from the 1st. The key_in_first column distinguishes additions (FALSE) from updates (TRUE) by checking whether the key already exists in the 1st data frame.
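To export the differences rather than read them off the console, the same idea can be taken one step further. This is only a sketch: the `status` column, the `changes` object and the `changes.csv` file name are made up for illustration, and it assumes `key` uniquely identifies a row in both data frames.

```r
library(dplyr)

# Same example data as in the question
a1 <- data.frame(
  key = c("1", "2", "3"),
  town = c("Crewe", "Sandbach", "Middlewich"),
  area = "Cheshire",
  total_pop = c(100, 400, 120)
)
a2 <- data.frame(
  key = c("1", "2", "3", "4"),
  town = c("Crewe", "Sandbach", "Middlewich", "Nantwich"),
  area = "Cheshire",
  total_pop = c(100, 400, 100, 200)
)

# Rows of a2 that are not present verbatim in a1, labelled by whether
# the key already existed in a1 ("updated") or not ("new").
# `status` is an illustrative column name, not part of any package.
changes <- dplyr::setdiff(a2, a1) %>%
  mutate(status = if_else(key %in% a1$key, "updated", "new"))

# Write the labelled differences to a CSV file (hypothetical file name;
# tempdir() is used here only so the sketch does not litter the working directory)
out_file <- file.path(tempdir(), "changes.csv")
write.csv(changes, out_file, row.names = FALSE)
```

With the example data, `changes` should hold the Middlewich row (changed total_pop) and the Nantwich row (new key). Rows deleted from the 1st data frame would not show up this way; something like `dplyr::anti_join(a1, a2, by = "key")` could be used to find those.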
