Check additions/updates between dataframes

ply · March 24, 2022, 10:22am

I have two dataframes with the same structure, but the 2nd one may contain updated data and also new records.

I want to display which rows are new and which rows have been updated.

Here are 2 example dataframes. The 2nd one has a new row added, and one of the column values has changed for another row:

a1 <- structure(list(
  key = c("1", "2", "3"),
  town = c("Crewe", "Sandbach", "Middlewich"),
  area = c("Cheshire", "Cheshire", "Cheshire"),
  total_pop = c(100, 400, 120)),
  row.names = c(NA, -3L),
  class = "data.frame")

a2 <- structure(list(
  key = c("1", "2", "3", "4"),
  town = c("Crewe", "Sandbach", "Middlewich", "Nantwich"),
  area = c("Cheshire", "Cheshire", "Cheshire", "Cheshire"),
  total_pop = c(100, 400, 100, 200)),
  row.names = c(NA, -4L),
  class = "data.frame")

cheers

nirgrahamuk · March 24, 2022, 10:33am

When I want to see differences, I usually reach for waldo

 waldo::compare(a1,a2)

Unfortunately this forum doesn't colourise the output the way waldo does. In the console the colourisation highlights the differences, which you won't see in the text below:

`attr(old, 'row.names')`: 1 2 3  
`attr(new, 'row.names')`: 1 2 3 4

`old$key`: "1" "2" "3"    
`new$key`: "1" "2" "3" "4"

`old$town`: "Crewe" "Sandbach" "Middlewich"           
`new$town`: "Crewe" "Sandbach" "Middlewich" "Nantwich"

`old$area`: "Cheshire" "Cheshire" "Cheshire"           
`new$area`: "Cheshire" "Cheshire" "Cheshire" "Cheshire"

`old$total_pop`: 100 400 120    
`new$total_pop`: 100 400 100 200

ply · March 24, 2022, 11:45am

Thanks for that. Nice, but the formatting leaves a lot to be desired.

Is there a way of exporting these differences rather than trying to work them out from the console?

nirgrahamuk · March 24, 2022, 12:21pm

library(tidyverse)
dplyr::setdiff(a2,a1) %>% mutate(key_in_first = key %in% pull(a1,key))

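In this case, setdiff(a2, a1) returns the rows of the 2nd dataframe that do not appear in the 1st. You can then distinguish additions from updates by reference to the key: if the key already exists in the 1st dataframe the row is an update, otherwise it is a new record.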
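To get from there to something you can export, here is a minimal sketch that labels each differing row and writes the result to a CSV. The `status` column name and the `changes.csv` file in a temporary directory are just assumptions for illustration, so swap in whatever suits you. It also assumes `key` uniquely identifies a record in both dataframes.

```r
library(dplyr)

# Example data from the question, rebuilt with data.frame() for brevity
a1 <- data.frame(
  key = c("1", "2", "3"),
  town = c("Crewe", "Sandbach", "Middlewich"),
  area = c("Cheshire", "Cheshire", "Cheshire"),
  total_pop = c(100, 400, 120)
)
a2 <- data.frame(
  key = c("1", "2", "3", "4"),
  town = c("Crewe", "Sandbach", "Middlewich", "Nantwich"),
  area = c("Cheshire", "Cheshire", "Cheshire", "Cheshire"),
  total_pop = c(100, 400, 100, 200)
)

# Rows of a2 that are not present (as whole rows) in a1,
# labelled by whether their key already existed in a1
changes <- dplyr::setdiff(a2, a1) %>%
  mutate(status = if_else(key %in% a1$key, "updated", "new"))

# Hypothetical output location; replace with a real path as needed
out_file <- file.path(tempdir(), "changes.csv")
write.csv(changes, out_file, row.names = FALSE)
```

Note this only reports additions and updates. Rows that were deleted from the 2nd dataframe would need a separate check, e.g. an `anti_join(a1, a2, by = "key")`.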
