# compare two columns each

Hi!
I have a dataset which consists of the columns A1, A2, A3 and B1, B2, B3. I want to compare the values between the columns A1 & B2, A2 & B2, A3 & B3. Usually this would be easy with `mutate` from dplyr, but as I have many columns to compare in my real dataset, I would like to avoid having to do the comparison manually. That's why I build pairs which should be compared:

``````library(tidyverse)

df_A = data.frame(id = c("x1", "x2", "x3"), A1 = c(1,NA,3), A2 = c(2,3,4), A3 = c(3,4,5))
df_B = data.frame(id = c("x1", "x2", "x3"), B1 = c(1,2,3), B2 = c(NA,3,5), B3 = c(3,4,5))

df = df_A %>% left_join(df_B)

ndf  = names(df)
An = ndf[startsWith(ndf,"A")]
Bn = ndf[startsWith(ndf,"B")]

shared_nums = intersect(parse_number(An), parse_number(Bn))

pairs_to_do = map(shared_nums, ~c(paste0("A",.x), paste0("B",.x)))
``````

For the result of the comparison, I would like to indicate the type of change by using the following variables:

• NA ... if there is no value in the column A and no value in the column B
• 0 ... value A == value B
• 1 ... value A was deleted in B (value A != NA, value B == NA)
• 2 ... value A != value B, but neither value A nor value B == NA
• 3 ... value was added in B (value A == NA, value B != NA)
``````do_pair <- function(x,y,name){
stopifnot(length(x)==length(y))
r <- case_when(is.na(x) & is.na(y) ~ NA,
!is.na(x) & !is.na(y) & x == y ~ 0, # same value
!is.na(x) & is.na(y) ~ 1, # deleted value in B
!is.na(x) & !is.na(y) & x != y ~ 2, # different values
is.na(x) & !is.na(y) ~ 3) # added value in B
tibble(!!sym(name):=r)
}
``````

I'm not sure, how to do the next step the best way. Everything I tried, gave me an error or didn't work. That's how, the output should look like, when the colums C are the mutated columns: I hope, I was clear enough explaining my problem.

Hi, I think this does what you need:

``````library(tidyverse)

df_A <- data.frame(id = c("x1", "x2", "x3"), A1 = c(1,NA,3), A2 = c(2,3,4), A3 = c(3,4,5))
df_B <- data.frame(id = c("x1", "x2", "x3"), B1 = c(1,2,3), B2 = c(NA,3,5), B3 = c(3,4,5))

df <- df_A %>% left_join(df_B, by = "id")

do_pair <- function(x, y) {
case_when(
is.na(x) & is.na(y) ~ NA_real_,
!is.na(x) & !is.na(y) & x == y ~ 0, # same value
!is.na(x) & is.na(y) ~ 1, # deleted value in B
!is.na(x) & !is.na(y) & x != y ~ 2, # different values
is.na(x) & !is.na(y) ~ 3 # added value in B
)
}

df %>%
pivot_longer(-id) %>%
separate(name, into = c("letter", "number"), sep = 1) %>%
pivot_wider(c(id, number), names_from = letter, values_from = value) %>%
mutate(C = do_pair(A, B)) %>%
pivot_wider(id, names_from = number, values_from = c(A, B, C), names_sep = "")
``````

Yes, that's what I was looking for. Thank you very much for your help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.