how to identify similar variables between data frames

Dear all,

I recently started using RStudio, coming from SPSS, and I would like to ask for help.
I am working with two different data frames (x and y), each of which contains around 300 columns/variables. I learned how to merge them on the subject's ID code (merge(x, y, by = "ID", all = TRUE)), but that merges both data frames into a total of 600 columns. Of course, the two data frames also contain other variables in common, so I was wondering:

  1. Is there a way to identify the columns/variables shared by both data frames?
  2. How can I merge the columns/variables that appear in both data frames?

Thanks in advance and best regards!

Typically, when you merge data frames you're looking to bring together different columns.

In this case, you probably want to isolate the common column names and check whether they hold the same information. Columns that are identical only need to be kept once, so take them from x alone to avoid redundant information.

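# Shared column names, excluding the key column (use "ID" if that is the key's name in your data)
common <- setdiff(intersect(names(x), names(y)), "id")
# x[[mycol]] looks up the column named by the variable; x$mycol would look for a column called "mycol"
iden <- sapply(common, function(mycol) identical(x[[mycol]], y[[mycol]]))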
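# Assuming cors was meant to hold correlations: a looser check for numeric shared columns (rows must be in the same order)
cors <- sapply(common, function(mycol) if (is.numeric(x[[mycol]]) && is.numeric(y[[mycol]])) cor(x[[mycol]], y[[mycol]], use = "pairwise.complete.obs") else NA)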
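
As a minimal sketch of what these checks return, here they are on two small made-up data frames. The data and the column names `age`, `score`, `sex` and `city` are invented for illustration, and the comparison assumes the rows of x and y are already in the same order. Note that a column whose name is stored in a variable has to be extracted with `[[ ]]`, because `x$mycol` looks for a column literally called "mycol".

```r
# Toy data, invented for illustration only
x <- data.frame(id = 1:3, age = c(30, 41, 25),
                score = c(1.5, 2.0, 3.5), sex = c("f", "m", "f"))
y <- data.frame(id = 1:3, age = c(30, 41, 25),
                score = c(1.5, 2.5, 3.5), city = c("A", "B", "C"))

# Column names present in both data frames, excluding the key column
common <- setdiff(intersect(names(x), names(y)), "id")

# TRUE where a shared column holds exactly the same values in x and y
iden <- sapply(common, function(mycol) identical(x[[mycol]], y[[mycol]]))

print(common)  # the shared columns: age and score
print(iden)    # age is identical, score is not (one value differs)
```

Here `common[iden]` gives the shared columns that are safe to keep only once, and `common[!iden]` gives the ones that need a closer look.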

z <- merge(
  # columns that are not identical in x and y (shared ones get .x/.y suffixes)
  merge(
    x[, setdiff(names(x), common[iden])],
    y[, setdiff(names(y), common[iden])],
    by = "id"
  ),
  # identical shared columns, taken once from x
  x[, c("id", common[iden])],
  by = "id"
)
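
To check the result, here is a sketch of the whole merge on made-up data (the data frames and their column names are invented for illustration). It also shows a shorter variant that is not the approach above but should give the same set of columns: drop the identical shared columns from y, then merge once. `drop = FALSE` is added so the subsets stay data frames even if only one column is left.

```r
# Toy data, invented for illustration only
x <- data.frame(id = 1:3, age = c(30, 41, 25),
                score = c(1.5, 2.0, 3.5), sex = c("f", "m", "f"))
y <- data.frame(id = 1:3, age = c(30, 41, 25),
                score = c(1.5, 2.5, 3.5), city = c("A", "B", "C"))

common <- setdiff(intersect(names(x), names(y)), "id")
iden <- sapply(common, function(mycol) identical(x[[mycol]], y[[mycol]]))

# Nested merge: non-identical columns first, then the identical ones from x
z <- merge(
  merge(
    x[, setdiff(names(x), common[iden]), drop = FALSE],
    y[, setdiff(names(y), common[iden]), drop = FALSE],
    by = "id"
  ),
  x[, c("id", common[iden]), drop = FALSE],
  by = "id"
)

# Shorter variant: remove the identical shared columns from y, merge once
z2 <- merge(x, y[, setdiff(names(y), common[iden]), drop = FALSE],
            by = "id", all = TRUE)

print(names(z))
print(names(z2))
```

In both results `age` appears once, while `score` appears as `score.x` and `score.y` because its values differ between the two data frames.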

Thanks! I was able to create common, iden and cors, but the merge code does not seem to work:

Error: unexpected ')' in " by = "id")"

I made an edit that might have fixed the problem!

I'm just writing code out of thin air so it's hard to get it all right. If you need more help, I recommend creating a reproducible example. FAQ: What's a reproducible example (`reprex`) and how do I create one?
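
As a rough sketch of what that could look like here, you can build or extract a small piece of your data and share it with `dput()`, which prints R code that recreates the object. The data frame `x_small` below is made up and stands in for your real data.

```r
# Small made-up data frame standing in for the real data
x_small <- data.frame(id = 1:3, age = c(30, 41, 25), sex = c("f", "m", "f"))

# dput() prints R code that recreates the object; paste that output into the post
dput(x_small)

# For a large data frame, share only a few rows and the columns involved
dput(head(x_small[, c("id", "age")], 2))
```

Anyone reading the post can then assign the pasted output to a name and run your code against the same data.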
