Avoiding repeated variables during merging

Hi, every time I merge files containing the same variables, R creates their copies by adding .x, .y etc.
I have these 3 simple files, which I am merging:

File1 <- data.frame(
  stringsAsFactors = FALSE,
              Code = c(8190, 8057, 8038, 8173, 8013, 3453, 3453),
             Score = c(1, 2, 3, 4, 5, 6, 7),
             Brand = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"),
              Year = c(2023, 2023, 2022, 2023, 2022, 2021, 2023)
)

File2 <- data.frame(
  stringsAsFactors = FALSE,
  CodeNumber = c(8190, 8057, 8038, 8173, 8013),
  Gender = c("Male", "Female", "Female", "Male", "Male"),
  Brand = c("aaa", "bbb", "ccc", "ddd", "eee"),
  Year = c(2023, 2023, 2022, 2023, 2022)
)

File3 <- data.frame(
  stringsAsFactors = FALSE,
  MainCode = c(8038, 8173, 8013, 3453, 3453),
  Brand = c("ccc", "ddd", "eee", "fff", "ggg"),
  Year = c(2022, 2023, 2022, 2021, 2023),
  City = c("London", "Paris", "London", "London", "Warsaw"),
  Country = c("UK", "France", "UK", "UK", "Poland")
)

library(dplyr)
all.files <- File1 %>%
  left_join(File2, by = c("Code" = "CodeNumber"), multiple="all") %>%
  left_join(File3, by = c("Code" = "MainCode"), multiple="all")

all.files

Is there a way of removing repeated variables from merging? In this example I want to keep just the original Brand and Year from the main File1.
Is it possible?

remove them with select( -)

all.files <- File1 %>%
  left_join( select(File2,-Brand,-Year) , by = c("Code" = "CodeNumber"), multiple="all") %>%
  left_join(select(File3,-Brand,-Year), by = c("Code" = "MainCode"), multiple="all",
            relationship = "many-to-many")

Thank you but is there a clever way of finding them? I am merging files with many variables and I was wondering if there is a clever way of rejecting repeated variables.
I was thinking about deleting all variables ending with .x and .y but this solution is not working - in my case Brand.x Year.x are the most important...

I think the solutions would be:

  • Remove those columns from the File2 and File3.
  • Perform the left_join without specifying the by parameter if that's an option.
  • Specify the duplicate suffix of y and then remove at the end.
all.files <- File1 %>%
  left_join(File2, by = c("Code" = "CodeNumber"), multiple = "all", suffix = c("", ".y")) %>%
  left_join(File3, by = c("Code" = "MainCode"), multiple = "all", suffix = c("", ".y")) %>% 
  select(-contains(".y"))

Here I improvise my own version of left_join that does this behaviour.

File1 <- data.frame(
  stringsAsFactors = FALSE,
  Code = c(8190, 8057, 8038, 8173, 8013, 3453, 3453),
  Score = c(1, 2, 3, 4, 5, 6, 7),
  Brand = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"),
  Year = c(2023, 2023, 2022, 2023, 2022, 2021, 2023)
)

File2 <- data.frame(
  stringsAsFactors = FALSE,
  CodeNumber = c(8190, 8057, 8038, 8173, 8013),
  Gender = c("Male", "Female", "Female", "Male", "Male"),
  Brand = c("aaa", "bbb", "ccc", "ddd", "eee"),
  Year = c(2023, 2023, 2022, 2023, 2022)
)

File3 <- data.frame(
  stringsAsFactors = FALSE,
  MainCode = c(8038, 8173, 8013, 3453, 3453),
  Brand = c("ccc", "ddd", "eee", "fff", "ggg"),
  Year = c(2022, 2023, 2022, 2021, 2023),
  City = c("London", "Paris", "London", "London", "Warsaw"),
  Country = c("UK", "France", "UK", "UK", "Poland")
)

myleft_join <- function(x,y,by=NULL,copy=FALSE,suffix=c(".x",".y"),...,keep=NULL){
  common_names <- intersect(names(x),names(y))
  common_names <- setdiff( setdiff(common_names,unname(by)),names(by))
  y <- select(y,
              -all_of(common_names))
  left_join(x,
            y,
            by=by,
            copy=copy,
            suffix=suffix,
            keep=keep,
            ...)
}

library(dplyr)
all.files <- File1 %>%
  myleft_join(File2, by = c("Code" = "CodeNumber"), multiple="all") %>%
  myleft_join(File3, by = c("Code" = "MainCode"), multiple="all")

all.files
1 Like

Thank you. What if I wanted to merge also by Year?

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.