I want to unite two columns but only when there is a duplicate value in one of the columns. A simple example of the current data:
library(tidyverse)
tibble(x1 = c("joe", "ann", "ann", "sam", "sam", "ken"),
x2 = c("boston", "new york", "hartford", "montreal", "boise", "tulsa"))
#> # A tibble: 6 x 2
#> x1 x2
#> <chr> <chr>
#> 1 joe boston
#> 2 ann new york
#> 3 ann hartford
#> 4 sam montreal
#> 5 sam boise
#> 6 ken tulsa
And what I want it to look like:
#> # A tibble: 6 x 2
#> x1 x2
#> <chr> <chr>
#> 1 joe boston
#> 2 ann - new york new york
#> 3 ann - hartford hartford
#> 4 sam - montreal montreal
#> 5 sam - boise boise
#> 6 ken tulsa
I have an mildly onerous workaround but I know there must be a more elegant way.
(This is my first post; sorry for any errors in etiquette )
I'm not sure how elegant this is either, but it seems to get the job done. The first duplicated() call identifies the second (or further) instance of the duplicated row and then I combine it with the fromLast = TRUE option to identify the first instance of the duplicate.
library(dplyr)
df <- tibble(
x1 = c("joe", "ann", "ann", "sam", "sam", "ken"),
x2 = c("boston", "new york", "hartford", "montreal", "boise", "tulsa")
)
df %>%
mutate(x1 = if_else(
duplicated(x1) | duplicated(x1, fromLast = TRUE),
paste0(x1, " - ", x2),
x1
))
#> # A tibble: 6 x 2
#> x1 x2
#> <chr> <chr>
#> 1 joe boston
#> 2 ann - new york new york
#> 3 ann - hartford hartford
#> 4 sam - montreal montreal
#> 5 sam - boise boise
#> 6 ken tulsa
Adding: and here's another way of doing it. Elegance not guaranteed!
df %>%
mutate(x1 = if_else(
x1 %in% x1[duplicated(x1)],
paste0(x1, " - ", x2),
x1
))
#> # A tibble: 6 x 2
#> x1 x2
#> <chr> <chr>
#> 1 joe boston
#> 2 ann - new york new york
#> 3 ann - hartford hartford
#> 4 sam - montreal montreal
#> 5 sam - boise boise
#> 6 ken tulsa
That bit is probably better written as unique(x1) (edit: actually, no), but that's otherwise the alteration I was thinking of suggesting to your original code. I personally think it's cleaner than what I had.