I think there is a typo in your desired output, entry 3, first phone number. Assuming that's the case, let's proceed in order of increasing complexity. First, let's look at the case where we compare whole phone numbers:
collapse_phone_numbers <- function(list_of_numbers){
  # drop exact duplicates, then join the remaining numbers with " / "
  unique_numbers <- unique(list_of_numbers)
  paste(unique_numbers, collapse = " / ")
}
apply(numbers, 1, collapse_phone_numbers)
# [1] "56995760122 / 95760122" "60121645 / 8548652"
# [3] "56526235 / 3464565456 / 539762455" "46546165161 / 6546165161 / 6165161"
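To run this step on its own, here is a minimal sketch with a hypothetical numbers data frame. The values are reconstructed from the outputs above; the column names phone1 to phone3 and the repeated entries in rows 1 and 2 (which only fill the third column) are assumptions, not the question's actual data.

```r
# Hypothetical input reconstructed from the outputs above (not the real data):
# one row per entry, one column per phone number, stored as character
numbers <- data.frame(
  phone1 = c("56995760122", "60121645", "56526235", "46546165161"),
  phone2 = c("95760122", "8548652", "3464565456", "6546165161"),
  phone3 = c("95760122", "8548652", "539762455", "6165161"),
  stringsAsFactors = FALSE
)

# Whole-number comparison: drop exact duplicates, join the rest with " / "
collapse_phone_numbers <- function(list_of_numbers) {
  paste(unique(list_of_numbers), collapse = " / ")
}

# apply() turns the data frame into a character matrix and passes each row
# to the function as a character vector
whole <- apply(numbers, 1, collapse_phone_numbers)
print(whole)
```

Keeping the numbers as character rather than numeric preserves details such as leading zeros, which numeric storage would drop.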
Wrapping the logic in a custom function means apply(numbers, 1, ...) only has to call that one function per row, and it gives us a single place to refine it. So, second case: what if we only want to compare the last 6 digits (and return only those digits)?
collapse_phone_numbers2 <- function(x){
  # keep only the last 6 digits of each number, then drop duplicates
  unique_numbers <- unique(substr(x, nchar(x) - 5, nchar(x)))
  paste(unique_numbers, collapse = " / ")
}
apply(numbers, 1, collapse_phone_numbers2)
# [1] "760122" "121645 / 548652"
# [3] "526235 / 565456 / 762455" "165161"
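The key expression is substr(x, nchar(x) - 5, nchar(x)), which takes the last 6 characters of each string. The sketch below wraps it in a hypothetical helper, last_digits(), with the suffix length as a parameter, and shows what happens for a number shorter than 6 digits: substr() treats a start position below 1 as 1, so the whole string is returned.

```r
# Hypothetical helper (not from the answer): last n characters of each string
last_digits <- function(x, n = 6) {
  x <- as.character(x)
  # a start position below 1 is treated as 1, so short strings come back whole
  substr(x, nchar(x) - n + 1, nchar(x))
}

phones <- c("56995760122", "95760122", "12345")
suffixes <- last_digits(phones)
print(suffixes)
```

The first two numbers share the suffix 760122, so unique(suffixes) would keep only one of them.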
We get the right selection of numbers, but you may want to keep the full number, not just the last 6 digits.
collapse_phone_numbers3 <- function(xx){
  # keep each full number whose last 6 digits have not appeared earlier in the row
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx) - 5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}
apply(numbers, 1, collapse_phone_numbers3)
# [1] "56995760122" "60121645 / 8548652"
# [3] "56526235 / 3464565456 / 539762455" "46546165161"
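What makes collapse_phone_numbers3 work is that duplicated() is computed on the 6-digit suffixes but the resulting logical vector indexes the full numbers. A small sketch on one hypothetical row:

```r
row <- c("56526235", "3464565456", "539762455", "4565456")

# suffix key: "526235" "565456" "762455" "565456" (4th repeats the 2nd)
key <- substr(row, nchar(row) - 5, nchar(row))

# duplicated() flags every occurrence of a key after the first one
dup <- duplicated(key)

# indexing the full numbers keeps the first full number for each suffix
kept <- row[!dup]
print(kept)
```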
That's pretty much what you want. There is, however, a problem that doesn't show up in your example: for each group of numbers sharing the same last 6 digits, I'm only returning the first one in the row. What if the longest number is not the first one? That is easily solved by sorting the phone numbers by number of digits beforehand:
collapse_phone_numbers4 <- function(xx){
  # put the longest numbers first so duplicated() keeps the longest one per suffix
  xx <- xx[order(nchar(xx), decreasing = TRUE)]
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx) - 5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}
apply(numbers, 1, collapse_phone_numbers4)
# [1] "56995760122" "60121645 / 8548652"
# [3] "3464565456 / 539762455 / 56526235" "46546165161"
The downside is that you lose the original order between phone1, phone2, etc., but I don't expect that matters in your case.
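If the original column order does matter, one option (a sketch, not part of the approach above) is to choose the longest number per 6-digit suffix by working on positions, then sort the surviving positions back into their original order. The name collapse_phone_numbers5 is hypothetical.

```r
collapse_phone_numbers5 <- function(xx) {
  xx <- as.character(xx)
  # positions sorted by length, longest first; ties broken by original position
  by_length <- order(-nchar(xx), seq_along(xx))
  key <- substr(xx, nchar(xx) - 5, nchar(xx))
  # walking positions longest-first, keep the first (longest) per suffix
  keep <- by_length[!duplicated(key[by_length])]
  # sort kept positions so the output follows the original column order
  paste(xx[sort(keep)], collapse = " / ")
}

# Hypothetical row where the longest number for suffix 565456 is not first
x <- c("565456", "56526235", "3464565456")
res <- collapse_phone_numbers5(x)
print(res)
```

Here 3464565456 wins over 565456 because it is longer, yet it still appears after 56526235, as in the input. It can be used with apply(numbers, 1, collapse_phone_numbers5) just like the other versions.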