Collapse columns if the number was not detected in a previous one

Hi,
I'm working with phone numbers and I need to collapse them.
Sometimes the phone numbers are repeated, so there is no sense in aggregating the same number twice.
I don't know how to handle that, because I need to compare at least the last 6 digits in order to avoid a phone...

phone1=c("56995760122","60121645","56526235","46546165161")
phone2=c("95760122","8548652","3464565456","6546165161")
phone3=c("56995760122","8548652","539762455","6165161")
desired=c("56995760122","60121645 / 8548652","539762455 / 56526235 /539762455","46546165161")
data=cbind(phone1,phone2,phone3,desired)

I know it's complex, but I'm thinking of writing a loop...
Thanks for your time and interest, folks

I think there is a typo in your desired vector, entry 3, first phone number. If that's the case, let's proceed by increasing complexity. First, let's look at the case where we compare whole phone numbers:

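# The phone columns from the question, one row per contact
numbers <- cbind(phone1, phone2, phone3)

collapse_phone_numbers <- function(list_of_numbers){
  unique_numbers <- unique(list_of_numbers)
  # Return the unique numbers as a single " / "-separated string
  paste(unique_numbers, collapse = " / ")
}

apply(numbers, 1, collapse_phone_numbers)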
# [1] "56995760122 / 95760122"             "60121645 / 8548652"                
# [3] "56526235 / 3464565456 / 539762455"  "46546165161 / 6546165161 / 6165161"

Here, putting everything in a custom function is a good way to call it only once with apply(numbers, 1, ...), which runs it on each row, and it gives us modularity to improve it. So, second case: what if we only want to compare the last 6 digits (and only return these digits)?

collapse_phone_numbers2 <- function(x){
  # Keep only the last 6 digits of each number, then drop repeats
  unique_numbers <- unique(substr(x, nchar(x)-5, nchar(x)))
  paste(unique_numbers, collapse = " / ")
}

apply(numbers, 1, collapse_phone_numbers2)
# [1] "760122"                   "121645 / 548652"
# [3] "526235 / 565456 / 762455"     "165161" 
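A note on the key used above: substr(x, nchar(x) - 5, nchar(x)) takes the last six characters. When a number has fewer than six digits, the start position drops below 1, which substr() treats as 1, so the whole number becomes its own key. The sketch below wraps this in a hypothetical helper, last_digits(), with the number of digits as a parameter n (an assumption for illustration, not part of the answer):

```r
# Hypothetical helper: the last n characters of each string,
# or the whole string when it has fewer than n characters
last_digits <- function(x, n = 6) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

last_digits(c("56995760122", "8548652", "12345"))
# [1] "760122" "548652" "12345"
```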

We get the right selection of numbers, but you may want to keep the full number, not just the last 6 digits.

collapse_phone_numbers3 <- function(xx){
  # Keep the first full number for each distinct set of last 6 digits
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx)-5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}
apply(numbers, 1, collapse_phone_numbers3)
# [1] "56995760122"                       "60121645 / 8548652"               
# [3] "56526235 / 3464565456 / 539762455" "46546165161" 

That's pretty much what you want. But there is a problem that doesn't appear in your example: here I'm only returning the first phone number among those that share the same last 6 digits. What if the longest number is not the first one? That can be solved easily by sorting the phone numbers by number of digits beforehand:

collapse_phone_numbers4 <- function(xx){
  # Put the longest numbers first so they win over their shorter duplicates
  xx <- xx[order(nchar(xx), decreasing = TRUE)]
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx)-5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}
apply(numbers, 1, collapse_phone_numbers4)
# [1] "56995760122"                       "60121645 / 8548652"               
# [3] "3464565456 / 539762455 / 56526235" "46546165161"    
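To see the difference concretely, here is a hypothetical row (not from the question's data) where the short form 95760122 comes before the full 56995760122. Both functions are repeated so the block runs on its own; collapse_phone_numbers3() keeps whichever number comes first, while collapse_phone_numbers4() keeps the longest:

```r
collapse_phone_numbers3 <- function(xx){
  # First number wins for each set of last 6 digits
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx)-5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}

collapse_phone_numbers4 <- function(xx){
  # Longest number wins: sort by length before dropping duplicates
  xx <- xx[order(nchar(xx), decreasing = TRUE)]
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx)-5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}

x <- c("95760122", "56995760122", "8548652")  # hypothetical row, short form first
collapse_phone_numbers3(x)
# [1] "95760122 / 8548652"
collapse_phone_numbers4(x)
# [1] "56995760122 / 8548652"
```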

The downside is that you lose the original order of phone1, phone2, etc., but I don't expect it matters in your case.
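If the original order does matter, one option is to pick the longest number within each group of matching last-6-digit keys, then put the survivors back in their original positions. A minimal sketch, with a hypothetical function name collapse_keep_order() and the digit count as a parameter n; ties in length keep the first occurrence, because which.max() returns the first maximum:

```r
collapse_keep_order <- function(xx, n = 6) {
  key <- substr(xx, nchar(xx) - n + 1, nchar(xx))
  # For each key, the position of the longest number sharing that key
  longest <- tapply(seq_along(xx), key, function(i) i[which.max(nchar(xx[i]))])
  # Restore the original column order of the kept numbers
  keep <- sort(as.integer(longest))
  paste(xx[keep], collapse = " / ")
}

collapse_keep_order(c("972920674", "941836086", "56941836086"))
# [1] "972920674 / 56941836086"
```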


I tried it, but I don't know how to use it well. I apply the function over a data frame where each row is a list of phone numbers to collapse (keeping only unique ones), but I see that in some cases it doesn't do it.
I blame my limited knowledge of functions...
Thanks, AlexisW.

Can you clarify how "it doesn't do it"?

I ran the code over my data frame:

apply(mydataframe[,2:20], 1, collapse_phone_numbers)
 [793] "972920674 / 941836086 / 56941836086 / NA"

As you can see, the second number is contained in the third one; in other words, the third number is clearly not unique...
I keep trying with a loop... it is way too slow, and weird...
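Since a loop keeps coming up, here is a sketch of what apply(numbers, 1, ...) does, written as an explicit for loop. A frequent reason R loops feel slow is growing the result with c() on every iteration; preallocating the result vector avoids that. The data frame is a two-row excerpt of the question's example, and collapse_phone_numbers4() is repeated so the block runs on its own:

```r
collapse_phone_numbers4 <- function(xx){
  xx <- xx[order(nchar(xx), decreasing = TRUE)]
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx)-5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}

numbers <- data.frame(
  phone1 = c("56995760122", "60121645"),
  phone2 = c("95760122", "8548652"),
  phone3 = c("56995760122", "8548652")
)

m <- as.matrix(numbers)          # apply() also converts a data frame to a matrix
result <- character(nrow(m))     # preallocate instead of growing with c()
for (i in seq_len(nrow(m))) {
  result[i] <- collapse_phone_numbers4(m[i, ])
}
result
# [1] "56995760122"        "60121645 / 8548652"
```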

Did you define collapse_phone_numbers() with the code of collapse_phone_numbers4()? The 4th function is the one that should do what you want. It seems to work for me:

phone1=c("56995760122","60121645",  "56526235","46546165161",  "972920674")
phone2=c("95760122",   "8548652", "3464565456", "6546165161",  "941836086")
phone3=c("56995760122","8548652",  "539762455",    "6165161","56941836086")

numbers <- data.frame(phone1, phone2, phone3)
apply(numbers, 1, collapse_phone_numbers4)
# [1] "56995760122"                       "60121645 / 8548652"               
# [3] "3464565456 / 539762455 / 56526235" "46546165161"                      
# [5] "56941836086 / 972920674"  
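The row 793 output above also ends in a literal "NA": paste() turns a missing value into the string "NA", and collapse_phone_numbers4() keeps one NA as if it were a number. Assuming empty cells should simply be ignored, a minimal variant (the name collapse_phone_numbers5() is hypothetical) drops NA and "" entries before collapsing; a row with no numbers at all becomes "":

```r
collapse_phone_numbers5 <- function(xx){
  xx <- xx[!is.na(xx) & nzchar(xx)]            # ignore NA and empty cells
  xx <- xx[order(nchar(xx), decreasing = TRUE)]
  unique_numbers <- xx[!duplicated(substr(xx, nchar(xx)-5, nchar(xx)))]
  paste(unique_numbers, collapse = " / ")
}

collapse_phone_numbers5(c("972920674", "941836086", "56941836086", NA))
# [1] "56941836086 / 972920674"
```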

AlexisW, you were right.
It worked. You have no idea how badly I failed at writing a loop.
I tried to solve the task using the tidyverse, but I couldn't.
Where can I learn to write functions like the one you wrote?
Again, thanks a lot, AlexisW.
Thanks!

Where can I learn to write functions like this one?
Any book you recommend?

I'm not sure there is a single book; I'd say it comes from experience: knowing which functions exist and how to use them. The book R for Data Science helped me a lot, but I really learned most by answering questions here and on Stack Overflow, and by solving problems with my own code.
