How to Summarize Data.Frame Rows with Self Join in R


here is my Dataset

customer_id partner_id category location
AAP AAS 3 Melbourne
AAP AAS 1 Silicon Valley
AAP AAS 1 Sydney
AAP AAS 2 Sydney
AAP AAD 1 Sydney
AAP AAD 3 Sydney
AMS ADP 3 New York
AMS ADP 1 New York
AMS AXP 2 Dallas
AMS AXP 1 Dallas

Combination = Customer+Partner+location

i would like to summarize based on
if combination & Category == 1
replace category 1 with the biggest number
if the big Category number in combination is 3 replace category with 3 for all same Combination contain 1
if the big number is 2 replace with 2
else case let it be 1 as is.

how to do Self Join in R.?
you can execute the below R Snippet to get the data in R Console

customer_id<-c("AAP", "AAP", "AAP", "AAP", "AAP", "AAP", "AMS", "AMS", "AMS", "AMS")
partner_id<-c("AAS", "AAS", "AAS", "AAS", "AAD", "AAD", "ADP", "ADP", "AXP", "AXP")
location<-c("Melbourne", "Silicon Valley", "Sydney", "Sydney", "Sydney", "Sydney", "New York", "New York", "Dallas", "Dallas")

dataset<- data.frame(customer_id, partner_id, category,location)


Please post a minimal self-contained reproducible example (reprex). You can see a tutorial on how to put together a reprex here:


This explanation of the change you want to make is very hard to follow.

In any case you should include a input data in the form of a reprex, which you have done, and the output you expect also in the form of a reprex. That would give us something to compare the output of code we might suggest.

In any case this is my guess at what you are trying to do.


customer_id<-c("AAP", "AAP", "AAP", "AAP", "AAP", "AAP", "AMS", "AMS", "AMS", "AMS")
partner_id<-c("AAS", "AAS", "AAS", "AAS", "AAD", "AAD", "ADP", "ADP", "AXP", "AXP")
location<-c("Melbourne", "Silicon Valley", "Sydney", "Sydney", "Sydney", "Sydney", "New York", "New York", "Dallas", "Dallas")

tbl <- dataset<- data.frame(customer_id, partner_id, category,location)

tbl %>% group_by(customer_id, partner_id, location) %>%
    mutate(category = if_else(category == 1, max(.$category), category))
#> # A tibble: 10 x 4
#> # Groups:   customer_id, partner_id, location [6]
#>    customer_id partner_id category location      
#>    <fct>       <fct>         <dbl> <fct>         
#>  1 AAP         AAS              3. Melbourne     
#>  2 AAP         AAS              3. Silicon Valley
#>  3 AAP         AAS              3. Sydney        
#>  4 AAP         AAS              2. Sydney        
#>  5 AAP         AAD              3. Sydney        
#>  6 AAP         AAD              3. Sydney        
#>  7 AMS         ADP              3. New York      
#>  8 AMS         ADP              3. New York      
#>  9 AMS         AXP              2. Dallas        
#> 10 AMS         AXP              3. Dallas

Created on 2018-03-07 by the reprex package (v0.2.0).


hi @dan,
Thanks for your time.
the answer is very near to my expectation
and i made a small twik to satisfy my requirement.
i changed max(category) instead of max(.$category)
since i was looking for to replace the max of the same combination.

by the by is there any way to do this without using library(tidyverse) or any other package?


With merge and subsetting you can do join and modify operations on data.frames.


Could you plz provide your thoughts with Join and Modify approach.