Aggregated ID group based on different levels of IDs

Hi guys,

I am trying to create an aggregated ID group, based on different levels of IDs.
Where if the ID matches in one level will be part of the parent level (Expected Group)

Here an example,
image

the problem is that my database contains 1 million rows. I did a loop regrouping every single match but it is consuming an uncountable amount of time.

There is any function or library which could help in this task?

Thank you in advance

Please explain more about the rule for determining the Expected Group. In your example, some of the rows share no values in Level1 - Level4 but they have the same Expected Group.

The expected group should be generated by the condition of: if one element of the List_Level exists in other list then correspond to the same group.

But should be executed in a way that avoids loops

Data <- data.frame(
  stringsAsFactors = FALSE,
            Level1 = c("1_A","1_A","1_A","1_B",
                       "1_B","1_B","1_C","1_D","1_E"),
            Level2 = c("2_A","2_B","2_B","2_B",
                       "2_B","2_C","2_D","2_D","2_E"),
            Level3 = c("3_A","3_A","3_B","3_B",
                       "3_C","3_C","3_D","3_D","3_E"),
            Level4 = c("4_A","4_B","4_C","4_C",
                       "4_C","4_C","4_C","4_D","4_D")
)

Data$RowNumber <- c(1:nrow(Data))


L1 <- Data%>%
        dplyr::group_by(Level1)%>%
        dplyr::summarise(LV1_List = toString(RowNumber))


L2 <- Data%>%
        dplyr::group_by(Level2)%>%
        dplyr::summarise(LV2_List = toString(RowNumber))

L2 <- Data%>%
        dplyr::group_by(Level2)%>%
        dplyr::summarise(LV2_List = toString(RowNumber))

L3 <- Data%>%
        dplyr::group_by(Level3)%>%
        dplyr::summarise(LV3_List = toString(RowNumber))

L4 <- Data%>%
        dplyr::group_by(Level4)%>%
        dplyr::summarise(LV4_List = toString(RowNumber))

Data <- left_join(Data, L1, by = "Level1")
Data <- left_join(Data, L2, by = "Level2")
Data <- left_join(Data, L3, by = "Level3")
Data <- left_join(Data, L4, by = "Level4")

Data$List_Level <- gsub(" ", "", trimws(paste(Data$LV1_List,Data$LV2_List,Data$LV3_List,Data$LV4_List, sep = "," )))

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.