collapse consecutively repeated rows and group by other varibales

Dear R experts,

I would like to collapse rows if a condition repeats in the following rows. For example, the original data is as follows
dataset <- data.frame(id = c("a",'a','a','a', "b", "b",'b','b'),
id2 = c("1","2","2","2", "1","1","2",'1'),
text = c("gi", "hi",'ed','ew', "ic",'ia',"ws",'er'),
stringsAsFactors = FALSE)
for id=='a', variable id2 repeats consecutively for three times (2,2,2);
for id=='b', variable id2 repeats consecutively for two times(1,1);
I would like to merge the text variable if id2 repeated in the following rows and results in dataset2.

dataset2 <- data.frame(id = c("a",'a', "b", "b",'b'),
id2 = c("1", "2", "1","2",'1'),
text = c("gi", "hi ed ew", "ic ia","ws",'er'),
stringsAsFactors = FALSE)

Do you have suggestion to achieve it? Thank you very much.

Best wishes,
Veda

Hi!

To help us help you, could you please prepare a reproducible example (reprex)? You are almost there with what you have right now, so please have a look at this guide, to see how to create one:

Also, take a look at rle function. I think, it can help you in this case.

I think the following does what you want:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

input_dataset <- data.frame(id = c("a",'a','a','a', "b", "b",'b','b'),
                            id2 = c("1","2","2","2", "1","1","2",'1'),
                            text = c("gi", "hi",'ed','ew', "ic",'ia',"ws",'er'),
                            stringsAsFactors = FALSE)

expected_dataset <- data.frame(id = c("a",'a', "b", "b",'b'),
                               id2 = c("1", "2", "1","2",'1'),
                               text = c("gi", "hi ed ew", "ic ia","ws",'er'),
                               stringsAsFactors = FALSE)

output_dataset <- input_dataset %>%
  mutate(ID = paste0(id, id2)) %>%
  transform(ID2 = rep.int(x = seq(length(x = rle(x = ID)$values)),
                          times = rle(x = ID)$lengths)) %>%
  group_by(ID2) %>%
  mutate(text = paste0(text, collapse = " ")) %>%
  ungroup() %>%
  distinct() %>%
  select(-ID, -ID2) %>%
  as.data.frame()

all.equal(target = expected_dataset,
          current = output_dataset)
#> [1] TRUE

Created on 2019-03-25 by the reprex package (v0.2.1)

1 Like

It does wonders. Thank you so much.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.