Help with concatenating row values if they have the same identity value.

Hi all,

I've been sitting with this problem for a few hours now and can't seem to come up with a good solution - I hope you can help me!

The data are structured like so:

Speaker StartTimeMS EndTimeMS
1 John 0 2009
2 Mother 4203 6316
3 John 6687 7221
4 Mother 6676 8092
5 Mother 9309 16623
6 John 17036 18220
7 Mother 17930 18940
8 John 19399 20026
9 Mother 20020 21192
10 Mother 22808 23458

What I want to do is combine the StartTimeMS from one row and the EndTimeMS from another row, if two of the same Speakers appear after each other; that is, the goal is to combine the time data in the 'Mother' cells appearing immediately after each other. The goal is the following:

Speaker StartTimeMS EndTimeMS
1 John 0 2009
2 Mother 4203 6316
3 John 6687 7221
4 Mother 6676 16623
5 John 17036 18220
7 Mother 17930 18940
8 John 19399 20026
9 Mother 20020 23458

I've been trying with the following code, but I can't seem to find the final step of combining the data from other columns:

data <- data %>%
mutate(identity = lag(Speaker, 1),
decide = if_else(Speaker != identity, 1, 0),
keep = lead(decide, 1, default = 1)) %>%
filter(keep == 1) %>%
select(-identity, -decide, -keep)

I'd really appreciate your help on this!!

Here is some code for a subset of the data:
Speaker <- c("John", "Mother", "John", "Mother", "Mother", "John", "Mother", "John", "Mother", "Mother")
StartTimeMS <- c(0, 4203, 6687, 6676, 9309, 17036, 17930, 19399, 20020, 22808)
EndTimeMS <- c(2009, 6316, 7221, 8092, 16623, 18220, 18940, 20026, 21192, 23458)
data <- data.frame(Speaker, StartTimeMS,EndTimeMS)

Hi,

Is this what you're looking for? Rows 4+5 and 9+10 have been combined.

library(tidyverse)

data <- tibble(speaker = c("John", "Mother", "John", "Mother", "Mother", "John", "Mother", "John", "Mother", "Mother"),
               start_time = c(0, 4203, 6687, 6676, 9309, 17036, 17930, 19399, 20020, 22808),
               end_time = c(2009, 6316, 7221, 8092, 16623, 18220, 18940, 20026, 21192, 23458),
               )

data
#> # A tibble: 10 x 3
#>    speaker start_time end_time
#>    <chr>        <dbl>    <dbl>
#>  1 John             0     2009
#>  2 Mother        4203     6316
#>  3 John          6687     7221
#>  4 Mother        6676     8092
#>  5 Mother        9309    16623
#>  6 John         17036    18220
#>  7 Mother       17930    18940
#>  8 John         19399    20026
#>  9 Mother       20020    21192
#> 10 Mother       22808    23458

data %>%
  ## add temporary variables
  mutate(identity = lag(speaker),
         keep_start = case_when(speaker != identity ~ 1, 
                                is.na(identity) ~ 1,
                                TRUE ~ 0),
         keep_end = lead(keep_start, 1, default = 1)) %>%
  ## remove start or end time when consecutive talks are done by same speaker 
  mutate(start_time = ifelse(keep_start == 1, start_time, NA),
         end_time = ifelse(keep_end == 1, end_time, NA)
         ) %>% 
  ## copy start_time to next talk
  fill(start_time) %>% 
  ## remove previous talk
  filter(!is.na(end_time)) %>%
  ## remove the temporary variables
  select(-(identity:keep_end))
#> # A tibble: 8 x 3
#>   speaker start_time end_time
#>   <chr>        <dbl>    <dbl>
#> 1 John             0     2009
#> 2 Mother        4203     6316
#> 3 John          6687     7221
#> 4 Mother        6676    16623
#> 5 John         17036    18220
#> 6 Mother       17930    18940
#> 7 John         19399    20026
#> 8 Mother       20020    23458

Created on 2021-02-11 by the reprex package (v1.0.0)

I have not tested this for cases when a speaker has 3 consecutive talks, but I gladly leave that up to you :slight_smile:

Hi Lars,

Thanks a million for this - this works on the above subset of data!

In the larger dataset, however, I have up to 60 successive 'Speakers' occurring after each other. I'll try to adapt the code to work for these cases too!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.