identifying bouts or periods in time series data

Background: I have a data set of locations (with timestamp) from GPS collared bears which go in and out of a protected area. I can assign each location as being inside or outside the area already, but I'm looking for (ideally) a tidyverse solution to assign a unique ID to each "bout" of being outside the protected area, or to reshape the data such that each row is a bout, so that I can can generate summary statistics such as average duration outside park, etc.
Here is a reproducible example of the data (though it would have additional columns) I'm dealing with. Notice that the timestamp is duplicated across different bears. The first example shows a desired outcome with the data in "long" format, including a column "desired" with an identifier of bout. The second example shows roughly how the data might look in wide format, with the start and end times being the min() and max() times for each bout. I'd be happy to get either.
I'm new to tidyverse but I have to believe it offers a better way than doing this all manually in excel...

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

bears<-c("A","A","A","A","B","B","B","B")
stamp<-c("2010.09.30 08:00:00", "2010.09.30 10:00:00", "2010.09.30 12:00:00", "2010.09.30 14:00:00",
         "2010.09.30 08:00:00", "2010.09.30 10:00:00", "2010.09.30 12:00:00", "2010.09.30 14:00:00")
state<-c("in","out","out","in","in","out","out","in")
desired<-c("1","2","2","3","4","5","5","6")
df<-as.data.frame(cbind(bears,stamp,state,desired))
df$stamp<-ymd_hms(df$stamp)
df
#>   bears               stamp state desired
#> 1     A 2010-09-30 08:00:00    in       1
#> 2     A 2010-09-30 10:00:00   out       2
#> 3     A 2010-09-30 12:00:00   out       2
#> 4     A 2010-09-30 14:00:00    in       3
#> 5     B 2010-09-30 08:00:00    in       4
#> 6     B 2010-09-30 10:00:00   out       5
#> 7     B 2010-09-30 12:00:00   out       5
#> 8     B 2010-09-30 14:00:00    in       6

bears2<-c("A","A","A")
stamp2<-c("start1","start2","start3")
stamp3<-c("end1","end2","end3")
state1<-c("in","out","in")
dfwide<-as.data.frame(cbind(bears2,stamp2,stamp3,state1))
dfwide
#>   bears2 stamp2 stamp3 state1
#> 1      A start1   end1     in
#> 2      A start2   end2    out
#> 3      A start3   end3     in

This does not do exactly what you want. It gives an id number to each period for each bear. You could construct a unique id by combining the bears and id columns, if necessary. It is also rather inelegant. More commands could be chained together without producing intermediate data frames but I think this form makes it easier to work through how the code works. I needed that.

library(lubridate)
#> 
library(tidyr)
library(purrr)
library(dplyr)
bears<-c("A","A","A","A","B","B","B","B")
stamp<-c("2010.09.30 08:00:00", "2010.09.30 10:00:00", "2010.09.30 12:00:00", "2010.09.30 14:00:00",
         "2010.09.30 08:00:00", "2010.09.30 10:00:00", "2010.09.30 12:00:00", "2010.09.30 14:00:00")
state<-c("in","out","out","in","in","out","out","in")
#desired<-c("1","2","2","3","4","5","5","6")
df<-as.data.frame(cbind(bears,stamp,state))
df$stamp<-ymd_hms(df$stamp)
df
#>   bears               stamp state
#> 1     A 2010-09-30 08:00:00    in
#> 2     A 2010-09-30 10:00:00   out
#> 3     A 2010-09-30 12:00:00   out
#> 4     A 2010-09-30 14:00:00    in
#> 5     B 2010-09-30 08:00:00    in
#> 6     B 2010-09-30 10:00:00   out
#> 7     B 2010-09-30 12:00:00   out
#> 8     B 2010-09-30 14:00:00    in
dfnest <- df %>% group_by(bears) %>% nest()
dfnest
#> # A tibble: 2 x 2
#>   bears data            
#>   <fct> <list>          
#> 1 A     <tibble [4 × 2]>
#> 2 B     <tibble [4 × 2]>
idFunc <- function(DF) {
  x <- DF$state
  id <- vector("integer", length = length(x))
  id[1] <- 1
  for (i in 2:length(x)) {
    if(x[i] == x[i-1]) id[i] <- id[i-1] else id[i] <- id[i-1] + 1
  }
  id
}

dfProc <- dfnest %>% mutate(id = map(data, idFunc))
dfDone <- unnest(dfProc)
dfDone
#> # A tibble: 8 x 4
#>   bears    id stamp               state
#>   <fct> <dbl> <dttm>              <fct>
#> 1 A         1 2010-09-30 08:00:00 in   
#> 2 A         2 2010-09-30 10:00:00 out  
#> 3 A         2 2010-09-30 12:00:00 out  
#> 4 A         3 2010-09-30 14:00:00 in   
#> 5 B         1 2010-09-30 08:00:00 in   
#> 6 B         2 2010-09-30 10:00:00 out  
#> 7 B         2 2010-09-30 12:00:00 out  
#> 8 B         3 2010-09-30 14:00:00 in

Created on 2019-05-02 by the reprex package (v0.2.1)

Do you really need the output of the bouts as it appear on your data frame?. If you can 'number' the bouts independently for each bear, i suggest you this non-tidy solution:

df$s2 <- ifelse(df$state == 'in', 1,0)
df <- split(df, df$bears)
for(i in 1:length(df)){
    df[[i]]$bout <- cumsum(c(1,abs(diff(df[[i]]$s2))))
}
df <- do.call(rbind, df)

> df
    bears               stamp state s2 bout
A.1     A 2010-09-30 08:00:00    in  1    1
A.2     A 2010-09-30 10:00:00   out  0    2
A.3     A 2010-09-30 12:00:00   out  0    2
A.4     A 2010-09-30 14:00:00    in  1    3
B.5     B 2010-09-30 08:00:00    in  1    1
B.6     B 2010-09-30 10:00:00   out  0    2
B.7     B 2010-09-30 12:00:00   out  0    2
B.8     B 2010-09-30 14:00:00    in  1    3
> 

Thanks to both of you! I was able to number the bouts for each bear and then combine this with the ID for a unique bout signifier. I selected Fer's response as the solution just because it's a few less lines, but they both worked for me.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.