# Generating time intervals of a data set based on shortest time interval and retaining corresponding values

Hi all!

I am quite new to R, so I would like to ask for help with what kind of approach I should take.

I have time series data of gaze behavior that I would like to analyze. It is structured as follows:

Participant - Start - End - Duration - Gazed_Entity

The issue is that each participant has their own, unique time intervals. Although the intervals differ from one gaze to the next, they occur at the same time as those of the other participants. It looks like this:

|Participant_Code|Start|End|Duration|Gazed_entity|
|---|---|---|---|---|
|Pink |00:00:00,000|00:00:50,368|00:00:50,368|Laptop|
|Pink |00:00:50,368|00:00:51,316|00:00:00,948|Yellow|
|Pink |00:00:51,316|00:01:12,287|00:00:20,971|Laptop|
|Pink |00:01:12,287|00:01:12,874|00:00:00,587|Other|
|Green|00:00:00,000|00:00:14,222|0:00:14,222|Laptop|
|Green|00:00:14,222|00:00:15,023|0:00:00,801|Pink|
|Green|00:00:15,023|00:01:16,201|0:01:01,178|Laptop|
|Green|00:01:16,201|00:01:16,869|0:00:00,668|Yellow|

For the analysis I will be doing (with the crqa package in R), I need equal-length time intervals for each participant. How can I do this while also retaining the "Gazed_Entity" that corresponds to each time interval?

The result should look something like this (which I am currently doing manually):

Shortest duration in this example: `00:00:00,587`, so:

- Pink - start: `00:00:00,000`, end: `00:00:00,587`, Gazed_entity: Laptop
- Pink - start: `00:00:00,587`, end: `00:00:01,174`, Gazed_entity: Laptop
- Pink - start: `00:00:01,174`, end: `00:00:01,761`, Gazed_entity: Laptop
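
The arithmetic behind these rows can be sketched in R. This assumes the durations from the table have first been converted by hand to integer milliseconds, which keeps the arithmetic exact; `durations_ms`, `min_dur_ms`, `bin_start` and `bin_end` are illustrative names only:

```r
# Durations from the table above, converted by hand to integer milliseconds
durations_ms <- c(50368L, 948L, 20971L, 587L,   # Pink
                  14222L, 801L, 61178L, 668L)   # Green

# Bin width = shortest duration across all participants
min_dur_ms <- min(durations_ms)   # 587

# Start and end of the first three bins, k = 1, 2, 3
k <- 1:3
bin_start <- (k - 1L) * min_dur_ms   # 0, 587, 1174
bin_end   <- k * min_dur_ms          # 587, 1174, 1761
```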

I am not specifically asking for the formula, since I have to work it out based on the data I have; however, any suggestions on which functions and methodology to look into would be appreciated!

Thanks all!

Have a look at the `lubridate` package, which can handle times including fractional seconds. However, since you are dealing with such precise measurements, it is worth reading up on Stack Overflow about the numerical precision of time storage in `R` to get an idea of the problems you may run into:

R xts: .001 millisecond in index - Stack Overflow

R lubridate ymd_hms millisecond diff - Stack Overflow
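
If you want to keep the milliseconds without relying on floating-point seconds, one option (a base-R sketch, not `lubridate`) is to convert the comma-decimal timestamps into integer milliseconds, for which differences and comparisons are exact. It assumes every value has the form `H:MM:SS,mmm` or `HH:MM:SS,mmm` with exactly three digits after the comma; `to_ms()` is a hypothetical helper name:

```r
# Convert "HH:MM:SS,mmm" strings (comma as decimal mark) to integer milliseconds.
# Assumes exactly three digits after the comma.
to_ms <- function(x) {
  # split e.g. "00:01:01,178" into "00", "01", "01", "178" (one row per timestamp)
  parts <- do.call(rbind, strsplit(x, "[:,]"))
  storage.mode(parts) <- "integer"
  # hours -> minutes -> seconds -> milliseconds
  ((parts[, 1] * 60L + parts[, 2]) * 60L + parts[, 3]) * 1000L + parts[, 4]
}

to_ms(c("00:00:50,368", "0:01:01,178"))   # 50368 61178
```

Durations then follow exactly as `to_ms(End) - to_ms(Start)`, and the shortest one as the `min()` of those differences.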

As for the general procedure (not covering the fractional-seconds problem), you can do something like the following. It mainly uses `data.table` and `collapse` (plus `tidyr::uncount()`), since those are fast, and `data.table`'s `ITime` class works well with the fast statistical functions in `collapse`. `ITime` also supports arithmetic operations such as division, which the base `R` `POSIXt` classes do not. Note that `ITime` stores whole seconds, so the times below are rounded to the nearest second:

```r
## Read in the data
Data <- read.table(text = '|Participant_Code|Start|End|Duration|Gazed_entity|
|Pink |00:00:00|00:00:50|00:00:50|Laptop|
|Pink |00:00:50|00:00:51|00:00:01|Yellow|
|Pink |00:00:51|00:01:12|00:00:21|Laptop|
|Pink |00:01:12|00:01:13|00:00:01|Other|
|Green|00:00:00|00:00:14|0:00:14|Laptop|
|Green|00:00:14|00:00:15|0:00:01|Pink|
|Green|00:00:15|00:01:16|0:01:01|Laptop|
|Green|00:01:16|00:01:17|0:00:01|Yellow|',
sep = '|', header = TRUE, strip.white = TRUE) |>
  ## only keep columns 2 to 6, because the first and last columns are empty (NA)
  collapse::fselect(-c(1, 7)) |>
  ## convert the times to data.table's ITime format
  ## (which can be easily used within collapse)
  collapse::ftransformv(vars = Start:Duration, FUN = data.table::as.ITime)

## Find the smallest duration
min_dur <- collapse::fmin(Data$Duration)

Data <- Data |>
  ## Create a weight to expand each row according to the smallest duration
  collapse::fmutate(weight = as.integer(Duration / min_dur) + 1) |>
  ## Expand the data with `tidyr::uncount()`
  tidyr::uncount(weights = weight) |>
  ## Recreate Start and End according to the Duration
  # First, split into groups by participant and gazed entity.
  # Caveat: this merges separate spells on the same entity (e.g. Green's two
  # Laptop spells) into one group. The sequence built below lines up with the
  # group's rows here because each gap between such spells is exactly one
  # min_dur long (and that gap step is then also labelled with this entity);
  # in general it will not line up.
  collapse::rsplit(~ list(Participant_Code, Gazed_entity)) |>
  # Second, apply a function to each group that builds a regular sequence
  # from the group's earliest Start to its latest End in steps of min_dur
  collapse::rapply2d(FUN = \(x) {
    start <- as.POSIXct(collapse::fmin(x$Start))
    end   <- as.POSIXct(collapse::fmax(x$End))
    # as.POSIXct() adds the current date, since POSIXct is a date-time format
    x$sequence_start <- seq.POSIXt(start, end, by = as.numeric(min_dur)) |>
      # remove the date again
      data.table::as.ITime()
    x$sequence_end <- data.table::shift(x$sequence_start, type = 'lead')
    # drop the last row, whose sequence_end is NA (due to the lead shift)
    x[-nrow(x), ]
  }) |>
  ## Recreate the data.frame
  collapse::unlist2d(idcols = c("Participant_Code", "Gazed_entity"))

head(Data)
#>   Participant_Code Gazed_entity    Start      End Duration sequence_start
#> 1            Green       Laptop 00:00:00 00:00:14 00:00:14       00:00:00
#> 2            Green       Laptop 00:00:00 00:00:14 00:00:14       00:00:01
#> 3            Green       Laptop 00:00:00 00:00:14 00:00:14       00:00:02
#> 4            Green       Laptop 00:00:00 00:00:14 00:00:14       00:00:03
#> 5            Green       Laptop 00:00:00 00:00:14 00:00:14       00:00:04
#> 6            Green       Laptop 00:00:00 00:00:14 00:00:14       00:00:05
#>   sequence_end
#> 1     00:00:01
#> 2     00:00:02
#> 3     00:00:03
#> 4     00:00:04
#> 5     00:00:05
#> 6     00:00:06
```

Created on 2022-11-28 with reprex v2.0.2

I hope this gives you a general idea of the procedure.
Hopefully someone else can cover the milliseconds problem, since I don't know how to handle it off-hand and don't have the time to dig into it right now.
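
One way to keep the milliseconds is to store the times as integer milliseconds (converted by hand below from the question's table) and to swap the row expansion for a different technique: lay one shared grid of bins, each as wide as the shortest duration, over all participants and look up which entity was being gazed at when each bin starts, using base R's `findInterval()`. This is a sketch under stated assumptions, not the method from the answer above: it assumes each participant's intervals are contiguous and start at 0 (each End equals the next Start), and it truncates every series at the earliest recording end so all series have the same length. The names `gaze`, `sample_entity`, `series` and `codes` are hypothetical:

```r
# Example data from the question, with times converted by hand to integer ms
gaze <- data.frame(
  Participant_Code = rep(c("Pink", "Green"), each = 4),
  Start_ms = c(0L, 50368L, 51316L, 72287L, 0L, 14222L, 15023L, 76201L),
  End_ms   = c(50368L, 51316L, 72287L, 72874L, 14222L, 15023L, 76201L, 76869L),
  Gazed_entity = c("Laptop", "Yellow", "Laptop", "Other",
                   "Laptop", "Pink", "Laptop", "Yellow"),
  stringsAsFactors = FALSE
)

# Bin width = shortest gaze duration over all participants (587 ms here)
min_dur <- min(gaze$End_ms - gaze$Start_ms)

# Shared grid of bin starts. It stops at the earliest recording end, so every
# bin lies inside every participant's data and all series are equally long.
grid_end  <- min(tapply(gaze$End_ms, gaze$Participant_Code, max))
bin_start <- seq(0L, grid_end - min_dur, by = min_dur)

# Entity being gazed at when each bin starts (assumes contiguous intervals)
sample_entity <- function(d) {
  d <- d[order(d$Start_ms), ]
  d$Gazed_entity[findInterval(bin_start, d$Start_ms)]
}
series <- lapply(split(gaze, gaze$Participant_Code), sample_entity)

# Integer codes with one shared set of levels, in case numeric input is needed
entity_levels <- sort(unique(gaze$Gazed_entity))
codes <- lapply(series, function(s) as.integer(factor(s, levels = entity_levels)))
```

Sampling at the start of each bin is one design choice; taking the entity at the bin midpoint, or the entity covering most of the bin, would be reasonable alternatives, and padding instead of truncating is another option if the recordings end at different times.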

Kind regards

