Subtract one row from same row unless condition against another row is met

I'm not sure if I can succinctly describe what I want, so apologies in advance! I've included a desired outcome column below to try and illustrate what I want.

I want to find the relative difference between columns time_two and time_one but if the difference between time_two and time_one is less with a different time_one row, then use that to find the difference. So take unique_id 2, we can see for that row that the relative difference with it's row is 20 - 10 = 10, but the difference between 20 and 25 is -5, so I want to use that value instead. The same goes for unique_id 4 too.

df <-
  tibble::tribble(
    ~unique_id, ~group_id, ~time_one, ~time_two, ~rel_diff, ~desired_outcome,
             1,         1,        10,         5,        -5,               -5,
             2,         1,        10,        20,        10,               -5,
             3,         1,        10,        11,         1,                1,
             4,         1,        10,        30,        20,                5,
             5,         2,        25,        20,        -5,               -5,
             6,         2,        25,        25,         0,                0,
             7,         2,        25,        30,         5,                5
    )

my mind is boggled by the circumstances that would require calculating this... but I had fun doing it anyway

library(tidyverse)
df <-
  tibble::tribble(
    ~unique_id, ~group_id, ~time_one, ~time_two, ~rel_diff, ~desired_outcome,
    1,         1,        10,         5,        -5,               -5,
    2,         1,        10,        20,        10,               -5,
    3,         1,        10,        11,         1,                1,
    4,         1,        10,        30,        20,                5,
    5,         2,        25,        20,        -5,               -5,
    6,         2,        25,        25,         0,                0,
    7,         2,        25,        30,         5,                5
  )

df2 <- mutate(rowwise(df),
  uid = unique_id,
  rel_diff_other_rows = list(crossing(
                                      uid, time_two,
                                      select(df, unique_id, time_one)
                                    ) %>%
                                      filter(uid != unique_id) %>% mutate(
                                        alt_rel_diffs = time_two - time_one
                                      )),
  min_alt_diff = min(rel_diff_other_rows$alt_rel_diffs),
  final_calc_outcome = if_else(abs(min_alt_diff) - abs(rel_diff) > 0,
                               rel_diff,
                               min_alt_diff)
)

my mind is boggled by the circumstances that would require calculating this...

Haha! I'll try and explain: it's biological data where I'm looking at the timing of one event relative to another. Here's an image of the raw data:

We have two channels (1 and 2) and I'm interested in understanding the timing of channel 1 relative to 2. It makes more sense to look at the timing of channel 1 events (green arrows) relative to it's closest neighbour in channel 2 (black arrows). Unfortunately, the data as it's currently represented in my dataframe is such that the three green events are all around ~ +14 (the green events occur at ~ 15, and the first black event is at +1). So the above example would currently look something like this:

unique_id chan_2_id chan_2_timing chan_1_timing rel_timing
1 1 1 14.5 13.5
2 1 1 14.6 13.6
3 1 1 15 14
4 2 15 NA NA
5 3 17 NA NA

From the way I'm currently thinking about the data, they should actually be encoded as ~ -1. I hope that helps explain the crazy request :slight_smile:

And finally, thank you for the code. It seems to work a treat! I'll mark it as a solution shortly.

1 Like

great, thanks for the explain , appreciated.
There almost certainly is a faster solution if performance is an issue. probably an expert in data.table would do it easily. but I think my solution is easy to understand, and is probably alright for you if you have on the order of a few thousand events I would guess...

Yes, I found it easy to see what was happening, so thank you! I actually have a few hundred thousand rows so I did have to make some modifications to make it more manageable. I split my data by another grouping variable, loop through the data and write the output to csvs. I also restricted the crossing to only adjacent channel 2 events since that's what I'm really interested in... All in all, it only takes 15 or so mins on my crappy laptop to run through so no complaints here!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.