Subtract one row from same row unless condition against another row is met

mrblobby · February 9, 2020, 5:04am

I'm not sure if I can succinctly describe what I want, so apologies in advance! I've included a desired outcome column below to try and illustrate what I want.

I want to find the relative difference between columns time_two and time_one but if the difference between time_two and time_one is less with a different time_one row, then use that to find the difference. So take unique_id 2, we can see for that row that the relative difference with it's row is 20 - 10 = 10, but the difference between 20 and 25 is -5, so I want to use that value instead. The same goes for unique_id 4 too.

df <-
  tibble::tribble(
    ~unique_id, ~group_id, ~time_one, ~time_two, ~rel_diff, ~desired_outcome,
             1,         1,        10,         5,        -5,               -5,
             2,         1,        10,        20,        10,               -5,
             3,         1,        10,        11,         1,                1,
             4,         1,        10,        30,        20,                5,
             5,         2,        25,        20,        -5,               -5,
             6,         2,        25,        25,         0,                0,
             7,         2,        25,        30,         5,                5
    )

nirgrahamuk · February 9, 2020, 11:56am

my mind is boggled by the circumstances that would require calculating this... but I had fun doing it anyway

library(tidyverse)
df <-
  tibble::tribble(
    ~unique_id, ~group_id, ~time_one, ~time_two, ~rel_diff, ~desired_outcome,
    1,         1,        10,         5,        -5,               -5,
    2,         1,        10,        20,        10,               -5,
    3,         1,        10,        11,         1,                1,
    4,         1,        10,        30,        20,                5,
    5,         2,        25,        20,        -5,               -5,
    6,         2,        25,        25,         0,                0,
    7,         2,        25,        30,         5,                5
  )

df2 <- mutate(rowwise(df),
  uid = unique_id,
  rel_diff_other_rows = list(crossing(
                                      uid, time_two,
                                      select(df, unique_id, time_one)
                                    ) %>%
                                      filter(uid != unique_id) %>% mutate(
                                        alt_rel_diffs = time_two - time_one
                                      )),
  min_alt_diff = min(rel_diff_other_rows$alt_rel_diffs),
  final_calc_outcome = if_else(abs(min_alt_diff) - abs(rel_diff) > 0,
                               rel_diff,
                               min_alt_diff)
)

mrblobby · February 10, 2020, 4:51pm

my mind is boggled by the circumstances that would require calculating this...

Haha! I'll try and explain: it's biological data where I'm looking at the timing of one event relative to another. Here's an image of the raw data:

We have two channels (1 and 2) and I'm interested in understanding the timing of channel 1 relative to 2. It makes more sense to look at the timing of channel 1 events (green arrows) relative to it's closest neighbour in channel 2 (black arrows). Unfortunately, the data as it's currently represented in my dataframe is such that the three green events are all around ~ +14 (the green events occur at ~ 15, and the first black event is at +1). So the above example would currently look something like this:

unique_id	chan_2_id	chan_2_timing	chan_1_timing	rel_timing
1	1	1	14.5	13.5
2	1	1	14.6	13.6
3	1	1	15	14
4	2	15	NA	NA
5	3	17	NA	NA

From the way I'm currently thinking about the data, they should actually be encoded as ~ -1. I hope that helps explain the crazy request

And finally, thank you for the code. It seems to work a treat! I'll mark it as a solution shortly.

nirgrahamuk · February 10, 2020, 4:53pm

great, thanks for the explain , appreciated.
There almost certainly is a faster solution if performance is an issue. probably an expert in data.table would do it easily. but I think my solution is easy to understand, and is probably alright for you if you have on the order of a few thousand events I would guess...

mrblobby · February 10, 2020, 5:11pm

Yes, I found it easy to see what was happening, so thank you! I actually have a few hundred thousand rows so I did have to make some modifications to make it more manageable. I split my data by another grouping variable, loop through the data and write the output to csvs. I also restricted the crossing to only adjacent channel 2 events since that's what I'm really interested in... All in all, it only takes 15 or so mins on my crappy laptop to run through so no complaints here!

system · February 17, 2020, 5:11pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.