Hi,
Does anyone know of a "tidy" way (or even a non-tidy, yet efficient way) to count the number of occurrences within a specified distance?
For example, imagine 15 people entered my store over a period of 10 minutes, and I tracked which minute they entered, I could have a table looking like
person_id | minute |
---|---|
1 | 1 |
2 | 1 |
3 | 2 |
4 | 3 |
5 | 3 |
6 | 4 |
7 | 5 |
8 | 6 |
9 | 7 |
10 | 7 |
11 | 7 |
12 | 8 |
13 | 8 |
14 | 9 |
15 | 10 |
Now, say I wanted to add a column for each person, to count how many other people entered the store within 2 minutes of them. Is there a straightforward/tidy way to do it?
I can think of a complex way to calculate this, but it seems to not be scalable to larger data sets...
library(tidyverse)
#> -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1.9000 v purrr 0.2.4
#> v tibble 1.4.2 v dplyr 0.7.4
#> v tidyr 0.8.0 v stringr 1.3.0
#> v readr 1.1.1 v forcats 0.3.0
#> -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
df1 <- tibble::tribble(
~person_id, ~minute,
1L, 1L,
2L, 1L,
3L, 2L,
4L, 3L,
5L, 3L,
6L, 4L,
7L, 5L,
8L, 6L,
9L, 7L,
10L, 7L,
11L, 7L,
12L, 8L,
13L, 8L,
14L, 9L,
15L, 10L
)
crossing(df1, df1) %>%
mutate(time_diff = abs(minute - minute1)) %>%
group_by(person_id, minute) %>%
summarize(ppl_wi_2_min = sum(time_diff <= 2)) %>%
ungroup()
#> # A tibble: 15 x 3
#> person_id minute ppl_wi_2_min
#> <int> <int> <int>
#> 1 1 1 5
#> 2 2 1 5
#> 3 3 2 6
#> 4 4 3 7
#> 5 5 3 7
#> 6 6 4 6
#> 7 7 5 8
#> 8 8 6 8
#> 9 9 7 8
#> 10 10 7 8
#> 11 11 7 8
#> 12 12 8 8
#> 13 13 8 8
#> 14 14 9 7
#> 15 15 10 4
Does anyone know of a better way (i.e. one that would not involve crossing
or expand.grid
or something like that)?
Thanks!