Adding dates where there were no instances on those dates.

I have some data for false crawls (FA) on certain dates. On some days there were no false crawls such 22-05-08.

activity activity_date
FA 2022-05-06
FA 2022-05-07
FA 2022-05-09
FA 2022-05-15
FA 2022-05-16
FA 2022-05-16
FA 2022-05-16
FA 2022-05-16
FA 2022-05-16
FA 2022-05-17
FA 2022-05-17

If a count is performed such as

data_count <-data |> count(activity_date, name = ("crawls"))

the expected results are:

activity_date crawls
2022-05-06 1
2022-05-07 1
2022-05-09 1
2022-05-15 1
2022-05-16 5
2022-05-17 2

I want to be able to show that there were zero false crawls on the missing days such that a table such as this results:

activity_date crawls
2022-05-06 1
2022-05-07 1
2022-05-08 0
2022-05-09 1
2022-05-10 0
2022-05-11 0
2022-05-12 0
2022-05-13 0
2022-05-14 0
2022-05-15 1
2022-05-16 5
2022-05-17 2

How can I do this, add in those missing days for which there were no false crawls?

Thanks,
Jeff

Below is one way to achieve the desired output.

library(tidyverse)

data = tribble(
  ~activity, ~activity_date,
  'FA', '2022-05-06',
  'FA', '2022-05-07',
  'FA', '2022-05-09',
  'FA', '2022-05-15',
  'FA', '2022-05-16',
  'FA', '2022-05-16',
  'FA', '2022-05-16',
  'FA', '2022-05-16',
  'FA', '2022-05-16',
  'FA', '2022-05-17',
  'FA', '2022-05-17'
) |> 
  mutate(activity_date = as.Date(activity_date))

out = data.frame(activity_date = seq(min(data$activity_date), 
                                     max(data$activity_date), 
                                     'days')) %>%
  left_join(
    count(data, activity_date, name = 'crawls')
  ) %>%
  mutate(crawls = replace_na(crawls, 0))
#> Joining, by = "activity_date"

out
#>    activity_date crawls
#> 1     2022-05-06      1
#> 2     2022-05-07      1
#> 3     2022-05-08      0
#> 4     2022-05-09      1
#> 5     2022-05-10      0
#> 6     2022-05-11      0
#> 7     2022-05-12      0
#> 8     2022-05-13      0
#> 9     2022-05-14      0
#> 10    2022-05-15      1
#> 11    2022-05-16      5
#> 12    2022-05-17      2

Created on 2023-01-10 with reprex v2.0.2.9000

1 Like

Yes, that worked perfectly. I split out the left_join to understand what was going on. I used this approach to my full data set (115 observations) and applied cumsum of the counts to get a third column of what I wanted to plot.

Thanks much,
Jeff

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.