Calculating a two week difference in new cases based on a yes/no condition.

Hello! I am new to R and I am working on a project that requires me to calculated the difference in new cases two weeks from the date that a mask mandate was enforced. My dataset consist of state, date, mandate (yes/no), daily new cases. Therefore, if mandate = yes, then I need to calculated the difference in new cases two weeks from the date that the mandate was placed. I am not sure how to code this, much help will be appreciated.

Best,
xbechtel

What have you tried? Can you provide a reproducible example?

This is the output

structure(list(mask = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), date = structure(c(18327, 
18328, 18329, 18330, 18331, 18332), class = "Date"), state = c("AL", 
"AL", "AL", "AL", "AL", "AL"), new_case = c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_)), row.names = c(NA, -6L
), groups = structure(list(state = "AL", .rows = structure(list(
    1:6), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = 1L, class = c("tbl_df", "tbl", "data.frame"
), .drop = TRUE), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
))

For there first 13 days, there were no new cases

There are only six days in that code with no cases.

> masks_df
# A tibble: 6 x 4
# Groups:   state [1]
  mask  date       state new_case
  <chr> <date>     <chr>    <dbl>
1 NA    2020-03-06 AL          NA
2 NA    2020-03-07 AL          NA
3 NA    2020-03-08 AL          NA
4 NA    2020-03-09 AL          NA
5 NA    2020-03-10 AL          NA
6 NA    2020-03-11 AL          NA

Can you provide the data for the dates that you had above (July)?

dput(head(masks_df, n = 20))

structure(list(mask = c(NA_character_, NA_character_, NA_character_, 
NA_character_, "Yes", NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), date = structure(c(18334, 18335, 
18336, 18337, 18338, 18339, 18340, 18341, 18342, 18343, 18344, 
18345, 18346, 18347, 18348, 18349, 18350, 18351, 18352, 18353
), class = "Date"), state = c("AL", "AL", "AL", "AL", "AL", "AL", 
"AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
"AL", "AL", "AL"), new_case = c(1, 5, 6, 16, 8, 10, 22, 13, 43, 
14, 29, 48, 68, 223, 81, 109, 110, 53, 122, 96)), row.names = c(NA, 
-20L), groups = structure(list(state = "AL", .rows = structure(list(
    1:20), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = 1L, class = c("tbl_df", "tbl", "data.frame"
), .drop = TRUE), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
))

Does this work, it has the first 20 observations?

What are you after? When there is a 'Yes' e.g. 2020-03-17, what should there be?

library(tidyverse)

masks_df %>% 
  mutate(two_weeks_ago = lag(new_case, 14)) %>% 
  mutate(mask_mandate_14 = lag(mask, 14)) %>% 
  mutate(diff = two_weeks_ago - new_case)
# A tibble: 20 x 7
# Groups:   state [1]
   mask  date       state new_case two_weeks_ago mask_mandate_14  diff
   <chr> <date>     <chr>    <dbl>         <dbl> <chr>           <dbl>
 1 NA    2020-03-13 AL           1            NA NA                 NA
 2 NA    2020-03-14 AL           5            NA NA                 NA
 3 NA    2020-03-15 AL           6            NA NA                 NA
 4 NA    2020-03-16 AL          16            NA NA                 NA
 5 Yes   2020-03-17 AL           8            NA NA                 NA
 6 NA    2020-03-18 AL          10            NA NA                 NA
 7 NA    2020-03-19 AL          22            NA NA                 NA
 8 NA    2020-03-20 AL          13            NA NA                 NA
 9 NA    2020-03-21 AL          43            NA NA                 NA
10 NA    2020-03-22 AL          14            NA NA                 NA
11 NA    2020-03-23 AL          29            NA NA                 NA
12 NA    2020-03-24 AL          48            NA NA                 NA
13 NA    2020-03-25 AL          68            NA NA                 NA
14 NA    2020-03-26 AL         223            NA NA                 NA
15 NA    2020-03-27 AL          81             1 NA                -80
16 NA    2020-03-28 AL         109             5 NA               -104
17 NA    2020-03-29 AL         110             6 NA               -104
18 NA    2020-03-30 AL          53            16 NA                -37
19 NA    2020-03-31 AL         122             8 Yes              -114
20 NA    2020-04-01 AL          96            10 NA                -86

I would like to calculate the difference between new cases two weeks from the date when there is a "yes" for mask (when a mask mandate was enforced). Therefore, it should only be one number. If mask is yes, calculate the difference in new cases from the date two weeks from when the mandate was enforced.

Like this?

library(tidyverse)

masks_df %>% 
  mutate(in_two_weeks_mask = if_else(mask == "Yes", lead(new_case, 14), NA_real_)) %>% 
  mutate(diff = in_two_weeks_mask - new_case)
# A tibble: 20 x 6
# Groups:   state [1]
   mask  date       state new_case in_two_weeks_mask  diff
   <chr> <date>     <chr>    <dbl>             <dbl> <dbl>
 1 NA    2020-03-13 AL           1                NA    NA
 2 NA    2020-03-14 AL           5                NA    NA
 3 NA    2020-03-15 AL           6                NA    NA
 4 NA    2020-03-16 AL          16                NA    NA
 5 Yes   2020-03-17 AL           8               122   114
 6 NA    2020-03-18 AL          10                NA    NA
 7 NA    2020-03-19 AL          22                NA    NA
 8 NA    2020-03-20 AL          13                NA    NA
 9 NA    2020-03-21 AL          43                NA    NA
10 NA    2020-03-22 AL          14                NA    NA
11 NA    2020-03-23 AL          29                NA    NA
12 NA    2020-03-24 AL          48                NA    NA
13 NA    2020-03-25 AL          68                NA    NA
14 NA    2020-03-26 AL         223                NA    NA
15 NA    2020-03-27 AL          81                NA    NA
16 NA    2020-03-28 AL         109                NA    NA
17 NA    2020-03-29 AL         110                NA    NA
18 NA    2020-03-30 AL          53                NA    NA
19 NA    2020-03-31 AL         122                NA    NA
20 NA    2020-04-01 AL          96                NA    NA
1 Like

Yes! Thank you so much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

A portion of my dataset looks like this

masks_df = read_excel("./data/mask_state.xlsx") %>% 
  janitor::clean_names() %>% 
  mutate(date = as.Date(as.numeric(as.character(date)), origin="1899-12-30")) %>% 
  select(-x4) 

  Mask date       state    new_case
  NA   2020-07-15  AL    1236
  NA   2020-07-16  AL    1254
  NA   2020-07-17  AL    1579
  Yes  2020-07-18  AL    2003
  NA   2020-07-19  AL    1745
  NA   2020-07-20  AL    1945
  NA   2020-07-21  AL    2114
  NA   2020-07-22  AL    1819
  NA   2020-07-23  AL    1126
  NA   2020-07-24  AL    1282

The dataset is grouped by state and there is a date for everyday beginning from 2020-01-20. I wasn't sure if it would be best to use a function or mutate from the dplyr package.

can you provide the output of dput(masks_df) or dput(head(masks_df))?