How do we select a range of dates in a column?

I am trying to filter out a range of dates from covid_au_state.csv dataset. I used filter() and between() functions but not able to apply them. Is there any other way?

  covid <- read.csv(file = 'covid_au_state.csv')
  dput(covid)
  library(lubridate)
  library(dplyr)
  library(ggplot2)
  filt1 <- between(covid$date, '17/3/20', '16/8/20')
  data1 <- filter(covid, date < dmy("17/3/2020"))

You might want to check they are being recognized as valid dates first. What does class(covid$date) return?

I would try converting the date before your filtering steps:

  covid <- read.csv(file = 'covid_au_state.csv')
  dput(covid)
  library(lubridate)
  library(dplyr)
  library(ggplot2)
  covid$date <- as.Date(covid$date,'%d/%m/%y') 
  filt1 <- between(covid$date, '2020-03-17', '2020-08-16')
  data1 <- filter(covid, date <  '2020-03-17')

I also made an adjustment to the format of the dates on the filters because I believe once the date gets converted you'd want to use that format instead.

If that doesn't work I'd be happy to help troubleshoot more, but having a reproducible example would make things a lot easier, you can find out how to make one here: FAQ: What's a reproducible example (`reprex`) and how do I do one?

1 Like

I tried this code but it didn't worked.

  covid <- read.csv(file = 'covid_au_state.csv')
  dput(covid)
  library(lubridate)
  library(dplyr)
  library(ggplot2)
  covid$date <- as.Date(covid$date,'%d/%m/%y') 
  filt1 <- covid %>% 
  filter(between(covid$date, as.Date("2020-03-17", format = "%Y/%m/%d"),  
  as.Date("2020-08-16", format = "%Y/%m/%d")))
  View(filt1)

What actually I am trying to do here is compare the daily growth factor across all the eight states in Australia from 17 March 2020 to 16 August 2020. (The growth factor is calculated by dividing the new cases on the current day with the new cases of the previous day.)

This is the inner structure of the file.

str(covid)
'data.frame':	1640 obs. of  13 variables:
 $ date         : Date, format: "2020-01-25" "2020-01-25" ...
 $ state        : chr  "Australian Capital Territory" "New South Wales" "Northern Territory" 
"Queensland" ...
 $ state_abbrev : chr  "ACT" "NSW" "NT" "QLD" ...
 $ confirmed    : int  0 3 0 0 0 0 1 0 0 0 ...
 $ confirmed_cum: int  0 3 0 0 0 0 1 0 0 3 ...
 $ deaths       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ deaths_cum   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ tests        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ tests_cum    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ positives    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ positives_cum: int  0 0 0 0 0 0 0 0 0 0 ...
 $ recovered    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ recovered_cum: int  0 0 0 0 0 0 0 0 0 0 ...

Can someone please help?

Sorry I just saw this. There is a mistake formatting the date, it should have been "%Y-%m-%d" based on the format you wrote it in.

Here is the code you had with that adjustment:

  covid <- read.csv(file = 'covid_au_state.csv')
  dput(covid)
  library(lubridate)
  library(dplyr)
  library(ggplot2)
  covid$date <- as.Date(covid$date,'%d/%m/%y') 
  filt1 <- covid %>% 
  filter(between(covid$date, as.Date("2020-03-17", format = "%Y-%m-%d"),  
  as.Date("2020-08-16", format = "%Y-%m-%d")))
  View(filt1)

You also don't need to specify the format in this case, so you can remove the whole piece with as.Date() and format, like so:

  filter(between(covid$date, "2020-03-17",  "2020-08-16"))

If you wanted to get the daily growth factor across all the eight states, here is how I would do it:

growth_factor <- filt1 %>% 
                    group_by(state_abbrev) %>%
                    mutate(previous_day_confirmed = lag(confirmed, n=1, default=NA, order_by = date)) %>%
                    mutate(growth_factor = confirmed/previous_day_confirmed)                

The first mutate step gets the confirmed cases from the previous day, and the second mutate step calculates the growth factor as confirmed/previous_day_confirmed.

In the code above I used the variable confirmed as "new cases", so you'll have to adjust that if you meant a different variable instead of confirmed. I also can't test the code without making an example dataset, if you could create a reproducible example using the reprex package I could help more with the growth factor calculation (look at "Scenario 2"): https://reprex.tidyverse.org/articles/articles/datapasta-reprex.html

Sorry for the delay, didn't see this until now!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.