Filtering for dates for new plot

I have an Excel file of COVID-19 cases for our county going back to January. I want to use the same Excel file but plot a new graph for only the month of June. D is how I labeled the column with all the dates. I'm very new to R and ggplot2, so I don't know much about the correct commands. Here's what I've tried, but neither attempt works:

line_df <- COVID_ButteCounty_Master
filter (D = 2020-06-01 | D = 2020-06-23)
Error: unexpected '=' in " filter (D = 2020-06-01 | D ="

also

line_df <- COVID_ButteCounty_Master
filter(D >= as.Date("2020-06-01"))
Error in >=.default(D, as.Date("2020-06-01")) :
comparison (5) is possible only for atomic and list types

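Are you using the pipe operator, %>%, between the two lines of code you provided? Without it, filter() is a separate call that never receives the data frame. Second, testing equality in R requires two equal signs (==); a single = is used for assignment and for naming arguments. I also added quotes around the dates because, without quotes, 2020-06-01 is evaluated as arithmetic (2020 minus 6 minus 1).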

Try the following:

library(dplyr) # provides %>% and filter()

line_df <- COVID_ButteCounty_Master %>%
  filter(D == "2020-06-01" | D == "2020-06-23")
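As a quick illustration of both points, here is a small base R sketch. The dates vector is made up for the example and is not taken from the poster's spreadsheet.

```r
# Without quotes, R treats the hyphens as minus signs.
unquoted <- 2020-06-01   # same as 2020 - 6 - 1
print(unquoted)          # 2013

# With quotes and as.Date(), the value is an actual date.
d <- as.Date("2020-06-01")
print(class(d))          # "Date"

# `==` tests equality and returns TRUE/FALSE for each element.
dates <- as.Date(c("2020-05-31", "2020-06-01", "2020-06-23"))  # made-up values
print(dates == d)        # FALSE TRUE FALSE
```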

I added that to my script, but when I tried to plot it, it came out this way. Do you see what I might have done wrong?

line_df has 0 rows, which means no rows in COVID_ButteCounty_Master pass the filter Date == "2020-06-01" | Date == "2020-06-23". Either those dates aren't in the dataset or the filter is not in the right format. What are the values of Date? What is the class of Date? If it's a Date class, what is the format?
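A minimal sketch of those checks, in base R. The data frame toy_df and its values are hypothetical stand-ins for COVID_ButteCounty_Master, and the sketch assumes the dates arrived as text in month/day/year form, which may not match your import.

```r
# Hypothetical stand-in for the imported sheet; dates are stored as text here.
toy_df <- data.frame(
  Date  = c("1/1/2020", "6/1/2020", "6/23/2020"),
  Cases = c(0, 45, 120),
  stringsAsFactors = FALSE
)

print(class(toy_df$Date))   # "character", so not yet a Date

# Convert the text to Date, stating the format the text actually uses.
toy_df$Date <- as.Date(toy_df$Date, format = "%m/%d/%Y")

print(class(toy_df$Date))   # "Date"
print(range(toy_df$Date))   # earliest and latest dates in the column

# Is the date you are filtering on actually present?
print(as.Date("2020-06-01") %in% toy_df$Date)
```

If class() already reports "Date", the conversion step is unnecessary and range() alone shows whether June is covered.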

On the Excel file it's formatted "1/1/2020" but when I imported the Excel file, I classified Date as a date class, which converted it to: 2020-01-01

Cases is a numeric class.

Does that make sense?

Try the filter condition

filter (D == as.Date("2020-06-01") | D == as.Date("2020-06-23"))

So is the column name actually D or Date?

I changed it to Date. Originally it was D.

I'm guessing the dates in the filter are not actually in the data. Have you tried your code with a date you know is in the data? Can you copy the output of dput(head(COVID_ButteCounty_Master)) into your reply?

This worked! But it only plotted those two dates. How can I tell it to plot that entire range?

Try

filter (D >= as.Date("2020-06-01") & D <= as.Date("2020-06-23"))
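Putting the range filter together with a plot, here is a rough sketch on made-up data. The data frame toy_df, its placeholder Cases values, and the choice of geom_line() are all assumptions for illustration. The column is called Date here, as in the renamed version; swap in D if that is your column name.

```r
library(dplyr)
library(ggplot2)

# Made-up daily data from 2020-01-01 through 2020-06-23.
toy_df <- data.frame(
  Date = seq(as.Date("2020-01-01"), as.Date("2020-06-23"), by = "day")
)
toy_df$Cases <- seq_len(nrow(toy_df))   # placeholder counts, not real data

# Keep only 1 June through 23 June, both ends included.
june_df <- toy_df %>%
  filter(Date >= as.Date("2020-06-01") & Date <= as.Date("2020-06-23"))

# Plot the filtered rows only.
p <- ggplot(june_df, aes(x = Date, y = Cases)) +
  geom_line()
print(p)
```

Using & rather than | is what turns the two conditions into a range: a row has to satisfy both bounds to be kept.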

Very cool! Thank you both!


While I'm not using Excel or addressing the specifics of your question, you might consider tapping a live data feed for your state and county:

library(dplyr);library(ggplot2)
library(RCurl)
usa<-getURL("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
usa<-read.csv(text=usa)
usa$date2<-as.Date(usa$date,format="%Y-%m-%d") # format date
usaS <- usa %>%
  group_by(state, county, date2) %>%
  summarise(cases = sum(cases), deaths = sum(deaths))

# County Analyses

usaC<-usaS %>% filter(state=="California")
fcty<-c("Chico","Butte")

usaC %>%
  filter(county %in% fcty) %>% # subset in CA
  filter(date2 > as.Date("2020-05-13")) %>% # filter by date
  ggplot(., aes(date2, cases, fill = county)) +
  geom_bar(stat = 'identity') +
  theme_bw()

