Starting Graph After Value of 100

I have made a graph from an online data set. Currently the graph is created by the following script. The variable Cases is what I am plotting against DateRep. I would like my graph to start only after the value reaches 100. I know that I could create a subset of the data from the point where it reaches 100, but I would like to know whether this can all be done inside one function.

Here is a link to the dataset that I am using

data %>% 
  filter(`Countries and territories` %in% c("Canada")) %>% 
  ggplot(aes(DateRep, Cases, col = `Countries and territories`)) + 
  #geom_point() +
  #geom_line() +
  scale_y_log10() +
  geom_smooth() +
  theme(legend.position = "bottom") +
  ggtitle("Plot of COVID-19 Cases In Canada") +
  xlab("Date In Months") + 
  #ylab("Cases Per Day Count") 
  ylab("Cases Per Day Count - Log10 Applied")

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

I took a look at my dataset, and I have this piece of code. It sets which countries I want to look at and plots the date on one axis and cases on the other. The dataset also has a variable called deaths. What I would like to do is tally up the deaths values and start plotting the cases by date only after the death toll for that country reaches 100. I can upload a .csv file of the dataset.

  filter(`Countries and territories` %in% c("Canada", "United_States_of_America", "Iran", "Italy")) %>% 
  ggplot(aes(DateRep, Cases, col = `Countries and territories`)) + 

Is this what you mean?

library(tidyverse)
library(lubridate)

url <- "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv"
sample_df <- read.csv(url)

sample_df %>%
    mutate(dateRep = dmy(dateRep)) %>% 
    filter(countriesAndTerritories %in% c("Canada", "United_States_of_America", "Iran", "Italy")) %>%
    arrange(countriesAndTerritories, dateRep) %>% 
    group_by(countriesAndTerritories) %>% 
    mutate(cum_deaths = cumsum(deaths)) %>% 
    filter(cum_deaths >= 100) %>% 
    ggplot(aes(dateRep, cases, colour = countriesAndTerritories)) +
    geom_point() +
    geom_smooth()

Created on 2020-03-26 by the reprex package (v0.3.0.9001)

If this is not what you mean, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

I believe this may be a solution; however, can you explain what is going on in these two lines?

    mutate(cum_deaths = cumsum(deaths)) %>% 
    filter(cum_deaths >= 100) %>% 

The first line calculates the cumulative deaths within each country, and the second keeps only the rows where that running total is at least 100. That is what I understood from your explanation.
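To see those two lines in isolation, here is a minimal sketch on a made-up data frame. The `toy` data, its column names, and its values are invented for illustration and are not taken from the ECDC data set.

```r
library(dplyr)

# Hypothetical toy data: two countries, four days each, daily deaths
toy <- tibble(
  country = rep(c("A", "B"), each = 4),
  day     = rep(1:4, times = 2),
  deaths  = c(10, 40, 60, 20, 5, 5, 5, 5)
)

result <- toy %>%
  arrange(country, day) %>%            # make sure rows are in date order
  group_by(country) %>%                # restart the running total per country
  mutate(cum_deaths = cumsum(deaths)) %>%
  filter(cum_deaths >= 100) %>%        # keep rows at or past the threshold
  ungroup()

result
```

Country A's running total is 10, 50, 110, 130, so only days 3 and 4 survive the filter. Country B never reaches 100, so it drops out entirely.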

Yes, thank you. However when I try to run your provided code I get an error. I will have to explore this solution. This is the error that I get.

Error in lapply(list(...), .num_to_date) : object 'dateRep' not found

Would there be something with a similar syntax to

  filter(`Deaths` >= 100) %in% c("Canada", "United_States_of_America", "Iran", "Italy")) %>% 

I downloaded the data from a CSV file, and its column names seem to be different from the ones you have. Please check that.
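One way to check for this kind of mismatch is to print the column names and, if needed, rename them to match the code being run. The sketch below uses a small invented data frame (`old_names_df`) as a stand-in for a file with the older headers; the values are made up.

```r
library(dplyr)

# Hypothetical stand-in for a download that uses the older column headers
old_names_df <- tibble(
  DateRep = as.Date(c("2020-03-24", "2020-03-25")),
  Cases   = c(100, 120),
  Deaths  = c(3, 5),
  `Countries and territories` = c("Canada", "Canada")
)

# Inspect the headers actually present in the data
names(old_names_df)

# Rename them to the names used in the answer's code (new_name = old_name)
renamed <- old_names_df %>%
  rename(
    dateRep = DateRep,
    cases   = Cases,
    deaths  = Deaths,
    countriesAndTerritories = `Countries and territories`
  )

names(renamed)
```

An error such as `object 'dateRep' not found` usually means the name in the code does not match any name printed by `names()`.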

Sorry, but this doesn't make sense to me, it doesn't even have valid syntax.

Hello, I keep trying to recreate your graph starting after the 100 mark and it still does not work. Could you try recreating it based on this code, which pulls the data from the URL just like mine does? In the second and third lines of the pipeline I am implementing your approach of filtering to >= 100.

#Read In The Files
install.packages("readxl")

#these libraries are necessary
library(readxl)
library(httr)
library(tidyverse)

#create the URL where the dataset is stored with automatic updates every day
url <- paste("https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-",format(Sys.time(), "%Y-%m-%d"), ".xlsx", sep = "")

#download the dataset from the website to a local temporary file
GET(url, authenticate(":", ":", type="ntlm"), write_disk(tf <- tempfile(fileext = ".xlsx")))

#read the Dataset sheet into "R"
data <- read_excel(tf)
data %>% 
  filter(countriesAndTerritories %in% c("Canada")) %>% 
  # sort by date so the running total accumulates in chronological order
  arrange(dateRep) %>% 
  mutate(totalDeaths = cumsum(deaths)) %>% 
  filter(totalDeaths >= 100) %>% 
  ggplot(aes(dateRep, cases, col = countriesAndTerritories)) + 
  geom_point() +
  geom_line() +
  scale_y_log10() +
  geom_smooth() +
  #abline(v = mean(Canada_Subset$Cases),col="red", lwd=3, lty=2) +
  theme(legend.position = "bottom") +
  ggtitle("Plot of COVID-19 Cases In Majority Countries") +
  xlab("Date In Months") + 
  ylab("Cases Per Day Count")
  #ylab("Cases Per Day Count - Log10 Applied")

This code produces no output because Canada has not yet reached 100 cumulative deaths, so the filter removes every row.
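A quick way to confirm this before plotting is to total the deaths per country and see which ones ever reach the threshold. The sketch below uses invented numbers in a small `toy` data frame, not real ECDC figures.

```r
library(dplyr)

# Hypothetical daily death counts; the values are made up for illustration
toy <- tibble(
  countriesAndTerritories = c("Canada", "Canada", "Italy", "Italy"),
  deaths = c(10, 15, 80, 90)
)

totals <- toy %>%
  group_by(countriesAndTerritories) %>%
  summarise(total_deaths = sum(deaths)) %>%
  mutate(reaches_100 = total_deaths >= 100)

totals
```

Any country with `reaches_100` equal to `FALSE` will contribute no rows after a `filter(cum_deaths >= 100)` step, which leaves the plot empty when it is the only country selected.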
