Count Number of Dates & Output Value

ImranJ · January 27, 2020, 5:21am

sample_data <- data.frame(
  Incident.Date = c('1/1/2015','1/12016','2/2/2015','2/2/2018','5/5/2016','4/4/2015','4/4/2017','4/4/2018','6/6/2018','5/5/2018','1/4/2015'))

I have a dataset that looks similar to this. I would like to write a script that counts how many times each year ("2016", "2017", "2018", and so on) appears in the dates and then outputs those counts. Essentially, I have a bunch of dates, and I would like to graph how many times each one occurred. The sample above is an example of what my dataset looks like.

FJCC · January 27, 2020, 5:28am

You can use summarize() from the dplyr package.

sample_data <- data.frame(
  Incident.Date = c('1/1/2015','1/1/2016','2/2/2015','2/2/2018','5/5/2016','4/4/2015','4/4/2017','4/4/2018','6/6/2018','5/5/2018','1/4/2015'))
Stats <- dplyr::summarize(sample_data, Y2015 = sum(grepl("2015", Incident.Date)),
                          Y2016 = sum(grepl("2016", Incident.Date)),
                          Y2017 = sum(grepl("2017", Incident.Date)),
                          Y2018 = sum(grepl("2018", Incident.Date)))
Stats
#>   Y2015 Y2016 Y2017 Y2018
#> 1     4     2     1     4

Created on 2020-01-26 by the reprex package (v0.3.0)

ImranJ · January 27, 2020, 5:38am

This was exactly what I needed. Thanks

technocrat · January 27, 2020, 6:36am

Normally, I'd have suggested converting the Incident.Date column to a date-time (dttm) object, from which you can easily extract the year for summarization. However, the second entry is malformed: 1/12016
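To illustrate why the malformed entry matters, here is a minimal base R sketch of the date-conversion route. It assumes the dates are in month/day/year order, which the sample does not actually confirm, and the variable names (dates, parsed, bad, years) are only illustrative.

```r
# Same sample values as in the original post, including the malformed '1/12016'
dates <- c('1/1/2015','1/12016','2/2/2015','2/2/2018','5/5/2016','4/4/2015',
           '4/4/2017','4/4/2018','6/6/2018','5/5/2018','1/4/2015')

# Assumption: the format is month/day/year
parsed <- as.Date(dates, format = "%m/%d/%Y")

# Positions of entries that could not be parsed (they become NA)
bad <- which(is.na(parsed))
bad

# Year of each entry that did parse; unparsed entries stay NA
years <- format(parsed, "%Y")
table(years, useNA = "ifany")
```

With this approach the malformed entry drops out of its year and shows up as NA instead, so it would have to be repaired or handled separately before counting.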

So, I'd use stringr to keep only the last four digits of each entry:

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(stringr))
sample_data <- data.frame(Incident.Date = c('1/1/2015','1/12016','2/2/2015','2/2/2018','5/5/2016','4/4/2015','4/4/2017','4/4/2018','6/6/2018','5/5/2018','1/4/2015'))
sample_data %>%
  mutate(Incident.Date = str_extract(Incident.Date, "\\d{4}$")) %>%
  group_by(Incident.Date) %>%
  count()
#> # A tibble: 4 x 2
#> # Groups:   Incident.Date [4]
#>   Incident.Date     n
#>   <chr>         <int>
#> 1 2015              4
#> 2 2016              2
#> 3 2017              1
#> 4 2018              4

Created on 2020-01-26 by the reprex package (v0.3.0)
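Since the original question also mentions graphing the counts, here is one possible sketch using only base R. It takes the trailing four digits of each entry as the year, as in the stringr approach above, and the axis labels and title are placeholders rather than anything from the original data.

```r
# Same sample values as in the original post
dates <- c('1/1/2015','1/12016','2/2/2015','2/2/2018','5/5/2016','4/4/2015',
           '4/4/2017','4/4/2018','6/6/2018','5/5/2018','1/4/2015')

# Keep only the trailing four digits of each entry (treated as the year)
years <- sub(".*([0-9]{4})$", "\\1", dates)

# Count how many entries fall in each year
counts <- table(years)
counts

# Simple bar chart of the counts per year
barplot(counts,
        xlab = "Year",
        ylab = "Number of incidents",
        main = "Incidents per year")
```

The same counts could be passed to ggplot2 instead; barplot() is used here only to keep the example free of extra packages.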
