Grouping columns and rows

I have a dataset in RStudio which has multiple columns with the headings, county, date, and cases total. In the county column, there are 54 counties that repeat every day starting from 3/9/2020 going until 1/21/2022.

i want to combine the cases total from Adams County on 3/9/2020 with the cases total from Adams County on 3/10/2020 and so on up to 1/21/2022. I want to do this for all 54 counties.

COUNTY Date Cases Total
Adams 3/9/2020 0
Barnes 3/9/2020 0
Benson 3/9/2020 0
Billings 3/9/2020 0
Bottineau 3/9/2020 0

I have tried using group_by and summarize but it seems to group some of the counties together and not all of them.

Any help would be appreciated.

Here's an example with contrived data. If it doesn't work on your actual data, it could be due to malformatted entries.

suppressPackageStartupMessages({
  library(dplyr)
})

nd <- c("Adams","Barnes","Benson","Billings","Bottineau","Bowman","Burke","Burleigh","Cass","Cavalier","Dickey","Divide","Dunn","Eddy","Emmons","Foster","Golden Valley","Grand Forks","Grant","Griggs","Hettinger","Kidder","LaMoure","Logan","McHenry","McIntosh","McKenzie","McLean","Mercer","Morton","Mountrail","Nelson","Oliver","Pembina","Pierce","Ramsey","Ransom","Renville","Richland","Rolette","Sargent","Sheridan","Sioux","Slope","Stark","Steele","Stutsman","Towner","Traill","Walsh","Ward","Wells","Williams")

counties <- rep(nd,3)
day1 <- rep(as.Date("2020/3/9"),length(nd))
day2 <- rep(as.Date("2020/3/10"),length(nd))
day3 <- rep(as.Date("2020/3/11"),length(nd))
dates <- c(day1,day2,day3)

set.seed(42)
counts <- sample(1:20,length(nd),replace = TRUE)

dat <- data.frame(county = counties, date = dates, count = counts)

dat %>% group_by(county) %>% summarise(cases = sum(count)) %>% print(., n = Inf)
#> # A tibble: 53 × 2
#>    county        cases
#>    <chr>         <int>
#>  1 Adams            51
#>  2 Barnes           15
#>  3 Benson            3
#>  4 Billings         30
#>  5 Bottineau        12
#>  6 Bowman           54
#>  7 Burke            51
#>  8 Burleigh         45
#>  9 Cass             21
#> 10 Cavalier         12
#> 11 Dickey           15
#> 12 Divide           42
#> 13 Dunn             60
#> 14 Eddy             54
#> 15 Emmons           45
#> 16 Foster            9
#> 17 Golden Valley    27
#> 18 Grand Forks      12
#> 19 Grant            15
#> 20 Griggs           39
#> 21 Hettinger        15
#> 22 Kidder           60
#> 23 LaMoure           6
#> 24 Logan            24
#> 25 McHenry           9
#> 26 McIntosh          3
#> 27 McKenzie         30
#> 28 McLean           33
#> 29 Mercer           45
#> 30 Morton           24
#> 31 Mountrail        12
#> 32 Nelson           12
#> 33 Oliver           54
#> 34 Pembina          39
#> 35 Pierce           15
#> 36 Ramsey           12
#> 37 Ransom            6
#> 38 Renville         54
#> 39 Richland          9
#> 40 Rolette          51
#> 41 Sargent          54
#> 42 Sheridan         18
#> 43 Sioux            18
#> 44 Slope             6
#> 45 Stark            60
#> 46 Steele            9
#> 47 Stutsman          6
#> 48 Towner           18
#> 49 Traill           30
#> 50 Walsh            24
#> 51 Ward             15
#> 52 Wells             3
#> 53 Williams         51
1 Like

Thanks, I'll start with this.
I have over 600 days will I have to do day1, day2 for all 600?

No, those were just to create the synthetic dataset to illustrate the use of group_by() and summarize.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.