Grouping columns and rows

aroberthumes · January 22, 2022, 5:44am

I have a dataset in RStudio which has multiple columns with the headings, county, date, and cases total. In the county column, there are 54 counties that repeat every day starting from 3/9/2020 going until 1/21/2022.

i want to combine the cases total from Adams County on 3/9/2020 with the cases total from Adams County on 3/10/2020 and so on up to 1/21/2022. I want to do this for all 54 counties.

COUNTY	Date	Cases Total
Adams	3/9/2020	0
Barnes	3/9/2020	0
Benson	3/9/2020	0
Billings	3/9/2020	0
Bottineau	3/9/2020	0

I have tried using group_by and summarize but it seems to group some of the counties together and not all of them.

Any help would be appreciated.

technocrat · January 22, 2022, 6:22am

Here's an example with contrived data. If it doesn't work on your actual data, it could be due to malformatted entries.

suppressPackageStartupMessages({
  library(dplyr)
})

nd <- c("Adams","Barnes","Benson","Billings","Bottineau","Bowman","Burke","Burleigh","Cass","Cavalier","Dickey","Divide","Dunn","Eddy","Emmons","Foster","Golden Valley","Grand Forks","Grant","Griggs","Hettinger","Kidder","LaMoure","Logan","McHenry","McIntosh","McKenzie","McLean","Mercer","Morton","Mountrail","Nelson","Oliver","Pembina","Pierce","Ramsey","Ransom","Renville","Richland","Rolette","Sargent","Sheridan","Sioux","Slope","Stark","Steele","Stutsman","Towner","Traill","Walsh","Ward","Wells","Williams")

counties <- rep(nd,3)
day1 <- rep(as.Date("2020/3/9"),length(nd))
day2 <- rep(as.Date("2020/3/10"),length(nd))
day3 <- rep(as.Date("2020/3/11"),length(nd))
dates <- c(day1,day2,day3)

set.seed(42)
counts <- sample(1:20,length(nd),replace = TRUE)

dat <- data.frame(county = counties, date = dates, count = counts)

dat %>% group_by(county) %>% summarise(cases = sum(count)) %>% print(., n = Inf)
#> # A tibble: 53 × 2
#>    county        cases
#>    <chr>         <int>
#>  1 Adams            51
#>  2 Barnes           15
#>  3 Benson            3
#>  4 Billings         30
#>  5 Bottineau        12
#>  6 Bowman           54
#>  7 Burke            51
#>  8 Burleigh         45
#>  9 Cass             21
#> 10 Cavalier         12
#> 11 Dickey           15
#> 12 Divide           42
#> 13 Dunn             60
#> 14 Eddy             54
#> 15 Emmons           45
#> 16 Foster            9
#> 17 Golden Valley    27
#> 18 Grand Forks      12
#> 19 Grant            15
#> 20 Griggs           39
#> 21 Hettinger        15
#> 22 Kidder           60
#> 23 LaMoure           6
#> 24 Logan            24
#> 25 McHenry           9
#> 26 McIntosh          3
#> 27 McKenzie         30
#> 28 McLean           33
#> 29 Mercer           45
#> 30 Morton           24
#> 31 Mountrail        12
#> 32 Nelson           12
#> 33 Oliver           54
#> 34 Pembina          39
#> 35 Pierce           15
#> 36 Ramsey           12
#> 37 Ransom            6
#> 38 Renville         54
#> 39 Richland          9
#> 40 Rolette          51
#> 41 Sargent          54
#> 42 Sheridan         18
#> 43 Sioux            18
#> 44 Slope             6
#> 45 Stark            60
#> 46 Steele            9
#> 47 Stutsman          6
#> 48 Towner           18
#> 49 Traill           30
#> 50 Walsh            24
#> 51 Ward             15
#> 52 Wells             3
#> 53 Williams         51

aroberthumes · January 22, 2022, 6:37am

Thanks, I'll start with this.
I have over 600 days will I have to do day1, day2 for all 600?

technocrat · January 22, 2022, 6:39am

No, those were just to create the synthetic dataset to illustrate the use of group_by() and summarize.

system · February 12, 2022, 6:39am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.