Aggregating rows

Jakub_Komarek · July 17, 2020, 12:04pm

Hi,

I'd like to aggregate every 5 rows so that first column would be 0-4,5-9,10-14. Could you help me, please?

data.frame(
         age = c(0,1,2,3,4,5,6,7,8,9,10,11,12,
                 13,14,15,16,17,18,19),
        men = c(60494,61655,58727,55096,53160,50844,
                 48764,48387,47523,46884,45750,46555,46641,46770,49755,
                 55309,62419,63163,67208,67806),
       age = c(58115,58635,56453,51717,50493,47947,
                 46231,45756,44950,43674,43588,44029,44359,44394,47024,
                 52510,59468,59772,63587,64749)
```

srpike · July 17, 2020, 12:15pm

One option would be:

library(dplyr)
library(stringr)

data.frame(
  age = c(0,1,2,3,4,5,6,7,8,9,10,11,12,
          13,14,15,16,17,18,19),
  men = c(60494,61655,58727,55096,53160,50844,
          48764,48387,47523,46884,45750,46555,46641,46770,49755,
          55309,62419,63163,67208,67806),
  age2 = c(58115,58635,56453,51717,50493,47947,
          46231,45756,44950,43674,43588,44029,44359,44394,47024,
          52510,59468,59772,63587,64749)) %>% 
  mutate(index = floor((row_number()-1) / 5)) %>% 
  group_by(index) %>% 
  summarise(age = str_c(min(age), "-", max(age)), men = sum(men), age2 = sum(age2), .groups = "drop") %>% 
  select(-index)
#> # A tibble: 4 x 3
#>   age      men   age2
#>   <chr>  <dbl>  <dbl>
#> 1 0-4   289132 275413
#> 2 5-9   242402 228558
#> 3 10-14 235471 223394
#> 4 15-19 315905 300086

^{Created on 2020-07-17 by the reprex package (v0.3.0)}

Jakub_Komarek · July 17, 2020, 12:53pm

Thank you!

I have one more question. When I plot with ggplot, I get

5-9 is between 45 and 50.

Do you know how to solve it?

ggplot(pyramida2010, aes(y = vek, x = value, fill = variable)) + 
  geom_bar(data=subset(pyramida2010,variable=="ženy"), stat = "identity") + 
  geom_bar(data=subset(pyramida2010,variable=="muži"), stat = "identity") +
  theme_minimal()

srpike · July 17, 2020, 1:11pm

Try making the age label into a factor. I usually use forcats but I think base R factors would work too.

library(dplyr)
library(stringr)
library(forcats)

data.frame(
  age = c(0,1,2,3,4,5,6,7,8,9,10,11,12,
          13,14,15,16,17,18,19),
  men = c(60494,61655,58727,55096,53160,50844,
          48764,48387,47523,46884,45750,46555,46641,46770,49755,
          55309,62419,63163,67208,67806),
  age2 = c(58115,58635,56453,51717,50493,47947,
           46231,45756,44950,43674,43588,44029,44359,44394,47024,
           52510,59468,59772,63587,64749)) %>% 
  mutate(index = floor((row_number()-1) / 5)) %>% 
  group_by(index) %>% 
  summarise(age = str_c(min(age), "-", max(age)), men = sum(men), age2 = sum(age2), .groups = "drop") %>% 
  select(-index) %>% 
  mutate(age = as_factor(age))
#> # A tibble: 4 x 3
#>   age      men   age2
#>   <fct>  <dbl>  <dbl>
#> 1 0-4   289132 275413
#> 2 5-9   242402 228558
#> 3 10-14 235471 223394
#> 4 15-19 315905 300086

system · July 24, 2020, 1:16pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.