Here is a possible solution using the R package dplyr for the merging part.
age <- c("0-7", "0-7", "0-7", "8-10", "8-10","8-10","11-15","11-15","11-15")
sex <- c("m","m","m", "f","f","f","both","both","both")
country <- c('Spain','North France','South France', 'Spain','North France','South France', 'Spain','North France','South France')
`X1999` <- c(21000, 23400, 26800, 21000, 23400, 26800, 21000, 23400, 26800)
`X2000` <- c(23444, 78330, 83920, 21000, 23400, 26800, 21000, 23400, 26800)
`X2001` <- c(21000, 23400, 26800, 21000, 23400, 26800, 21000, 23400, 26800)
pop.data <- data.frame(age, sex, country, X1999, X2000, X2001)
pop.data$country <- gsub("^(N.*France|S.*France)$", "France", pop.data$country, fixed = FALSE)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
pop.data %>%
group_by(age, sex, country) %>%
summarize(
X1999 = sum(X1999),
X2000 = sum(X2000),
X2001 = sum(X2001)
) %>%
ungroup() %>%
as.data.frame()
#> `summarise()` has grouped output by 'age', 'sex'. You can override using the `.groups` argument.
#> age sex country X1999 X2000 X2001
#> 1 0-7 m France 50200 162250 50200
#> 2 0-7 m Spain 21000 23444 21000
#> 3 11-15 both France 50200 50200 50200
#> 4 11-15 both Spain 21000 21000 21000
#> 5 8-10 f France 50200 50200 50200
#> 6 8-10 f Spain 21000 21000 21000
Created on 2021-01-22 by the reprex package (v0.3.0)
Note that I made two changes to your existing code:
- 1999 is not a valid variable name on its own, so I renamed it to X1999 as in your original data
- the regular expression did not work for the example you provided so I modified that part
To summarize multiple similar columns without writing them out one by one you can use the following syntax:
library(dplyr)
pop.data %>%
group_by(age, sex, country) %>%
summarize(across(X1999:X2020, sum), .groups = "drop") %>%
as.data.frame()