I like the solution with complete, but you could also use an if/else construct. This can be dangerous, though, so I will compare three variants: base R's if () ... else, dplyr::if_else() and base::ifelse().
The best one is base R's if () ... else:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyr))
population %>%
  group_by(country) %>%
  summarise(chg_2000_to_2012 = if (all(c(2000, 2012) %in% year)) {
    population[year == 2012] / population[year == 2000]
  } else {
    NA_real_
  })
#> # A tibble: 219 x 2
#> country chg_2000_to_2012
#> <chr> <dbl>
#> 1 Afghanistan 1.4481192
#> 2 Albania 0.9567724
#> 3 Algeria 1.2131896
#> 4 American Samoa 0.9583811
#> 5 Andorra 1.1981835
#> 6 Angola 1.4951978
#> 7 Anguilla 1.2764881
#> 8 Antigua and Barbuda 1.1470869
#> 9 Argentina 1.1133743
#> 10 Armenia 0.9652101
#> # ... with 209 more rows
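Normally one should prefer dplyr::if_else(), because it is strict about types and lengths. Here, however, that strictness leads to an error (by design): if_else() evaluates both branches, and for the countries without a 2000 or 2012 observation (as mentioned in the question) the `true` branch has length 0 while the condition has length 1.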
population %>%
  group_by(country) %>%
  summarise(chg_2000_to_2012 = if_else(all(c(2000, 2012) %in% year),
                                       population[year == 2012] / population[year == 2000],
                                       NA_real_))
#> Error in summarise_impl(.data, dots): Evaluation error: `true` must be length 1 (length of `condition`), not 0.
So it is really better to use the if () ... else construct from above, which only evaluates the branch that is actually selected.
One might also reach for base::ifelse(), which works in this case, but only because the data is clean and there is at most one observation per combination of country and year:
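To see where the length mismatch comes from, here is a minimal base R sketch. The vectors `year` and `pop` are made-up values for a hypothetical country without a 2000 observation; they are not taken from the `population` data.

```r
# Hypothetical country that has no observation for the year 2000.
year <- c(1995L, 2012L)
pop  <- c(100, 120)

# No element matches, so the subset is empty ...
pop_2000 <- pop[year == 2000]
length(pop_2000)
#> [1] 0

# ... and dividing by an empty vector gives an empty vector as well.
ratio <- pop[year == 2012] / pop_2000
length(ratio)
#> [1] 0

# The condition, on the other hand, is a single FALSE.
cond <- all(c(2000, 2012) %in% year)
length(cond)
#> [1] 1
```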
population %>%
  group_by(country) %>%
  summarise(chg_2000_to_2012 = ifelse(all(c(2000, 2012) %in% year),
                                      population[year == 2012] / population[year == 2000],
                                      NA_real_))
#> # A tibble: 219 x 2
#> country chg_2000_to_2012
#> <chr> <dbl>
#> 1 Afghanistan 1.4481192
#> 2 Albania 0.9567724
#> 3 Algeria 1.2131896
#> 4 American Samoa 0.9583811
#> 5 Andorra 1.1981835
#> 6 Angola 1.4951978
#> 7 Anguilla 1.2764881
#> 8 Antigua and Barbuda 1.1470869
#> 9 Argentina 1.1133743
#> 10 Armenia 0.9652101
#> # ... with 209 more rows
If we add a second observation for Afghanistan in 2012, we will end up with results that might not have been intended.
population_help <- tibble::tibble(country = "Afghanistan", year = 2012L, population = 139213L)
population2 <- bind_rows(population, population_help)
population2 %>% filter(country == "Afghanistan", year %in% c(2000, 2012))
#> # A tibble: 3 x 3
#> country year population
#> <chr> <int> <int>
#> 1 Afghanistan 2000 20595360
#> 2 Afghanistan 2012 29824536
#> 3 Afghanistan 2012 139213
If we run our code with base::ifelse(), it runs without complaint: it takes the first observation for 2012 and silently drops the second. That is not safe, since which observation comes first is more or less arbitrary.
population2 %>%
  group_by(country) %>%
  summarise(chg_2000_to_2012 = ifelse(all(c(2000, 2012) %in% year),
                                      population[year == 2012] / population[year == 2000],
                                      NA_real_)) %>%
  filter(country == "Afghanistan")
#> # A tibble: 1 x 2
#> country chg_2000_to_2012
#> <chr> <dbl>
#> 1 Afghanistan 1.448119
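The second value disappears because ifelse() returns a result shaped like its test argument, so a length-one test keeps only the first element of `yes`. A minimal sketch outside of dplyr, using the three Afghanistan rows shown above (the variable names are only for illustration):

```r
pop_2000 <- 20595360
pop_2012 <- c(29824536, 139213)  # two rows for the year 2012

# Two ratios, one per 2012 row.
ratios <- pop_2012 / pop_2000
length(ratios)
#> [1] 2

# The test has length 1, so only the first ratio survives.
res <- ifelse(TRUE, ratios, NA_real_)
length(res)
#> [1] 1
identical(res, ratios[1])
#> [1] TRUE
```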
By contrast, our initial if () ... else construct behaves safely. Both values are returned, so dplyr::summarise() (in the dplyr version used here) throws an error, because a summary must consist of exactly one value per group.
population2 %>%
  group_by(country) %>%
  summarise(chg_2000_to_2012 = if (all(c(2000, 2012) %in% year)) {
    population[year == 2012] / population[year == 2000]
  } else {
    NA_real_
  })
#> Error in summarise_impl(.data, dots): Column `chg_2000_to_2012` must be length 1 (a summary value), not 2
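If you would rather not rely on summarise() to catch duplicates, the same logic can be made explicit. The following is only a sketch under my own assumptions: `chg_ratio()` is a hypothetical helper written for this answer, not a function from dplyr or tidyr.

```r
# Hypothetical helper (not part of dplyr or tidyr): ratio of the value in
# year `to` over the value in year `from`.
chg_ratio <- function(year, value, from = 2000, to = 2012) {
  n_from <- sum(year == from)
  n_to   <- sum(year == to)
  # A missing year gives NA, like the else branch above.
  if (n_from == 0L || n_to == 0L) {
    return(NA_real_)
  }
  # Duplicates are ambiguous, so stop instead of silently picking one.
  if (n_from > 1L || n_to > 1L) {
    stop("expected at most one observation per year, found ",
         n_from, " for ", from, " and ", n_to, " for ", to)
  }
  value[year == to] / value[year == from]
}

chg_ratio(c(2000, 2012), c(10, 15))
#> [1] 1.5
chg_ratio(c(1995, 2012), c(10, 15))
#> [1] NA
```

Inside the pipeline this could be called as summarise(chg_2000_to_2012 = chg_ratio(year, population)), so that a duplicated year stops with a message that names the problem.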