I have been trying to replace nested loops that contain nested conditionals with map(), but without success. I'm aware of the discussions on SO (https://stackoverflow.com/questions/48847613/purrr-map-equivalent-of-nested-for-loop and https://stackoverflow.com/questions/52031380/replacing-the-for-loop-by-the-map-function-to-speed-up?noredirect=1&lq=1), but neither of these proved useful for my case.
I have two datasets of different lengths. For downstream purposes, I want to add a unique group id from one dataset to the other. However, one dataset contains data for time periods (df_1), while the other has annual frequency (df_2).
My solution so far is to loop over both datasets (the nested loops are necessary because of the difference in lengths), check whether the countries are the same and, within those countries, check whether the annual observation falls within a given period. If it does, I add the group id to df_2.
My problem with the map() approach (or *apply, for that matter) is that I don't know how to express the nested loops and the conditions together.
For an MWE, see below.
library(dplyr)
library(tidyr)
# data
df_1 <- tibble(
  start = rep(seq(1990, 1994, 4), each = 2),
  end = start + 4,
  countryname = rep(c("SWE", "NOR"), 2),
  group_id = rep(seq(1:2), each = 2)
)
df_2 <- tibble(
  year = rep(seq(1989, 1999, 1), each = 2),
  countryname = rep(c("SWE", "NOR"), 11),
  value = rep(seq(100, 110, 1), each = 2),
  group_id = NA_real_
)
for (i in (1:nrow(df_1))) {
  for (j in (1:nrow(df_2))) {
    if (df_1[i, "countryname"] == df_2[j, "countryname"]) {
      if (df_2[j, "year"] >= df_1[i, "start"] & df_2[j, "year"] <= df_1[i, "end"]) {
        df_2[j, "group_id"] <- df_1[i, "group_id"]
      }
    }
  }
}
Created on 2021-01-12 by the reprex package (v0.3.0)
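One possible way to fold the nested loops and both conditions into a single mapping step is sketched below with base R's mapply(). This is only a sketch under assumptions, not a definitive solution: the data are rebuilt as plain data frames so that the block needs no packages, find_group is a hypothetical helper name, and taking the last matching period is an assumption chosen to mimic the loop, where a later row of df_1 overwrites an earlier one (the year 1994 falls into both periods).

```r
# Same data as in the MWE, rebuilt with base data frames (assumption:
# no tibble/dplyr needed for this sketch)
df_1 <- data.frame(
  start = rep(c(1990, 1994), each = 2),
  countryname = rep(c("SWE", "NOR"), 2),
  group_id = rep(1:2, each = 2)
)
df_1$end <- df_1$start + 4

df_2 <- data.frame(
  year = rep(1989:1999, each = 2),
  countryname = rep(c("SWE", "NOR"), 11),
  value = rep(100:110, each = 2)
)

# Hypothetical helper: for one (year, country) pair, test all rows of df_1
# at once. This replaces the inner loop and both if-conditions.
find_group <- function(year, country) {
  hit <- df_1$countryname == country &
    df_1$start <= year &
    df_1$end >= year
  if (any(hit)) {
    # Assumption: when periods overlap, keep the last match, as the loop does
    as.numeric(tail(df_1$group_id[hit], 1))
  } else {
    NA_real_
  }
}

# mapply() walks over the rows of df_2, which replaces the outer loop
df_2$group_id <- mapply(find_group, df_2$year, df_2$countryname)
```

With purrr loaded, the last line could presumably be written as purrr::map2_dbl(df_2$year, df_2$countryname, find_group), since map2_dbl() also iterates over two vectors in parallel and returns a numeric vector.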