I'm looking for a way to assign weights to survey observations. I have a dataframe with survey data, which looks like this:
set.seed(123)
survey <- data.frame(id = 1:30,
                     country = sample(letters[1:2], 30, replace = TRUE),
                     age = sample(c("young", "older", "old"), 30, replace = TRUE),
                     sex = sample(c("Male", "Female"), 30, replace = TRUE))
I also have a census dataset, which looks like this:
census <- data.frame(country = rep(letters[1:2], each = 6),
                     age = rep(rep(c("young", "older", "old"), each = 2), 2),
                     sex = rep(c("Male", "Female"), 6),
                     rel_freq = c(rep(.125, 4), rep(.25, 2), rep(.167, 6)))
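Because each hypothetical frequency is a census share multiplied by a country's sample size, it is worth confirming that the shares within each country add up to 1. A quick base R check (`share_totals` is just an illustrative name); note that country b's six shares of .167 add up to 1.002 rather than exactly 1, so its hypothetical frequencies will total slightly more than the observed count:

```r
# Recreate the census table from the question
census <- data.frame(country = rep(letters[1:2], each = 6),
                     age = rep(rep(c("young", "older", "old"), each = 2), 2),
                     sex = rep(c("Male", "Female"), 6),
                     rel_freq = c(rep(.125, 4), rep(.25, 2), rep(.167, 6)))

# Sum the relative frequencies within each country
share_totals <- tapply(census$rel_freq, census$country, sum)
print(share_totals)
```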
Now I'd like to calculate the hypothetical frequencies (how many respondents I should have sampled) by multiplying the census relative frequencies by the number of observations in each country. I honestly don't know how to start.
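A minimal base R sketch of that calculation at the level of census cells, i.e. how many respondents each country/age/sex cell "should" contain given each country's sample size. The names `n_country` and `hyp_n` are illustrative, not part of the original data:

```r
set.seed(123)
survey <- data.frame(id = 1:30,
                     country = sample(letters[1:2], 30, replace = TRUE),
                     age = sample(c("young", "older", "old"), 30, replace = TRUE),
                     sex = sample(c("Male", "Female"), 30, replace = TRUE))
census <- data.frame(country = rep(letters[1:2], each = 6),
                     age = rep(rep(c("young", "older", "old"), each = 2), 2),
                     sex = rep(c("Male", "Female"), 6),
                     rel_freq = c(rep(.125, 4), rep(.25, 2), rep(.167, 6)))

# Number of survey observations per country (a named table)
n_country <- table(survey$country)

# Hypothetical count per census cell = census share * that country's sample size
census$hyp_n <- census$rel_freq * as.numeric(n_country[census$country])
print(census)
```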
Update:
OK, I do know how to start, but I'm not sure if I'm on the right track.
This adds the number of observations in each country as a new column, country_pop:
library(dplyr)
library(tidyr)
library(purrr)

survey %>%
  group_by(country) %>%
  nest() %>%
  mutate(country_pop = map_dbl(data, nrow)) %>%
  unnest(data)
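If the nested list column isn't needed, dplyr's `add_count()` produces the same per-country count without nesting and unnesting, and keeps one row per respondent in the original order. A sketch, assuming dplyr is installed (`survey_n` is an illustrative name):

```r
library(dplyr)

set.seed(123)
survey <- data.frame(id = 1:30,
                     country = sample(letters[1:2], 30, replace = TRUE),
                     age = sample(c("young", "older", "old"), 30, replace = TRUE),
                     sex = sample(c("Male", "Female"), 30, replace = TRUE))

# add_count() appends a column holding the number of rows in each country group
survey_n <- survey %>%
  add_count(country, name = "country_pop")

head(survey_n)
```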
Update 2:
I realize it wasn't clear what I'm looking for. The final dataset should look something like this:
  id country   age  sex hyp_freq
1  1       a young Male    2.125  # 0.125 * 17 (0.125 from the census data, 17 = number of observations in country a)
2  2       a older Male    2.125  # 0.125 * 17
3  3       a   old Male    4.250  # 0.25 * 17
...
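Written as a formula (the notation here is illustrative, not from the original post), the hyp_freq value for respondent $i$ is:

```latex
\text{hyp\_freq}_i = p_{c_i, a_i, s_i} \times n_{c_i}
```

where $p_{c,a,s}$ is the census rel_freq for country $c$, age group $a$ and sex $s$, and $n_c$ is the number of survey observations in country $c$. For the first row above, $0.125 \times 17 = 2.125$.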
So, I want to look up each survey respondent's country, age, and sex in the census table to obtain the relative frequency for that group. Then I want to multiply that frequency by the number of observations in the respondent's country and save the result as a new column in my survey data frame.
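One way to do this lookup with dplyr, sketched under the assumption that every country/age/sex combination in the survey appears exactly once in the census table (true for the example data): count respondents per country, `left_join()` the census on all three keys, then multiply. `survey_weighted` is an illustrative name:

```r
library(dplyr)

set.seed(123)
survey <- data.frame(id = 1:30,
                     country = sample(letters[1:2], 30, replace = TRUE),
                     age = sample(c("young", "older", "old"), 30, replace = TRUE),
                     sex = sample(c("Male", "Female"), 30, replace = TRUE))
census <- data.frame(country = rep(letters[1:2], each = 6),
                     age = rep(rep(c("young", "older", "old"), each = 2), 2),
                     sex = rep(c("Male", "Female"), 6),
                     rel_freq = c(rep(.125, 4), rep(.25, 2), rep(.167, 6)))

survey_weighted <- survey %>%
  # number of respondents per country
  group_by(country) %>%
  mutate(country_pop = n()) %>%
  ungroup() %>%
  # look up the census share for the respondent's country, age and sex
  left_join(census, by = c("country", "age", "sex")) %>%
  # hypothetical frequency = census share * country sample size
  mutate(hyp_freq = rel_freq * country_pop)

head(survey_weighted)
```

Because this is a left join, a respondent whose combination is missing from the census would keep their row but get `NA` for rel_freq and therefore for hyp_freq.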