I'm looking for a way to assign weights to survey observations. I have a dataframe with survey data, which looks like this:
set.seed(123)
survey <- data.frame(
  id = 1:30,
  country = sample(letters[1:2], 30, replace = TRUE),
  age = sample(c("young", "older", "old"), 30, replace = TRUE),
  sex = sample(c("Male", "Female"), 30, replace = TRUE)
)
I also have a census dataset, which looks like this:
census <- data.frame(
  country = rep(letters[1:2], each = 6),
  age = rep(rep(c("young", "older", "old"), each = 2), 2),
  sex = rep(c("Male", "Female"), 6),
  rel_freq = c(rep(.125, 4), rep(.25, 2), rep(.167, 6))
)
Now I'd like to calculate the hypothetical frequencies (how many respondents I should have sampled in each group) by multiplying the relative frequencies from the census by the number of survey observations in each country.
I know how to start, but I'm not sure if I'm on the right track.
This will give me the number of observations in each country:
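library(dplyr)
library(tidyr)
library(purrr)

# Add the number of observations per country as a column
survey %>%
  group_by(country) %>%
  nest() %>%
  mutate(country_pop = map_dbl(data, nrow)) %>%
  unnest(data)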
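For reference, the same per-country count can probably be obtained without nesting. The following is only a sketch, assuming dplyr is installed; the name `survey_n` is made up for illustration, and the survey data is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example survey data so the snippet is self-contained
set.seed(123)
survey <- data.frame(
  id = 1:30,
  country = sample(letters[1:2], 30, replace = TRUE),
  age = sample(c("young", "older", "old"), 30, replace = TRUE),
  sex = sample(c("Male", "Female"), 30, replace = TRUE),
  stringsAsFactors = FALSE
)

# n() returns the size of the current group, so each row receives
# the number of observations in its own country
survey_n <- survey %>%
  group_by(country) %>%
  mutate(country_pop = n()) %>%
  ungroup()

head(survey_n)
```

This keeps one row per respondent and the original row order, which should make the later lookup against the census table easier.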
I realize it's not clear what I'm looking for. The final dataset should look something like this:
  id country   age  sex hyp_freq
1  1       a young Male    2.125  # 0.125*17 (0.125 from census data, 17 from number of observations of country a)
2  2       a older Male    2.125  # 0.125*17
3  3       a   old Male    4.250  # 0.25*17
...
So, I want to look up my survey respondent information (age and sex) in the other table to obtain the relative frequency for that age and sex group. Then I want to multiply that frequency by the number of observations per country and save it to my survey dataframe.
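One possible way to express the lookup-and-multiply step described above is a pair of joins. This is a sketch rather than a definitive solution: it assumes dplyr (0.8 or later, for the `name` argument of `count()`), it assumes every country/age/sex combination in the survey also appears in the census, and the names `country_n` and `result` are invented for illustration.

```r
library(dplyr)

# Example data as defined in the question
set.seed(123)
survey <- data.frame(
  id = 1:30,
  country = sample(letters[1:2], 30, replace = TRUE),
  age = sample(c("young", "older", "old"), 30, replace = TRUE),
  sex = sample(c("Male", "Female"), 30, replace = TRUE),
  stringsAsFactors = FALSE
)
census <- data.frame(
  country = rep(letters[1:2], each = 6),
  age = rep(rep(c("young", "older", "old"), each = 2), 2),
  sex = rep(c("Male", "Female"), 6),
  rel_freq = c(rep(.125, 4), rep(.25, 2), rep(.167, 6)),
  stringsAsFactors = FALSE
)

# Number of survey observations per country (one row per country)
country_n <- count(survey, country, name = "country_pop")

result <- survey %>%
  # attach the country size to every respondent
  left_join(country_n, by = "country") %>%
  # look up the census relative frequency for the respondent's group
  left_join(census, by = c("country", "age", "sex")) %>%
  # hypothetical frequency = census share * observations in the country
  mutate(hyp_freq = rel_freq * country_pop) %>%
  select(id, country, age, sex, hyp_freq)

head(result)
```

If a survey group were missing from the census, `left_join()` would leave `rel_freq` (and therefore `hyp_freq`) as `NA` for those rows, so checking `anyNA(result$hyp_freq)` afterwards is a cheap sanity test.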