I am new to R, so apologies if this question is naive. I want to match my data in two steps. First, I match on age group, gender and highest degree (using left_join from dplyr). Then, because some rows have no corresponding value in the right data frame, I want to match the rows where p_id is NA again, this time on age group and gender only. My code is below, but it does not do what I describe. Why am I getting the warning "the condition has length > 1 and only the first element will be used"? And can you offer some hints on how to achieve what I want? Thanks in advance.
matching_function <- function(demobel_df, monitor_data) {
  matching_df <- monitor_data %>%
    select(p_id, AgeGroupMethodology, Gender, AgeExact, HighestDegreeCat, WegingPop)
  demobel_matched <- left_join(demobel_df, matching_df,
                               by = c("ageGroup" = "AgeGroupMethodology", "genderN" = "Gender", "degree" = "HighestDegreeCat"))
  if (is.na(demobel_matched$p_id)) {
    demobel_matched <- left_join(demobel_matched, matching_df,
                                 by = c("ageGroup" = "AgeGroupMethodology", "genderN" = "Gender"))
  } else {
    demobel_matched <- demobel_matched
  }
  demobel_matched$ageDiff <- abs(demobel_matched$age - demobel_matched$AgeExact)
  # Order first by id, then weight, then ageDiff
  demobel_matched <- demobel_matched[order(demobel_matched[, 'personID'], demobel_matched[, 'WegingPop'], demobel_matched[, 'ageDiff']), ]
  return(demobel_matched)
}
demobel_matched <- matching_function(demobel_adults, adult_individuals)
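As far as I understand it, the warning comes from if(), which expects a single TRUE or FALSE, whereas is.na() on a whole column returns one value per row. Here is a minimal sketch of the difference; the p_id vector is made up for illustration and is not the real data:

```r
# Hypothetical p_id column with one unmatched row (illustrative values only)
p_id <- c(101, NA, 103)

# is.na() is vectorised: it returns one logical value per element
na_flags <- is.na(p_id)
print(na_flags)

# if() needs a single TRUE/FALSE, so collapse the vector first, e.g. with any()
if (any(na_flags)) {
  message("some rows are unmatched")
}

# To work on the unmatched rows only, use the flags to pick out positions
unmatched_positions <- which(na_flags)
print(unmatched_positions)
```

So a single if() cannot decide row by row; the unmatched rows have to be selected (filtered) and handled separately.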
However, p_id is only created by the left_join, so your code does not work in my case. Thanks to your hints, though, I found a solution, and I am sharing my code here:
# Match the demoBel data to the MONITOR data for predicting individual activity chains
# Select the variables needed for matching from the MONITOR data
matching_df <- adult_individuals %>% select(p_id, AgeGroupMethodology, Gender, AgeExact, HighestDegreeCat, WegingPop)
#Do the matching based on age, gender and the highest degree
demobel_matched <- left_join(demobel_adults, matching_df, by = c("ageGroup" = "AgeGroupMethodology", "genderN" = "Gender", "degree" = "HighestDegreeCat"))
# Most rows are matched now, but some "age + gender + highest degree" combinations are not available in the MONITOR data
# So first keep the individuals whose p_id is NA, then drop the p_id, AgeExact and WegingPop columns added by the previous left_join
demobel_unmatched <- demobel_matched %>%
  filter(is.na(p_id)) %>%
  select(-c(p_id, AgeExact, WegingPop))
#Do the same matching based on the age and gender combination
demobel_unmatched <- left_join(demobel_unmatched, matching_df, by = c("ageGroup" = "AgeGroupMethodology", "genderN" = "Gender"))
# Drop the extra HighestDegreeCat column, which is added because it is no longer one of the join keys
demobel_unmatched <- demobel_unmatched %>%
  select(-HighestDegreeCat)
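# Bind the two tables into one, keeping only the rows matched in the first step
# (otherwise the unmatched rows would appear twice: once with NA and once re-matched)
demobel_matched <- rbind(filter(demobel_matched, !is.na(p_id)), demobel_unmatched)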
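For anyone who wants to try the idea without my data, here is a small self-contained sketch of the same two-step matching. The tables people and donors and all their values are made up for illustration; only the column names follow my real data. It assumes dplyr is installed.

```r
library(dplyr)

# Toy stand-ins for the real tables (all values are made up)
people <- data.frame(
  personID = 1:3,
  ageGroup = c("18-34", "18-34", "35-64"),
  genderN  = c(1, 2, 1),
  degree   = c("high", "low", "high")
)
donors <- data.frame(
  p_id                = c(11, 12),
  AgeGroupMethodology = c("18-34", "35-64"),
  Gender              = c(1, 1),
  HighestDegreeCat    = c("high", "low")
)

# Step 1: strict match on age group, gender and highest degree
strict <- left_join(people, donors,
                    by = c("ageGroup" = "AgeGroupMethodology",
                           "genderN" = "Gender",
                           "degree" = "HighestDegreeCat"))

# Step 2: rows without a donor are re-matched on age group and gender only.
# HighestDegreeCat is dropped from the right table first, so no extra column appears.
relaxed <- strict %>%
  filter(is.na(p_id)) %>%
  select(-p_id) %>%
  left_join(select(donors, -HighestDegreeCat),
            by = c("ageGroup" = "AgeGroupMethodology", "genderN" = "Gender"))

# Keep the strict matches and append the re-matched rows
result <- bind_rows(filter(strict, !is.na(p_id)), relaxed) %>%
  arrange(personID)

print(result)
```

Rows that find no donor in the second step either simply keep p_id as NA, so nobody is dropped from the result.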