How to merge two datasets together when one set has multiple instances of something I want to merge

Hi @gkim65!

First, please see the Homework Policy; I'm guessing from milestone-4.Rmd that this might be coursework.

Second, a reprex with data is really, really helpful. It's only by good luck that I found DEC_10_SF1_PCT7_with_ann.csv. Also, library(janitor) is needed to access clean_names, not to mention dplyr to give you %>%.

OK, enough preaching.

Your problem will be more tractable if you focus on the three variables needed to calculate population density by county:

  • identifier for county
  • its area (or total population if you mean percentage of population classified as Korean)
  • the Korean population

More than 99% of your population_korea data frame isn't needed for that. It has data for all population categories and a huge range of demographic characteristics other than population, and the long/lat isn't needed unless you plan to do mapping.

population_korea$GEO.display.label contains county and state names. There's your identifier.

One of the HDxx-Sxxx columns contains total population by county/state and one contains the Korean population by county/state.

To figure out which, you'll need to do some digging. See the acs and tidycensus packages for resources to track those down.

Once you have those,

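my_reduced_df <- population_korea %>% select(GEO.display.label, `HDxx-Sxxx`, `HDyy-Syy`)

(The backticks are needed only if the real column names contain a hyphen; otherwise R reads the hyphen as subtraction.)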
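To make that step concrete, here is a minimal sketch on a toy data frame. The values are made up, and HD01_S001 and HD01_S002 are hypothetical stand-ins for whichever columns turn out to hold the total and Korean populations; I haven't checked them against the actual file.

```r
library(dplyr)

# Toy stand-in for population_korea. All values are invented, and the
# HD01_S001 / HD01_S002 column names are hypothetical placeholders.
population_korea <- tibble(
  GEO.display.label = c("County A, State X", "County B, State X"),
  HD01_S001 = c(1000, 5000),  # assumed: total population
  HD01_S002 = c(10, 250),     # assumed: Korean population
  unused_col = c("a", "b")    # represents the many columns you don't need
)

# Keep only the identifier and the two counts, renaming as we select,
# then compute the Korean share of each county's population.
my_reduced_df <- population_korea %>%
  select(GEO.display.label, total_pop = HD01_S001, korean_pop = HD01_S002) %>%
  mutate(pct_korean = 100 * korean_pop / total_pop)

print(my_reduced_df)
```

If you want density per unit area rather than a percentage, you would swap total_pop for an area column in the same mutate call.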

If you do need mapping, the simplest way is an sf data frame with FIPS codes for county/states. You'll need to go back and line up the county/state identifiers in population_korea with the ones in that data frame, and then do an inner_join.
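A rough sketch of that join, using a plain tibble in place of a real sf data frame so that it runs without spatial dependencies. The FIPS codes, labels, and column names here are all invented for illustration.

```r
library(dplyr)

# Stand-in for an sf data frame of county shapes (geometry column omitted).
# The fips values and labels are made up.
county_shapes <- tibble(
  fips = c("99001", "99003"),
  county_label = c("County A, State X", "County B, State X")
)

# Stand-in for the reduced population data; "County C" has no matching shape.
county_pop <- tibble(
  GEO.display.label = c("County A, State X", "County B, State X",
                        "County C, State Y"),
  pct_korean = c(1, 5, 2)
)

# inner_join keeps only rows whose key appears in both tables.
joined <- inner_join(
  county_shapes, county_pop,
  by = c("county_label" = "GEO.display.label")
)

print(joined)
```

Because inner_join silently drops unmatched rows, it is worth running anti_join(county_pop, county_shapes, by = c("GEO.display.label" = "county_label")) first to see which counties fail to line up.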
