How to merge two datasets together when one set has multiple instances of something I want to merge

technocrat · October 31, 2019, 3:26am

First, see the Homework Policy; I'm guessing from milestone-4.Rmd that this might be homework.

Second, a reprex with data is really, really helpful. It's only by good luck that I found DEC_10_SF1_PCT7_with_ann.csv. Also, library(janitor) is needed to access clean_names(), not to mention dplyr to give you %>%.

OK, enough preaching.

Your problem will be more tractable if you focus on the three variables needed to calculate population density by county:

identifier for county
its area (or total population if you mean percentage of population classified as Korean)
the Korean population

More than 99% of your population_korea data frame isn't needed for that. It has data for all population categories and a huge range of demographic characteristics other than population, and the long/lat columns aren't needed unless you plan to do mapping.

population_korea$GEO.display.label contains county and state names. There's your identifier.
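If it helps to have county and state in separate columns, something like the following might work. It's only a sketch on made-up rows: the toy data frame, the new column names, and the assumption that labels look like "County name, State name" are mine, so check them against the real file.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for population_korea; labels are ASSUMED to look like "County, State"
toy <- tibble(
  GEO.display.label = c("Autauga County, Alabama", "Baldwin County, Alabama")
)

# Split on comma-plus-space; remove = FALSE keeps the original label as well
toy_split <- toy %>%
  separate(GEO.display.label, into = c("county", "state"), sep = ", ", remove = FALSE)
```

Keeping the original label around makes it easier to spot rows that didn't split the way you expected.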

One of the HDxx-Sxxx columns contains total population by county/state and another contains the Korean population by county/state.

To figure out which, you'll need to do some digging. See the acs and tidycensus packages for resources to track those down.

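Once you have those, keep just the three columns. If the column names really do contain hyphens, they need backticks, otherwise R reads the hyphen as a minus sign:

my_reduced_df <- population_korea %>% select(GEO.display.label, `HDxx-Sxxx`, `HDyy-Syy`)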
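From there, the percentage calculation is one mutate() away. A minimal sketch, assuming you've renamed the two HD columns to something readable; total_pop and korean_pop are hypothetical names, and the counts below are made up:

```r
library(dplyr)

# Toy stand-in for my_reduced_df; total_pop and korean_pop are HYPOTHETICAL
# names for the two HD columns, and the numbers are invented
toy_reduced <- tibble(
  GEO.display.label = c("County A, State X", "County B, State X"),
  total_pop = c(1000, 4000),
  korean_pop = c(10, 100)
)

# Percentage of each county's population classified as Korean
toy_pct <- toy_reduced %>%
  mutate(pct_korean = 100 * korean_pop / total_pop)
```

If you really do mean density per unit area, swap total_pop for an area column; the census table itself may not carry one.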

If you do need mapping, the simplest way is an sf data frame with FIPS codes for county/states. You'll need to go back and line up your county/states in population_korea and then do an inner_join.
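The join itself might look roughly like this. It's a sketch with plain tibbles standing in for both tables; the fips column name and all the values are assumptions for illustration, and in practice the first table would be your sf data frame:

```r
library(dplyr)

# Toy county table keyed by a HYPOTHETICAL fips column (illustrative values)
toy_counties <- tibble(
  fips = c("01001", "01003", "01005"),
  county_name = c("County A", "County B", "County C")
)

# Toy population table with the same hypothetical key (made-up counts)
toy_pop <- tibble(
  fips = c("01001", "01003"),
  korean_pop = c(10, 100)
)

# inner_join keeps only the rows whose fips appears in both tables
toy_joined <- inner_join(toy_counties, toy_pop, by = "fips")
```

Note that if one table has several rows for the same county, every one of them gets matched, so the result will have more than one row for that county. Running count(fips) on each table before joining is a quick way to check for that.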