Hi @gkim65!
First, please see the Homework Policy; I'm guessing from milestone-4.Rmd that this might be homework.
Second, a reprex with data is really, really helpful. It's only by good luck that I found DEC_10_SF1_PCT7_with_ann.csv. Also, library(janitor) is needed to access clean_names, not to mention library(dplyr) to give you %>%.
OK, enough preaching.
Your problem will be more tractable if you focus on the three variables needed to calculate population density by county:
- identifier for county
- its area (or total population if you mean percentage of population classified as Korean)
- the Korean population
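To make that concrete, here is a minimal sketch of the calculation once you have those three pieces. The data frame `counties`, its column names, and every number in it are made up for illustration; they are not taken from your file.

```r
library(dplyr)

# Toy data: county names, areas, and counts are invented for illustration only.
counties <- tibble(
  county     = c("County A", "County B"),
  area_sq_mi = c(100, 250),
  total_pop  = c(50000, 20000),
  korean_pop = c(500, 100)
)

density <- counties %>%
  mutate(
    # Density in the geographic sense: people per unit of area.
    korean_per_sq_mi = korean_pop / area_sq_mi,
    # Alternative reading: share of the county's population, in percent.
    korean_pct = 100 * korean_pop / total_pop
  )

print(density)
```

Which of the two new columns you want depends on what you mean by "density"; the second one does not need the area at all.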
More than 99% of your population_korea data frame isn't needed for that. It has data for all population categories and a huge range of demographic characteristics other than population, and the long/lat columns aren't needed unless you plan to do mapping.
population_korea$GEO.display.label contains county and state names. There's your identifier.
One of the HDxx-Sxxx columns contains total population by county/state, and another contains the Korean population by county/state.
To figure out which, you'll need to do some digging. See the acs and tidycensus packages for resources to track those down.
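Once you have those, you can drop everything else. HDxx-Sxxx and HDyy-Syy are placeholders for the two columns you identify; if the real names contain a hyphen, they need backticks, because R would otherwise read the hyphen as subtraction:
my_reduced_df <- population_korea %>% select(GEO.display.label, `HDxx-Sxxx`, `HDyy-Syy`)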
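Here is a small runnable sketch of that selection on toy data. The data frame `toy` and its values are invented, and the two HD column names are just the placeholders from above used literally, not real Census column names. I also rename the columns on the way through, which is my own suggestion so that later code doesn't need backticks.

```r
library(dplyr)

# Toy stand-in for population_korea; values and the HD names are placeholders.
toy <- tibble(
  GEO.display.label = c("County A, State X", "County B, State Y"),
  `HDxx-Sxxx`       = c(50000, 20000),  # pretend: total population
  `HDyy-Syy`        = c(500, 100),      # pretend: Korean population
  unrelated         = c(1, 2)           # one of the many columns you can drop
)

# select() can rename while it selects: new_name = old_name.
reduced <- toy %>%
  select(
    county     = GEO.display.label,
    total_pop  = `HDxx-Sxxx`,
    korean_pop = `HDyy-Syy`
  )

print(reduced)
```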
If you do need mapping, the simplest way is an sf data frame with FIPS codes for county/states. You'll need to go back and line up your county/states in population_korea and then do an inner_join.
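As a rough sketch of the "line up, then join" step, the example below uses two plain tibbles. `shapes` only stands in for an sf data frame (no geometry column here, so it runs without the sf package), and the county names and FIPS codes are made up. The clean-up I show, trimming whitespace and lower-casing, is just one assumption about how the names might differ; your real mismatch may need something else.

```r
library(dplyr)

# Toy population table; note the trailing space and the unmatched third row.
pop <- tibble(
  county     = c("County A, State X", "County B, State Y ", "County C, State Z"),
  korean_pop = c(500, 100, 50)
)

# Toy stand-in for the sf data frame; FIPS codes are invented.
shapes <- tibble(
  county = c("county a, state x", "county b, state y"),
  fips   = c("00001", "00002")
)

joined <- pop %>%
  # Normalise the key so both tables spell each county the same way.
  mutate(county = tolower(trimws(county))) %>%
  # inner_join keeps only rows whose key appears in both tables.
  inner_join(shapes, by = "county")

print(joined)
```

Rows that drop out of the inner_join, like "County C" here, are the ones whose names still don't match, so comparing row counts before and after is a quick check on the line-up step.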