County-level plot has gaps between the states

hta · May 10, 2023, 8:35pm

Hi everyone,

I am trying to plot the geographical distribution of wheat production using a county-level data set.

To create the data set, I combined a data set of county-level FIPS codes, each with a unique latitude-longitude, with a data set of county-level wheat production.

I do not know why, but the plot shows gaps between the states instead of showing them connected (figure below).
[Figure: wheat_production]

library(maps)
library(dplyr)    # provides %>% and filter()
library(ggplot2)

g43 <- wheat_corn_insurance1 %>%
  filter(year == 1989, wheat == 1) %>%
  ggplot(aes(x = longitude, y = latitude, group = state, fill = stateproduction / 1000)) +
  geom_polygon(color = NA) +
  scale_fill_gradient(low = "white", high = "red") +
  labs(title = "County-Level Choropleth Map",
       subtitle = "Wheat Production (thousand bu), 1989")

g43

I tried using the urbnmapr package as well, but I received a similar result.

Can someone help me with this?

technocrat · May 10, 2023, 9:21pm

Does every county grow wheat?

hta · May 10, 2023, 9:55pm

No, it does not. In the ggplot command above, I have grouped the data by state, and production is also at the state level.

technocrat · May 10, 2023, 11:06pm

Well, that’s what it would look like if, for example, Elko County, NV is included. A possible reason is that NA ≠ 0. For spring wheat, for example, only a handful of counties had more than 5,000 acres planted in 2022.
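
To make the NA ≠ 0 point concrete, here is a minimal sketch of how a county with no production record ends up as NA after a join. The FIPS codes are only illustrative and the bushel figures are made up; neither comes from your data.

```r
library(dplyr)

# Hypothetical inputs: three counties, only two of which report production
counties   <- tibble(fips = c("32007", "20055", "38017"))
production <- tibble(fips = c("20055", "38017"), bushels = c(1200, 3400))

# A left join keeps every county; unmatched ones get NA, not 0
joined <- left_join(counties, production, by = "fips")
joined$bushels

# Only if "no record" really means "no production": replace NA with 0
filled <- mutate(joined, bushels = coalesce(bushels, 0))
filled$bushels
```

Whether replacing NA with 0 is appropriate depends on what a missing record means in the source data.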

hta · May 11, 2023, 3:59pm

Yes, that's true. But the ggplot command groups the data at the state level and plots production at the state level. With this setup, shouldn't I still expect to see well-connected states?

technocrat · May 11, 2023, 4:51pm

Oh, I see now, after enlarging the map: there are no borders, either between counties or at the state level.

This can be fixed most simply with the {sf} package, which uses a data frame with a column that handles spatial geometry. There’s a bit of a lift to install one of the system library dependencies, but you get very fine-grained control.
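
As a rough sketch of that idea, the state outlines shipped with the {maps} package can be converted to an sf object and drawn with geom_sf(). The production values below are random placeholders, not real data; in practice you would merge your own state-level table on the state name.

```r
library(sf)
library(ggplot2)
library(maps)

# State outlines from {maps}, converted to an sf data frame (one row per state)
states_sf <- st_as_sf(maps::map("state", plot = FALSE, fill = TRUE))

# Placeholder production values (assumption); replace with real state-level data
set.seed(1)
prod <- data.frame(ID = states_sf$ID,
                   production = runif(nrow(states_sf), 0, 100))
states_sf <- merge(states_sf, prod, by = "ID")

# Each state is a full polygon, so neighbours share borders and no gaps appear
p <- ggplot(states_sf) +
  geom_sf(aes(fill = production), color = "grey40") +
  scale_fill_gradient(low = "white", high = "red")
p
```

Because the geometry column holds the actual boundaries, the map no longer depends on one latitude-longitude point per county.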

hta · May 11, 2023, 9:34pm

Thank you for your guidance. Can you point me to a link or reference for the solution you suggested? I have worked with the sf package on a project before.

technocrat · May 13, 2023, 9:49am

Here's an example. To emulate your situation, I set the estimate variable used as fill to zero. That does not knock out the states, although setting them to NA does.

library(ggplot2)
library(tidycensus)  # get_acs() needs a Census API key; see census_api_key()
library(tigris)      # provides shift_geometry()
us_median_age <- get_acs(
  geography = "state",
  variables = "B01002_001",
  year = 2019,
  survey = "acs1",
  geometry = TRUE,
  resolution = "20m"
) |>
  shift_geometry()

# baseline map of the estimates
ggplot(us_median_age, aes(fill = estimate)) +
  geom_sf() +
  scale_fill_viridis_c(alpha = 0.50, option = "cividis")

# knock out some states: set estimates outside 37.0-38.3 to zero
altered <- us_median_age
knocks <- altered$estimate > 38.3 | altered$estimate < 37.0
altered[knocks, "estimate"] <- 0

# states set to zero are still drawn
ggplot(altered, aes(fill = estimate)) +
  geom_sf() +
  scale_fill_viridis_c(alpha = 0.50, option = "cividis")
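
For the NA case, a small variation that does not need a Census API key might look like the following. It uses the {maps} state outlines instead of tidycensus, and the values are placeholders (an assumption for illustration only).

```r
library(sf)
library(ggplot2)
library(maps)

states_sf <- st_as_sf(maps::map("state", plot = FALSE, fill = TRUE))
states_sf$value <- seq_len(nrow(states_sf))   # placeholder values, not real data

# Zero is an ordinary value and stays on the colour scale
as_zero <- states_sf
as_zero$value[as_zero$ID == "nevada"] <- 0

# NA is off the scale and is drawn with the scale's na.value instead
as_na <- states_sf
as_na$value[as_na$ID == "nevada"] <- NA

p_na <- ggplot(as_na, aes(fill = value)) +
  geom_sf() +
  scale_fill_viridis_c(option = "cividis", na.value = "grey90")
p_na
```

Setting na.value explicitly makes it easier to see which areas are missing rather than low.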

hta · May 14, 2023, 11:53pm

Thank you so much for your help.
