It's time to learn sf, the successor to the sp package. Its huge advantage is that in many ways it works like a data frame, with the facilities that come with the tidyverse. It adds a "geometry" column holding spatial objects such as points, lines, polygons, and multipolygons. This lets you keep a tibble of observation identifiers, observations, the associated geometry, and their centroids, and, with some sweaty wrangling, convert the centroid POINTs into a matrix of long, lat coordinates and bring them back into your main dataset.
I just spent three days doing this for thematic mapping of the US. The outline maps and the fills based on column values were easy. Labelling the states is what has taken most of the time. In my case, there are five states (MI, PI, MD, FL, and LA) that are asymmetrical enough that the label has to be adjusted manually, and I'm stuck on doing this on a filtered subset and getting it back into the main tibble. I'll probably end up with two rounds of label plotting once I figure out how to make the oddballs invisible in the main tibble; then I can overplot to add the others. Fortunately, I know how to switch label colors from black to white when the fill is dark.
My ambition is to post this at https://technocrat.rbind.io next week, with a link to the Rmd code on GitHub, to provide a lamp in the darkness. My hope is that others dealing with geobased data will be able to follow along and help me identify the unnecessary kludges.
If you want to plunge in:
library(sf)
library(tidyverse)
library(tidycensus) # provides get_estimates()
# if you have a Census API key in your .Renviron
states_pop <- get_estimates("state", product = "population", geometry = TRUE, shift_geo = TRUE) %>%
  filter(variable == "POP", GEOID != "11") %>% # drop DC (FIPS code 11)
  mutate(geoid = as.integer(GEOID)) %>%
  select(GEOID, NAME, geoid, POP_TOT = value)
# This produces population estimates for the 50 states (DC is filtered out) along with their geometries
# Or you can read in any shapefile (dsn and layer stand for your own data source and layer name)
your_object <- st_read(dsn, layer)
converter <- read.csv("https://gist.githubusercontent.com/technocrat/93470bf9abead06ef926/raw/f652f8171374e7808455f42167f5480ea15f7f4e/state_fips_postal.csv", header = FALSE, stringsAsFactors = FALSE)
converter <- rename(converter, NAME = V1, geoid = V2, id = V3)
states_key <- as_tibble(converter) %>% filter(id != "DC")
# st_centroid() returns point geometries; st_coordinates() turns them into a matrix with X and Y columns
states_centroid_matrix <- st_coordinates(st_centroid(st_geometry(states_pop)))
states_centroid <- as_tibble(states_centroid_matrix) %>% transmute(geoid = states_pop$geoid, clong = X, clat = Y)
states_with_centroids <- inner_join(states_pop, states_centroid, by = "geoid")
states <- inner_join(states_with_centroids, converter, by = "geoid")
states <- states %>% mutate(NAME = NAME.x, id_clong = clong, id_clat = clat) %>% select(GEOID, NAME, id, POP_TOT, id_clong, id_clat, clong, clat)
# no_xlab, no_ylab and plain_theme are my own theme helpers, defined elsewhere
basemap <- ggplot(states) + geom_sf(color = "dark grey", size = 0.3, fill = "white") + no_ylab + no_xlab + plain_theme
basemap + geom_text(data = states, aes(x = id_clong, y = id_clat, label = id), size = 2.5)
This has gotten me to the point where I can plot, separately, name overlays for the 45 states that require no adjustment and the 5 that do.
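One possible way around the filter-and-rejoin problem, offered as a sketch rather than a tested fix for the census data: keep every label in a single tibble and left-join a small table of manual offsets, so states without an offset keep their centroid. The `labels` and `nudges` tables and all the numbers in them are made up for illustration; with `shift_geo = TRUE` the real coordinates are projected, so the offsets would be in map units rather than degrees.

```r
library(dplyr)
library(tibble)

# Toy label table standing in for the centroid columns (values are invented)
labels <- tibble(
  id    = c("OH", "MI", "FL"),
  clong = c(-82.8, -85.4, -82.5),
  clat  = c(40.3, 44.3, 28.6)
)

# Hypothetical manual offsets, one row per state that needs adjusting
nudges <- tribble(
  ~id,  ~dlong, ~dlat,
  "MI",  1.0,   -1.0,
  "FL",  0.8,    0.0
)

# States with no row in nudges get NA offsets, which coalesce() turns into 0
labels_adj <- labels %>%
  left_join(nudges, by = "id") %>%
  mutate(
    id_clong = clong + coalesce(dlong, 0),
    id_clat  = clat + coalesce(dlat, 0)
  ) %>%
  select(-dlong, -dlat)
```

Because the adjusted positions live in `id_clong` and `id_clat` for every row, a single `geom_text()` call could then label all the states in one pass.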
If all you care about is the location of the centroids, your troubles are over once st_coordinates has turned the centroid points into a matrix; converting that matrix to a tibble isn't hard.
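Here is a minimal sketch of the centroid step on its own. It assumes the North Carolina shapefile that ships with sf instead of the census download, so it runs without an API key; the column names `clong` and `clat` just follow the naming used in this post.

```r
library(sf)
library(dplyr)

# Demo shapefile bundled with the sf package (100 North Carolina counties)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Centroids come back as point geometries
centroids <- st_centroid(st_geometry(nc))

# st_coordinates() converts the points into a matrix with columns X and Y
xy <- st_coordinates(centroids)

# Rows are in the same order as nc, so the columns can be attached directly
nc_labeled <- nc %>%
  mutate(clong = xy[, "X"], clat = xy[, "Y"])
```

Since the matrix rows stay in the same order as the source object, no join is needed to get the coordinates back next to the geometry.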
Sorry for the long ramble. This is actually much, much easier to do than with the sp package and fortify, and it will pay big dividends in your future work if you have a couple of weekends to spare getting your head around sf and geom_sf.