I've used R for geospatial data and for time series data separately, and I wanted to know whether there is a standard way of dealing with temporal geospatial data when the shapefiles/geometries are constant over time.
My current understanding is that, generally, if one transforms the data into a tidy format, one can really leverage the power of the tidyverse (ggplot2, tidymodels, etc.). However, transforming temporal geospatial data into a tidy format is very expensive memory-wise, because every time index gets its own copy of the geometry even when the geometry has not changed between time indices.
Is there a way to leverage a constant geometry so that the memory cost is closer to additive (geometry + observations) rather than multiplicative (geometry × time indices), while still using the tidyverse?
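For illustration, this is the shape of workflow I'd like to end up with, using df_raw and counties from the reproducible example below and attaching the (constant) geometry only to the slice that is actually plotted. The DSCI/date column names come from the drought data, and st_as_sf() assumes the geometry column survives the joins as an sfc list column, so treat this as a sketch rather than working code:

library(dplyr)
library(ggplot2)
library(sf)

# Work on the time series with tidyverse verbs, then join the geometry
# only onto the subset that is actually needed (a single week here).
one_week <- df_raw |>
  filter(date == max(date)) |>
  left_join(counties, by = "code") |>
  st_as_sf()

ggplot(one_week) +
  geom_sf(aes(fill = DSCI), colour = NA)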
More concretely, with last week's TidyTuesday data: processing the data in a tidy way creates a very large object (dt_merged, 135 Gb) from left joins of a somewhat large time series (df_raw, 115 Mb) and geospatial component (counties, 118 Mb). Is there an already built/established way of processing the data that leverages the fact that the shapefiles don't change (or change very little)?
library(tidytuesdayR)
library(tigris)
library(data.table)
library(purrr)

# County geometries for the year 2000, keyed by the 5-digit FIPS code
counties <- as.data.table(counties(year = 2000))[, .(
  code = paste0(STATEFP, COUNTYFP),
  geometry, INTPTLAT00, INTPTLON00
)]
setkey(counties, code)

# State/county FIPS lookup table, keyed the same way
fips <- as.data.table(fips_codes)[, code := paste0(state_code, county_code)]
setkey(fips, code)

# County-level drought time series from TidyTuesday 2022-06-14
df_raw <- as.data.table(tt_load('2022-06-14')$`drought-fips`)
setnames(df_raw, 'FIPS', 'code')
setkey(df_raw, code)

# Left-join the time series onto the FIPS lookup and the geometries
merge_order <- list(df_raw, fips, counties)
dt_merged <- reduce(merge_order, ~merge(.x, .y, all.x = TRUE, by = 'code'))
setkey(dt_merged, code)
format(object.size(counties), units = "auto") # "118.7 Mb"
format(object.size(fips), units = "auto") # "492.7 Kb"
format(object.size(df_raw), units = "auto") # "115.3 Mb"
format(object.size(dt_merged), units = "auto") # "135.1 Gb"
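As a rough sanity check on where the 135 Gb comes from (back-of-the-envelope only; it just spreads the total geometry size evenly across counties):

# Each merged row carries its county's geometry, so the reported size is
# roughly (average geometry size per county) x (number of rows in df_raw).
bytes_per_county <- as.numeric(object.size(counties$geometry)) / nrow(counties)
bytes_per_county * nrow(df_raw) / 1024^3  # on the order of the 135 Gb above

(Note that object.size counts the geometry once per row even if R happens to share memory internally, so the reported size may overstate actual RAM use, but it still reflects the multiplicative structure I'd like to avoid.)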