Hello, I am looking for resources or advice on building a geospatial predictive model with these characteristics:
Observations are aggregated measures at the census tract level
The outcome is a count variable
If I am using the mlr3 package as demonstrated in Geocomputation with R, what are some statistical learning methods well suited to a count outcome?
Also, I don't have much experience with either ecosystem, but I know tidymodels better than mlr3. Can you do similar geospatial modelling in tidymodels?
To my knowledge, neither {mlr3} nor {tidymodels} is particularly strong (or weak) with spatial data. If I were in your place, I would stick with the approach you are more familiar with.
In my work I avoid both and stick to the "raw" calculations, which for count data means either a linear model via lm() (always a good baseline, and hard to beat for explainability) or a Poisson regression via glm(). But that is a personal preference, not a general recommendation.
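To make the two baselines concrete, here is a minimal sketch in base R. The data are simulated, and the names `tracts`, `x` and `y` are placeholders of mine rather than anything from your data.

```r
# Simulated stand-in for tract-level data (all names are illustrative).
set.seed(42)
n <- 500
tracts <- data.frame(x = rnorm(n))
tracts$y <- rpois(n, lambda = exp(0.5 + 0.3 * tracts$x))  # true slope = 0.3

# Baseline 1: ordinary linear model on the raw counts.
fit_lm <- lm(y ~ x, data = tracts)

# Baseline 2: Poisson regression with the default log link.
fit_pois <- glm(y ~ x, family = poisson(link = "log"), data = tracts)

# The Poisson slope is on the log scale and should land near 0.3.
print(coef(fit_pois))

# Predictions on the response scale are expected counts, so they are
# always positive; the linear model gives no such guarantee.
pred_lm   <- predict(fit_lm)
pred_pois <- predict(fit_pois, type = "response")
print(range(pred_lm))
print(range(pred_pois))
```

Comparing the two ranges is a quick way to see whether the linear baseline is predicting impossible (negative) counts for your data.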
The first step should be to determine whether your measures are spatially autocorrelated. Moran's I is the usual starting point; it lives in {spdep} as moran.test(). The outcome to hope for is that your variables are not spatially correlated; it is much, much less trouble that way.
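As a rough illustration of what the statistic measures, the sketch below computes Moran's I by hand on a toy 5 x 5 grid with rook adjacency. The grid, the variable `x` and the helper `morans_i()` are made up for this example. If {spdep} is installed, the last lines cross-check against its moran.test(), which also gives you a p-value.

```r
# Toy 5 x 5 grid standing in for census tracts (illustrative only).
set.seed(1)
grid <- expand.grid(col = 1:5, row = 1:5)

# Rook adjacency: two cells are neighbours when they share an edge,
# i.e. their Manhattan distance is exactly 1.
d <- as.matrix(dist(grid, method = "manhattan"))
W <- (d == 1) * 1
W <- W / rowSums(W)             # row-standardise so each row sums to 1

# A variable with an obvious spatial trend plus some noise.
x <- grid$col + grid$row + rnorm(nrow(grid), sd = 0.5)

# Moran's I: weighted cross-products of deviations, scaled by their variance.
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

i_obs  <- morans_i(x, W)
i_null <- -1 / (length(x) - 1)  # expected value under no autocorrelation
print(c(observed = i_obs, expected = i_null))

# Optional cross-check with the packaged test, if {spdep} is available.
if (requireNamespace("spdep", quietly = TRUE)) {
  lw <- spdep::mat2listw(W, style = "W")
  print(spdep::moran.test(x, lw))
}
```

With real tracts you would build the weights from the tract polygons rather than a grid; the grid is only there to keep the example self-contained.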
If you find a spatial correlation that cannot be disregarded, you have two options: build a spatial regression (one that takes into account the adjacency of the census tracts), or look hard at the plot of local Moran's I and think about a possible new variable that could explain the spatially correlated errors away.
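The second option can be illustrated with a small simulation. In this sketch (base R only, with all names and the 10 x 10 grid invented for the example) the counts depend on a smooth gradient `z`. Leaving `z` out of the model leaves spatially correlated residuals; adding it should bring the residual Moran's I back down towards zero.

```r
set.seed(7)
grid <- expand.grid(col = 1:10, row = 1:10)
n <- nrow(grid)

# Row-standardised rook adjacency on the grid.
d <- as.matrix(dist(grid, method = "manhattan"))
W <- (d == 1) * 1
W <- W / rowSums(W)

morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# x is spatially random; z is a smooth gradient across the grid rows.
dat <- data.frame(x = rnorm(n), z = grid$row)
dat$y <- rpois(n, lambda = exp(0.5 + 0.3 * dat$x + 0.2 * dat$z))

# Model that omits z versus model that includes it.
fit_missing <- glm(y ~ x,     family = poisson, data = dat)
fit_full    <- glm(y ~ x + z, family = poisson, data = dat)

# Moran's I of the Pearson residuals from each model.
i_missing <- morans_i(residuals(fit_missing, type = "pearson"), W)
i_full    <- morans_i(residuals(fit_full,    type = "pearson"), W)
print(c(without_z = i_missing, with_z = i_full))
```

In practice the hard part is not the code but working out what the real-world counterpart of `z` might be.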
Both approaches have problems: a spatial regression limits your predictions to the original study area, and a suitable new variable may not exist.
Great, thanks for all the advice! If right now I just have the FIPS codes for the census tracts, what's the best way to encode the spatial information? Should I use the centroid of each census tract feature?