I have a sf
object with 2 columns, named fclass and geometry. I am interested in the fclass column. The fclass has several unique values. I convert this sf
object to spatraster
, I resample it and I perform a linear regression where I use another spatraster
, called ntl, as my response (lm(ntl ~ road)
).
My goal is to find which unique values give the largest r-squared in the lm model using stepwise linear regression whith backward elimination. My initial try was:
library(pacman)
pacman::p_load(terra, sf, dplyr)
ntl <- rast("path/ntl.tif") # response
v <- st_read("path/road17.shp") # sf object
# baseline r2
vterra <- vect(v)
ref <- rast("path/pop.tif") # get ext and pixel size
r <- rast(v, res = res(ref), ext = ext(ref))
x <- rasterizeGeom(vterra, r, "length", "m")
x_res <- resample(x, ntl, "average")
s <- c(ntl, x_res)
names(s) <- c("ntl", "road")
m <- lm(ntl ~ road, s, na.action = na.exclude)
baseline <- sqrt(summary(m)$adj.r.squared)
orig_baseline <- sqrt(summary(m)$adj.r.squared)
classes <- unique(v$fclass)
inclasses <- unique(v$fclass)
i <- 1
while (i <= length(classes)) {
class <- classes[i]
print(paste0("current class ", class))
print(paste0("orig baseline: ", orig_baseline, " - baseline: ", baseline))
print(classes)
v_filtered <- v[v$fclass != class, ]
vterra <- vect(v_filtered)
r <- rast(v, res = res(ref), ext = ext(ref))
x <- rasterizeGeom(vterra, r, "length", "m")
x_res <- resample(x, ntl, "average")
s <- c(ntl, x_res)
names(s) <- c("ntl", "road")
m <- lm(ntl ~ road, s, na.action = na.exclude)
class_r2 <- sqrt(summary(m)$adj.r.squared)
if (class_r2 > baseline) {
classes <- classes[-i]
baseline <- class_r2
} else {
print("class_r2 is less than baseline")
print(paste0("class_r2: ", class_r2, " - baseline: ", baseline))
i <- i + 1
}
}
but it's not efficient (it takes ~ 5 minutes).
I have found the MASS
package which has the lmStepAIC
function but I am not sure how to select the unique values from the fclass column to use as predictiors.
head(v, 6)
Simple feature collection with 6 features and 1 field
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 598675.9 ymin: 7111459 xmax: 609432.8 ymax: 7118729
Projected CRS: WGS 84 / UTM zone 35S
fclass geometry
1 secondary MULTILINESTRING ((598675.9 ...
2 secondary MULTILINESTRING ((600641.7 ...
3 residential MULTILINESTRING ((601734.8 ...
4 residential MULTILINESTRING ((601163.9 ...
5 residential MULTILINESTRING ((601104.2 ...
6 motorway_link MULTILINESTRING ((609432.8 ...
The unique values in the fclass column:
unique(v$fclass)
[1] "secondary" "residential" "motorway_link" "service" "primary" "unclassified" "motorway"
[8] "tertiary" "trunk" "primary_link" "footway" "track" "secondary_link" "unknown"
[15] "living_street" "pedestrian" "path" "bridleway" "steps" "trunk_link" "track_grade1"
[22] "track_grade3" "track_grade5" "cycleway" "track_grade4" "tertiary_lin
If necessary I can share the data from my GoogleDrive .