I am trying to implement random forest (RF) regression using the ranger package in R, but when I run the ranger function I get this error: Error: Missing data in columns: pop (pop is my independent variable).
For reference, when using the randomForest package, I can pass the na.action = na.omit argument to exclude the NA values, but in ranger I can't do this.
I have tried to do something like this (among other things):
m <- ranger(ntl ~ .,
            data = as.data.frame(na.omit(s)),
            mtry = 1,
            importance = "impurity")
but without success.
How can I exclude the NA values when I run the ranger function?
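To make clear what I am after, here is a minimal sketch of the behaviour on a plain data frame. The data are made up for illustration (they are not my rasters). The rows containing NA are dropped with complete.cases before fitting, and the predictions are written back into a full-length vector so that the NA positions are preserved.

```r
library(ranger)

# Toy data standing in for as.data.frame(s); the values are made up
set.seed(1)
d <- data.frame(ntl = runif(100), pop = runif(100))
d$pop[c(3, 17, 42)] <- NA          # mimic NA cells in the pop raster

# Keep only the rows that have no NA in any column
ok   <- complete.cases(d)
d_ok <- d[ok, ]

m <- ranger(ntl ~ ., data = d_ok, mtry = 1, importance = "impurity")

# Predict for the complete rows only and keep NA everywhere else
p <- rep(NA_real_, nrow(d))
p[ok] <- predict(m, data = d_ok)$predictions

# Residuals, with NA where pop was missing
rsds <- d$ntl - p
```

This works for a data frame, but I would like the equivalent for the raster workflow below.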
Here is my full code:
library(terra)
library(ranger)

wd = "path/"

ntl = rast(paste0(wd, "ntl2.tif"))

# population rasters named pop<digits>.tif
rlist = list.files(path = wd,
                   pattern = "^pop\\d+\\.tif$",
                   all.files = TRUE,
                   full.names = FALSE)

for (i in rlist){
  for (j in i) {
    # digits from the file name, reused in the output file name
    nameNum = gsub("\\D+","",j)
    print(nameNum)
    print(j)

    pop = rast(paste0(wd, j))

    s = c(ntl, pop)
    names(s) = c("ntl", "pop")

    # this call stops with: Error: Missing data in columns: pop
    m <- ranger(ntl ~ .,
                data = as.data.frame(s),
                mtry = 1,
                importance = "impurity")

    p <- predict(s, m)

    # residuals: observed ntl minus RF prediction
    rsds <- s$ntl - p

    writeRaster(rsds,
                filename = paste("path/rf_resids",
                                 nameNum,
                                 ".tif",
                                 sep=""),overwrite = TRUE)
  }
}
The reason I want to use the ranger package over randomForest is that it is faster to execute.
Two sample rasters:
ntl = rast(ncols=109, nrows=80, nlyrs=1, xmin=-31400, xmax=12200, ymin=6012900, ymax=6044900, names=c('ntl'), crs='PROJCRS[\"World_Mollweide\",BASEGEOGCRS[\"WGS 84\",DATUM[\"World Geodetic System 1984\",ELLIPSOID[\"WGS 84\",6378137,298.257223563,LENGTHUNIT[\"metre\",1]],ID[\"EPSG\",6326]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"Degree\",0.0174532925199433]]],CONVERSION[\"unnamed\",METHOD[\"Mollweide\"],PARAMETER[\"Longitude of natural origin\",0,ANGLEUNIT[\"Degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"False easting\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8806]],PARAMETER[\"False northing\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8807]]],CS[Cartesian,2],AXIS[\"(E)\",east,ORDER[1],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],AXIS[\"(N)\",north,ORDER[2],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]]')
pop010 = rast(ncols=109, nrows=80, nlyrs=1, xmin=-31400, xmax=12200, ymin=6012900, ymax=6044900, names=c('focal_sum'), crs='PROJCRS[\"World_Mollweide\",BASEGEOGCRS[\"WGS 84\",DATUM[\"World Geodetic System 1984\",ELLIPSOID[\"WGS 84\",6378137,298.257223563,LENGTHUNIT[\"metre\",1]],ID[\"EPSG\",6326]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"Degree\",0.0174532925199433]]],CONVERSION[\"unnamed\",METHOD[\"Mollweide\"],PARAMETER[\"Longitude of natural origin\",0,ANGLEUNIT[\"Degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"False easting\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8806]],PARAMETER[\"False northing\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8807]]],CS[Cartesian,2],AXIS[\"(E)\",east,ORDER[1],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],AXIS[\"(N)\",north,ORDER[2],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]]')
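For completeness, here is a rough sketch of the raster workflow I have in mind, under some assumptions. The small rasters and their values are made up so the example runs on its own. It relies on the na.rm argument of terra's as.data.frame to drop cells with NA in any layer, and on a wrapper function passed to terra's predict that returns the $predictions element of the ranger prediction object. I have not confirmed that this is the recommended way to combine the two packages.

```r
library(terra)
library(ranger)

# Small toy rasters with made-up values (stand-ins for ntl2.tif and a pop file)
set.seed(1)
ntl <- rast(ncols = 20, nrows = 20, xmin = 0, xmax = 20, ymin = 0, ymax = 20)
values(ntl) <- runif(ncell(ntl))
pop <- rast(ntl)                   # same geometry, no values yet
values(pop) <- runif(ncell(pop))
pop[1:10] <- NA                    # some NA cells in the predictor

s <- c(ntl, pop)
names(s) <- c("ntl", "pop")

# na.rm = TRUE drops every cell that has NA in at least one layer
d <- as.data.frame(s, na.rm = TRUE)

m <- ranger(ntl ~ ., data = d, mtry = 1, importance = "impurity")

# predict() on a ranger model returns an object, so the wrapper extracts
# the numeric predictions; na.rm = TRUE skips cells with NA predictors
p <- predict(s, m, na.rm = TRUE,
             fun = function(model, ...) predict(model, ...)$predictions)

# Residual raster; cells that were NA in pop stay NA
rsds <- s$ntl - p
```

In the loop above, the same two changes (na.rm = TRUE in as.data.frame and the wrapper function in predict) would replace the corresponding calls.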