I am cleaning occurrence data and want to check for duplicate occurrence records within raster grid cells. I have many species files (>100), so I read each file into its own data frame and now have a single list holding all of those data frames.
The next step is to take a raster layer and overwrite its pixel values so that each cell gets a unique value (ID). The goal is to later extract these IDs at the occurrence coordinates.
If I read in a single species file and run my code, it works perfectly. I want to be able to run the same steps on the whole list of data frames at once, but I'm not sure of the correct way to do this.
- The code I used for a single species file is:
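library(raster)
DF = read.csv(".../NewFolder/Themeda_triandra.csv", sep = ",", header = TRUE)
rastID = raster(".../Rasters/30s_bio/wc2.1_30s_bio_15.tif")
# Overwrite the cell values so that every cell holds a unique ID
rastID[] = 1:ncell(rastID)
# Look up the cell ID under each occurrence record (x = longitude, y = latitude)
occRastID = raster::extract(rastID, cbind(DF$decimalLongitude, DF$decimalLatitude))
summary(duplicated(occRastID))
# Keep only the first record found in each cell
cleanOccNoDup = DF[!duplicated(occRastID), ]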
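To make the de-duplication step concrete, here is a minimal sketch in base R that uses made-up cell IDs and coordinates instead of the real raster and CSV file. The values are purely illustrative; only `duplicated()` and the column names come from the code above.

```r
# Toy cell IDs for five occurrence records (made-up values, not real data)
occRastID <- c(101, 205, 101, 307, 205)

# Toy occurrence table with the same column names as the real files
DF <- data.frame(
  species          = "Themeda triandra",
  decimalLongitude = c(28.10, 29.50, 28.11, 30.20, 29.51),
  decimalLatitude  = c(-25.70, -26.30, -25.71, -24.90, -26.31)
)

# TRUE for every record whose cell ID has already been seen
isDup <- duplicated(occRastID)

# Keep the first record per cell and drop the later ones
cleanOccNoDup <- DF[!isDup, ]
```

Records 3 and 5 share a cell ID with records 1 and 2, so only rows 1, 2 and 4 remain in `cleanOccNoDup`.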
- To do this for the entire list of files, I tried:
spDat = list()
sppFiles = list.files(".../NewFolder/")
# Strip the ".csv" extension to get the species names
sppNames = sub("\\.csv$", "", sppFiles)
summaryDatL = list(sppNames)
# Read each species file into one element of the list
for (s in seq_along(sppNames)) {
  spDat[[s]] = read.csv(paste0(".../NewFolder/", sppNames[s], ".csv"), sep = ",", header = TRUE)
}
rastID = raster(".../Rasters/30s_bio/wc2.1_30s_bio_15.tif")
rastID[] = 1:ncell(rastID)
occRastID= raster::extract(rastID, cbind(spDat, "[", c("decimalLongitude","decimalLatitude")))
summary(duplicated(occRastID))
cleanOccNoDup = DF[!duplicated(occRastID),]
In the second attempt, I know I am doing something wrong with the brackets in the raster::extract() call, but I am not sure how to fix it. Please help!
Also, once the occurrence records have been cleaned, I want to keep the cleaned data frames together in a list as well.
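To show the shape of the result being asked for, here is a minimal sketch in base R on toy data. It is not a tested solution for the real files: the coordinates, the second species name `Species_B`, and the helper `cell_id()` are all hypothetical. `cell_id()` only stands in for `raster::extract(rastID, cbind(df$decimalLongitude, df$decimalLatitude))`, which would presumably take its place inside the function when the real ID raster is used.

```r
# Toy list of per-species data frames (made-up coordinates, not real data)
spDat <- list(
  Themeda_triandra = data.frame(
    decimalLongitude = c(28.10, 28.40, 29.50),
    decimalLatitude  = c(-25.70, -25.20, -26.30)
  ),
  Species_B = data.frame(
    decimalLongitude = c(18.20, 18.90),
    decimalLatitude  = c(-33.10, -33.80)
  )
)

# Hypothetical stand-in for raster::extract() on the ID raster:
# labels each point with the 1-degree grid cell it falls in
cell_id <- function(lon, lat) paste(floor(lon), floor(lat))

# Apply the single-file steps to every data frame in the list
cleanOccList <- lapply(spDat, function(df) {
  ids <- cell_id(df$decimalLongitude, df$decimalLatitude)
  df[!duplicated(ids), ]
})

# Number of records kept per species
sapply(cleanOccList, nrow)
```

Because `lapply()` returns a list with the same names as its input, the cleaned data frames stay together in one list, one element per species.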