Hi guys, I'm working with classifiers whose predictions give the probability of a certain disease occurring in a certain place. The classifier is a Random Forest, and I get the probabilities in a data frame with the following format:
Yes No
0.98 0.02
0.2 0.98
0.80 0.10
0.50 0.50
Yes = has the disease; No = does not have the disease
Based on these probabilities, I want to apply the results to each designated location on the map. Each probability is associated with a number that designates the place, for example:
neighborhood 1 = 1
neighborhood 2 = 2
and so on.
So each neighborhood has an associated probability, and I would like to show it on the map. I have a shapefile of the locations. Any ideas on how I can do this? I'm working in R Markdown and I'm new to the language. Any help is welcome, and thanks in advance!
The sf package can read shapefiles and returns an object that works like a data frame, so other variables can be added to it. In your case, that would be a variable for the neighborhood name, one for the estimated probability and, potentially, a categorical variable such as yes/no or low/high. ggplot can then produce a thematic map, also called a choropleth map, to illustrate the data.
See 5. Plotting Simple Features in the sf vignette for examples. My somewhat outdated post may help with some of the ggplot details.
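As a rough sketch of that approach (read the shapefile with sf, attach the probabilities, map with ggplot), something like the following might work. It makes several assumptions: the North Carolina demo shapefile that ships with sf stands in for your own file (you would call `st_read()` on your own path instead), the column names `neighborhood_id` and `Yes` are hypothetical, and the probabilities are random placeholders rather than real Random Forest output.

```r
library(sf)
library(ggplot2)

# Stand-in for your own shapefile: the demo file bundled with sf.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Assumption: each polygon carries a numeric neighborhood id (1, 2, ...).
# Here one is made up; your shapefile may already have such a column.
nc$neighborhood_id <- seq_len(nrow(nc))

# Placeholder probabilities, one row per neighborhood.
# In practice this would be the classifier's predicted "Yes" column.
set.seed(42)
probs <- data.frame(
  neighborhood_id = seq_len(nrow(nc)),
  Yes = runif(nrow(nc))
)

# Attach the probabilities to the geometries by the shared id.
map_data <- merge(nc, probs, by = "neighborhood_id", all.x = TRUE)

# Choropleth: fill each polygon by its predicted probability.
p <- ggplot(map_data) +
  geom_sf(aes(fill = Yes)) +
  scale_fill_viridis_c(limits = c(0, 1), name = "P(disease)") +
  theme_minimal()

print(p)
```

In an R Markdown document, the plot appears in the knitted output when the chunk prints `p`. The key step is the merge: the id column must have the same name and type in both the shapefile and the prediction data frame.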
The two problems that you are most likely to face are:
Installing the external library dependencies for sf
Understanding how ggplot treats continuous and discrete scales differently
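To illustrate the second point, here is a minimal sketch of the same map drawn once with a continuous fill and once with a discrete one. As before, the demo shapefile bundled with sf stands in for yours, the probabilities are random placeholders, and the names `prob_yes` and `risk` and the 0.5 cutoff are hypothetical choices, not anything from your data.

```r
library(sf)
library(ggplot2)

# Demo shapefile bundled with sf, used only as a stand-in.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Placeholder probabilities (not real model output).
set.seed(1)
nc$prob_yes <- runif(nrow(nc))

# A categorical version of the same variable; the 0.5 cutoff is arbitrary.
nc$risk <- cut(
  nc$prob_yes,
  breaks = c(0, 0.5, 1),
  labels = c("low", "high"),
  include.lowest = TRUE
)

# Numeric column: ggplot uses a continuous scale (a colour gradient).
p_continuous <- ggplot(nc) +
  geom_sf(aes(fill = prob_yes)) +
  scale_fill_viridis_c(limits = c(0, 1), name = "P(disease)")

# Factor column: ggplot uses a discrete scale (one colour per level).
p_discrete <- ggplot(nc) +
  geom_sf(aes(fill = risk)) +
  scale_fill_manual(
    values = c(low = "grey85", high = "firebrick"),
    name = "Risk"
  )

print(p_continuous)
print(p_discrete)
```

Pairing a numeric column with a discrete scale such as `scale_fill_manual()`, or a factor with a continuous one such as `scale_fill_viridis_c()`, is what typically produces errors like "Continuous value supplied to discrete scale", so it helps to check the column type with `class()` before choosing the scale.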