Is it possible to scrape the polygon data from this interactive map in R?
https://fogocruzado.org.br/mapadosgruposarmados
The map shows territories controlled by armed groups across different years. I'd like to automate the extraction of this data for all years and save each year as a separate shapefile.
Their API doesn't include the spatial data yet.
I am able to extract a GeoJSON from the page, but it isn't working correctly.
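library(rvest)     # read_html(), html_nodes(), html_attr(); also re-exports %>%
library(jsonlite)  # toJSON()
# Load the HTML file
html_file <- read_html(html_path)
# Extract path data from the HTML
path_data <- html_file %>%
  html_nodes("path.leaflet-interactive") %>%
  html_attr("d")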
# Function to clean and extract coordinates from path data
clean_coordinates <- function(path) {
  # Remove "M", "L", "z" commands (move, line, close)
  path <- gsub("[MLz]", "", path)
  # Replace every character other than digits, comma, period, and minus with a space
  path <- gsub("[^0-9,.-]", " ", path)
  coords <- strsplit(path, " ") # Split by spaces to separate coordinates
  coords <- as.numeric(unlist(coords)) # Flatten list and convert to numeric
  # Ensure coordinates are in pairs (lat, lon)
  coords_matrix <- matrix(coords, ncol = 2, byrow = TRUE)
  # Ensure the coordinates are in [lon, lat] order for GeoJSON
  coords_matrix <- coords_matrix[, c(2, 1)] # Switch lat, lon order
  # Optionally, adjust the coordinates if they are too far off (e.g., by shifting or scaling)
  # coords_matrix <- coords_matrix - min(coords_matrix, na.rm = TRUE) # Adjust if needed
  return(coords_matrix)
}
# Apply the function to clean the coordinates
coordinates_list <- lapply(path_data, clean_coordinates)
# Create GeoJSON features
geojson <- list(
  type = "FeatureCollection",
  features = lapply(coordinates_list, function(coord) {
    list(
      type = "Feature",
      geometry = list(
        type = "Polygon", # or "LineString" based on the data
        coordinates = list(coord) # Directly use the coordinates list
      )
    )
  })
)
# Convert the result to JSON
geojson_json <- toJSON(geojson, pretty = TRUE)
# Write the cleaned GeoJSON to a .geojson file
write(geojson_json, "output/output.geojson")
The resulting output.geojson isn't right, and I can't open it.