How to select the correct XPath to download a table with web scraping?

M_AcostaCH · April 14, 2023, 5:39am

Hi Community

I want to scrape this page, but the final df is empty. Maybe the XPath is wrong.

library(rvest)
library(tidyverse)

url <- "https://av.cenargen.embrapa.br/avconsulta/Passaporte/detalhes.do?ida=95802"
ALELO <- url %>%
  read_html() %>% 
  html_nodes(xpath = "/html/body/div[3]/table/tbody/tr/td[2]/div[2]/div[1]/div[2]/table") %>% 
  html_table() |> 
  data.frame()

# I also tried xpath = "body/div/table/tbody/tr/td/div/div/div/table", but it doesn't work either

The idea is to get the whole table into a df.

Thanks!

mara · April 14, 2023, 12:08pm

I can't get the URL to load. Is it possible that the initial connection isn't working? Do you get the return you expect when you just send the URL to read_html()?
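
To make that check concrete, here is a minimal sketch of what I would look at. It uses a small inline document built with rvest's minimal_html() as a hypothetical stand-in for the real page (I couldn't load it), so the field names and values are made up. It also illustrates one common cause of an empty result, which is only an assumption for this page: XPaths copied from browser dev tools usually include tbody, which the browser inserts but the HTML that read_html() parses may not contain.

```r
library(rvest)

# Hypothetical stand-in for the real page: a small inline document,
# so the checks can be run without a network connection.
page <- minimal_html('
  <div>
    <table>
      <tr><th>Field</th><th>Value</th></tr>
      <tr><td>Code</td><td>95802</td></tr>
    </table>
  </div>
')

# Step 1: did we get a parsed document back at all?
class(page)

# Step 2: how many <table> elements does the parsed HTML contain?
tables <- html_elements(page, "table")
length(tables)

# Step 3: compare an XPath that requires <tbody> (as copied from dev tools)
# with a relaxed one that does not care about intermediate elements.
copied  <- html_elements(page, xpath = "//table/tbody/tr")
relaxed <- html_elements(page, xpath = "//table//tr")
length(copied)
length(relaxed)

# Step 4: if a table node was found, convert it to a data frame.
df <- html_table(tables[[1]])
df
```

If the relaxed XPath finds rows on the real page while the copied one finds none, dropping tbody from the path (or selecting "table" directly) would be worth trying.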

Again, I can't see the page, so I'm not sure how it works, but if content is loaded dynamically, you might need to bring RSelenium into the mix. The post below has some good content, as well as links to a bunch of other resources:
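
For the RSelenium route, a rough sketch under stated assumptions might look like the following. The helper names tables_from_source() and fetch_rendered_tables() are hypothetical, and the second one assumes RSelenium and a local Firefox are installed; it is only defined here, not called. The idea is to let a real browser render the page and then hand the rendered source back to rvest.

```r
library(rvest)

# Parse an HTML string (such as a browser's rendered page source)
# and return every <table> it contains as a list of data frames.
tables_from_source <- function(page_source) {
  page_source |>
    read_html() |>
    html_elements("table") |>
    html_table()
}

# Hypothetical helper: render the page in a browser through RSelenium,
# then pass the rendered source to rvest. Assumes RSelenium and Firefox
# are available; wait_seconds is a crude, arbitrary pause.
fetch_rendered_tables <- function(url, wait_seconds = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  on.exit(driver$server$stop(), add = TRUE)            # runs last
  client <- driver$client
  on.exit(client$close(), add = TRUE, after = FALSE)   # runs first
  client$navigate(url)
  Sys.sleep(wait_seconds)  # give dynamically loaded content time to appear
  tables_from_source(client$getPageSource()[[1]])
}
```

Calling fetch_rendered_tables(url) with the URL from the question would then return a list of tables, if the page loads and contains any. Keeping the parsing step in its own function means it can be tried on any saved HTML without starting a browser.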
