Scraping with empty results

I'm trying to scrape the "Clasificaciones" tab on this page, but with this code I only get an empty df. If I delete the last part of the XPath (/table), I get a list of one item, but it contains an empty tibble as well.

library(rvest)  # also provides the %>% pipe

url <- "http://www.hockeypatines.fep.es/league/2478"

class <- url %>%
        read_html() %>%
        html_nodes(xpath = '//*[@id="tab_modal_contenido_competicion"]/table') %>%
        html_table()

The website is dynamic: JavaScript assembles it in your web browser, so the table is not statically available at the URL you pointed rvest to. rvest is suitable for static web pages but not dynamic ones, at least not directly.
I think you need to use something like RSelenium or chromote to navigate to the page and scrape it there, or at least save the rendered page as HTML to a local file and use rvest on that once the browser has filled in the content.
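
To illustrate why the original code comes back empty, here is a minimal offline sketch. The HTML string is a hypothetical stand-in for what the server sends before any JavaScript runs (an empty container with the same id); it is not copied from the real page.

```r
library(rvest)

# Hypothetical static HTML: the container exists, but JavaScript has not filled it yet
static_page <- minimal_html('<div id="tab_modal_contenido_competicion"></div>')

# Same XPath as in the question, with and without the trailing /table
container <- html_elements(static_page, xpath = '//*[@id="tab_modal_contenido_competicion"]')
tables    <- html_elements(static_page, xpath = '//*[@id="tab_modal_contenido_competicion"]/table')

length(container)  # the container itself is matched
length(tables)     # but it holds no <table> for html_table() to parse
```

If the real page behaves like this stand-in, no XPath tweak will help, because the table simply is not in the document that read_html() receives.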

Will try this. Thanks for suggestions @nirgrahamuk

Table content is fetched through an Ajax call which you can initiate yourself too. Though it's a POST call and should use the correct Origin value in the header, so it's more convenient to use httr/httr2 for this:

library(httr2)
library(rvest)

request("http://www.server2.sidgad.es/rfep/rfep_clasif_idc_2478_1.php") |>
  req_headers(Origin = "http://www.hockeypatines.fep.es") |>
  req_body_form(idc = 2478, site_lang = "es") |>
  req_perform() |>
  resp_body_html() |>
  html_element("table.tabla_clasif") |>
  html_table()
#> # A tibble: 14 × 12
#>       X1 X2    X3             X4    X5    X6    X7    X8    X9   X10   X11 X12  
#>    <int> <lgl> <chr>       <int> <int> <int> <int> <int> <int> <int> <int> <lgl>
#>  1     1 NA    CP VILA-SA…    21     8     7     0     1    32     9    23 NA   
#>  2     2 NA    GENERALI H…    19     8     6     1     1    38     4    34 NA   
#>  3     3 NA    TELECABLE …    19     8     6     1     1    25     6    19 NA   
#>  4     4 NA    HC CORUÑAH…    17     8     5     2     1    25    14    11 NA   
#>  5     5 NA    CP ESNECA …    15     8     4     3     1    22    10    12 NA   
#>  6     6 NA    LIDERGRIP …    15     8     5     0     3    16    15     1 NA   
#>  7     7 NA    MARTINELIA…    14     8     4     2     2    30    21     9 NA   
#>  8     8 NA    CERDANYOLA…     9     8     3     0     5    23    37   -14 NA   
#>  9     9 NA    SOLIDEO PH…     8     8     2     2     4    18    17     1 NA   
#> 10    10 NA    CP LAS ROZ…     7     8     2     1     5    17    34   -17 NA   
#> 11    11 NA    CP VOLTREG…     6     8     1     3     4    13    25   -12 NA   
#> 12    12 NA    BEMBIBRE H…     5     8     1     2     5    13    26   -13 NA   
#> 13    13 NA    IGUALADA F…     4     8     1     1     6     9    30   -21 NA   
#> 14    14 NA    CP ALCOBEN…     0     8     0     0     8    10    43   -33 NA

Created on 2023-11-15 with reprex v2.0.2


Perfect solution. Just one question, @margusl: how did you find the Ajax call?

Thx for solution

When you open the site in your browser, activate the developer tools. If you now switch to a different tab (e.g. to "Clasificaciones"), you'll see the relevant requests in the Network tab of the dev tools. When you check the request details, you'll also get the request headers and the POST payload.
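
As a rough sketch of how those details map onto httr2: the URL, Origin header and form fields below are the ones from the solution above, entered the way you would copy them from the Network tab. Nothing is sent here; the request is only built and inspected, so you can compare it with what the browser shows before calling req_perform().

```r
library(httr2)

# Values as read from the request details in the browser's Network tab
req <- request("http://www.server2.sidgad.es/rfep/rfep_clasif_idc_2478_1.php") |>
  req_headers(Origin = "http://www.hockeypatines.fep.es") |>  # request header
  req_body_form(idc = 2478, site_lang = "es")                 # POST payload (form fields)

# Printing the request object summarises it without sending anything
print(req)

# req_dry_run(req) would additionally show the raw HTTP request,
# again without contacting the server
```

If the printed summary differs from the browser's request (a missing header or form field, for instance), that is usually the first thing to fix.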

