I have this code:
library(rvest)

url <- "http://www.example.com/ranking/unit?pagina=1"
url_html <- read_html(url)
whole_table <- url_html %>%
  html_nodes('table') %>%
  html_table(fill = TRUE) %>%
  .[[1]]

url2 <- "http://www.example.com/ranking/unit?pagina=2"
url_html2 <- read_html(url2)
whole_table2 <- url_html2 %>%
  html_nodes('table') %>%
  html_table(fill = TRUE) %>%
  .[[1]]
I run this code up to a dozen times, changing the page number in the URL each time, and finally join the resulting whole_table objects into a single data frame. Is there a more elegant solution that avoids repeating these six lines of code twelve times with different numbering?
Try this on a site that actually has pages in this form:
library(rvest)
paginas <- 1:2
baseurl <- "http://www.example.com/ranking/unit?pagina="
mk_url <- function(x) paste0(baseurl, x)
get_url <- function(x) read_html(mk_url(x))
sapply(paginas, get_url)
#> Error in open.connection(x, "rb"): HTTP error 404.
Created on 2023-03-14 with reprex v2.0.2
Thanks technocrat, but I only get this:
sapply(paginas,get_url)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
node ? ? ? ? ? ? ? ? ? ? ? ? ? ?
doc ? ? ? ? ? ? ? ? ? ? ? ? ? ?
I extended the range to pages 1:14, which is why there are fourteen columns.
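The grid of ? marks is most likely sapply() simplifying its result: each read_html() call returns an xml_document, which is a list of two external pointers (node and doc), so the pages collapse into a two-row matrix that R cannot print meaningfully. The pages were probably read fine; they just have not been turned into tables yet. Below is a minimal sketch of one way to finish the job, assuming the table of interest is the first one on every page and that all pages share the same columns. The names get_table and scrape_pages, and the read argument, are illustrative and not part of rvest.

```r
library(rvest)

baseurl <- "http://www.example.com/ranking/unit?pagina="

# Hypothetical helper (not from the thread): fetch one page and return its
# first table. `read` defaults to read_html and is only a parameter so the
# function can be tried offline with a stand-in reader.
get_table <- function(page, read = read_html) {
  read(paste0(baseurl, page)) %>%
    html_nodes("table") %>%
    html_table(fill = TRUE) %>%
    .[[1]]
}

# lapply() always returns a plain list, one data frame per page;
# rbind() then stacks them, assuming every table has the same columns.
scrape_pages <- function(pages, read = read_html) {
  tables <- lapply(pages, get_table, read = read)
  do.call(rbind, tables)
}

# Usage against a real site (requires pages that actually exist):
# all_tables <- scrape_pages(1:14)
```

Using lapply() instead of sapply() avoids the simplification step, so the result stays a list of data frames until it is explicitly combined. If the tables differ in their columns across pages, rbind() will fail and the pages would need to be aligned first.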