rama27
December 3, 2019, 6:38am
1
Hi,
I have an R script that scrapes an HTML table from a web page. The script works perfectly.
Now I would like to run it on more web pages. I have downloaded the HTML code of each URL into the following table:
HTML code 1 | URL 1
HTML code 2 | URL 2
HTML code 3 | URL 3
Now I would like to run my R script on HTML code 1, 2, 3, and so on.
Could somebody help me with how to do this? An R loop might help, but I don't know how to write one. Thank you very much for your help!
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:
A minimal reproducible example consists of the following items:
A minimal dataset, necessary to reproduce the issue
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.
Let's quickly go over each one of these with examples:
Minimal Dataset (Sample Data)
You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue.
Let's say, as an example, that you are working with the iris data frame
head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…
rama27
December 3, 2019, 4:28pm
4
Hi,
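I have the following problem. I have an R script that scrapes an HTML table from a web page. The script is here:
library(rvest)  # provides read_html(), html_nodes(), html_table() and %>%
url <- "webpage.com/task1"
table <- url %>%
  read_html() %>%  # html() is deprecated in rvest; read_html() replaces it
  html_nodes(xpath='//*[@id="Form1"]/table[4]') %>%
  html_table(fill = TRUE)
# save to a data frame (html_table() returns a list with one element per matched table)
tableDF <- data.frame(matrix(unlist(table), ncol = lengths(table)))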
But I need to scrape this table from different URLs of this website. I have downloaded all of the HTML code into the following table (CSV format):
HTML URL
html_code_task1 webpage.com/task1
html_code_task2 webpage.com/task2
html_code_task3 webpage.com/task3
. .
. .
. .
My script works when I manually run it on, for example, webpage.com/task10.
But I would like R to go through all the html_code entries (from task1 to task50) in my table, save the values, and finally merge them together. Is that possible? Using a for loop, maybe?
Thank you very much for your help!
I can't test this code since you haven't provided actual sample data, but to give you a pointer, you could do something like this:
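library(tidyverse)
library(rvest)  # not attached by library(tidyverse)
# Scrape the target table from one page and return it as a data frame
scrape <- function(url) {
  tables <- url %>%
    read_html() %>%  # replaces the deprecated html()
    html_nodes(xpath='//*[@id="Form1"]/table[4]') %>%
    html_table(fill = TRUE)
  # inside a pipe, lengths(.) would measure the unlisted vector, so compute ncol from the list
  data.frame(matrix(unlist(tables), ncol = lengths(tables)))
}
# url_dataframe is a placeholder for your table of URLs; map_dfr() row-binds the results
tableDF <- map_dfr(url_dataframe$url, scrape)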
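Since the HTML has already been downloaded, the same idea should also work on the stored HTML strings rather than the URLs, because read_html() accepts literal HTML as well as an address. Below is a minimal sketch under stated assumptions: the data frame pages, its two tiny inline tables, and the helper scrape_html() are all made up for illustration, and the XPath is simplified to "//table" so the example runs on its own. map_dfr() needs dplyr installed for the row-binding.

```r
library(rvest)
library(purrr)

# Hypothetical stand-in for the CSV described above: one row per page,
# with the downloaded HTML source in HTML and its address in URL.
pages <- data.frame(
  HTML = c(
    "<html><body><table><tr><th>task</th><th>score</th></tr><tr><td>a</td><td>1</td></tr></table></body></html>",
    "<html><body><table><tr><th>task</th><th>score</th></tr><tr><td>b</td><td>2</td></tr></table></body></html>"
  ),
  URL = c("webpage.com/task1", "webpage.com/task2"),
  stringsAsFactors = FALSE
)

# Parse one HTML string and return the first matched table as a data frame.
scrape_html <- function(html_code) {
  html_code %>%
    read_html() %>%
    html_nodes(xpath = "//table") %>%  # assumption: swap in the real XPath here
    html_table(fill = TRUE) %>%
    pluck(1)
}

# Name each HTML string by its URL so that .id records which page a row came from.
tableDF <- map_dfr(set_names(pages$HTML, pages$URL), scrape_html, .id = "URL")
```

With the real data, pages would come from reading the CSV (for example with read.csv()), and the XPath from the original script would replace "//table". Keeping the URL as an id column makes it easier to trace each merged row back to its page.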