How to keep identical names within a loop when scraping data from a website and assigning to a dataframe?

NTrenn · February 12, 2019, 2:36pm

Hi everyone,

I am trying to scrape data from a website, so I wrote a loop that scrapes data from its different subpages. However, when I use html_nodes(), my code fails to gather all the information that is actually on the website, because several of the html_text() items that I use as link texts to follow have the same name. As a result, I do not get all the information.

My code looks as follows:

library(rvest)
library(xml2)
library(dplyr)

url_vw_up <- "https://www.adac.de/infotestrat/autodatenbank/autokatalog/modelle.aspx?baureihe=up!&limit=1000#Ergebnis"

# VW up! overview page: collect the link text of every model listed here;
# follow_link(i) later uses each text to open that model's subpage
vw_up <- read_html(url_vw_up) %>% html_nodes(".img-wrap+ td .block") %>% html_text()

# create the desired format of the data frame: 9 rows, one column per vehicle added in the loop
Adac_raw <- data.frame(matrix(nrow = 9, ncol = 0))
  
# loop for scraping information 
s_vw_up <- html_session(url_vw_up)

for (i in vw_up[1:194]){
  
  page_up <- s_vw_up %>% follow_link(i) %>% read_html()
 
  # Here is my issue: columns with duplicated names are overwritten, so I end up
  # with only 73 of the 194 observations. How can I change this?
  Adac_raw[[i]] <- page_up %>% html_nodes("strong+ .box-section tr:nth-child(7) td+ td , strong+ .box-section tr:nth-child(6) td+ td , strong+ .box-section tr:nth-child(4) td+ td , strong+ .box-section tr:nth-child(3) td+ td , strong+ .box-section tr:nth-child(2) td+ td , strong+ .box-section tr:nth-child(1) td+ td , strong+ .box-section tr:nth-child(10) td+ td , strong+ .box-section tr:nth-child(11) td+ td , strong+ .box-section tr:nth-child(15) td+ td") %>% html_text()
  Sys.sleep(2)

  }

My code should actually return information about all 194 vehicles, but it only returns 73 because of identical names. Within my loop, entries with the same name are overwritten when I assign the information to Adac_raw. How can I change this to keep the duplicates / same names?
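To show the overwriting in isolation, here is a minimal sketch that needs no scraping. The model names and the "data for ..." strings are made up for illustration and are not the real ADAC entries. The second half shows one direction that might help, namely assigning by position and making the names unique afterwards with make.unique(), but I am not sure it is the right approach. It also only addresses the assignment: as far as I understand, follow_link(i) with a string follows the first link containing that text, so duplicated names would presumably still open the same subpage.

```r
# Hypothetical model names; "VW up! 1.0" deliberately appears twice
model_names <- c("VW up! 1.0", "VW up! 1.0 BMT", "VW up! 1.0")

# Assigning by name, as in my loop: the third iteration overwrites the first
by_name <- list()
for (nm in model_names) {
  by_name[[nm]] <- paste("data for", nm)
}
length(by_name)  # fewer entries than model_names

# Assigning by position keeps one slot per iteration
by_index <- vector("list", length(model_names))
for (k in seq_along(model_names)) {
  by_index[[k]] <- paste("data for", model_names[k])
}

# make.unique() appends a numeric suffix to repeated names
names(by_index) <- make.unique(model_names)
length(by_index)  # one entry per element of model_names
```

With the real data, the loop would then run over seq_along(vw_up) instead of over the names themselves.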
