Hey,
I'm very new to scraping data with R, and this problem seems to be quite tricky. Here it is:
I'd like to scrape player data from this German football manager website (example profile: Kahn - Comunio Statistiken).
The specific data I'd like to collect is marked in yellow in this screenshot:
The problem is that the number of "Saison XX" and "Pkt. xx" entries differs from profile to profile, depending on which and how many seasons a player played in the Bundesliga. One profile may have only a single season-and-points entry, while others have many, like this one: Gnabry - Comunio Statistiken.
Ideally, I would like to get a data set or data frame that looks like this (first example):
Name   Position   Season    Points
Kahn   GK         2007/08   76
Kahn   GK         2006/07   106
and then the next profile.
So in the loop, the name and position have to stay constant as long as there are more seasons and points to collect. Then the next player profile should be collected.
I've tried multiple things. First, I tried to use the html_text function to extract the text from the specific nodes, but since the number of nodes differs from profile to profile, I was only able to get the first entry (in this example, season 2007/08) of every player profile.
library(dplyr)
library(rvest)

playerinf = data.frame()

# Loop over profile ids 1 to 1000
for (page_result in seq(from = 1, to = 1000, by = 1)) {
  link = paste0("https://stats.comunio.de/profile?id=", page_result)
  code = read_html(link)

  # Each selector targets only the first season row of the profile
  Name = code %>% html_nodes("#content .bold") %>% html_text()
  Season = code %>% html_nodes(".nopadding:nth-child(1) tr:nth-child(2) td:nth-child(1)") %>% html_text()
  Position = code %>% html_nodes("td:nth-child(1) tr:nth-child(3) .left+ td") %>% html_text()
  Points = code %>% html_nodes(".nopadding:nth-child(1) tr:nth-child(2) td+ td") %>% html_text()

  # ifelse() with a length-one test returns a single value,
  # so at most one row is added per profile
  playerinf = rbind(playerinf, data.frame(
    Name = ifelse(length(Name) == 0, NA, Name),
    Season = ifelse(length(Season) == 0, NA, Season),
    Position = ifelse(length(Position) == 0, NA, Position),
    Points = ifelse(length(Points) == 0, NA, Points)))

  # The CSV is rewritten after every profile
  write.csv(playerinf, "PlayerInfomartionComStat.csv")
}
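One possible contributor to the "first entry only" result, independent of the selectors, is the ifelse() guard. The small sketch below uses made-up values (the Season vector is hypothetical, standing in for what html_text() might return on a profile with several seasons) to show how ifelse() with a length-one test differs from a plain if/else:

```r
# Hypothetical values standing in for the html_text() result of a profile
# with two seasons.
Season <- c("2007/08", "2006/07")

# ifelse() returns a result as long as its test. The test here is a single
# TRUE/FALSE, so only the first element of Season survives.
first_only <- ifelse(length(Season) == 0, NA, Season)
print(first_only)

# A plain if/else expression returns the whole vector unchanged.
all_seasons <- if (length(Season) == 0) NA_character_ else Season
print(all_seasons)
```

So even with a selector that matches every season row, the ifelse() guard would still reduce each field to a single value.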
My second idea was to scrape the whole table containing the seasons and points (its node is described the same way in every player profile). I got this information, but I failed to combine it with the name and position to get the desired form (name and position repeated in every row alongside the actual season and points).
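The combination step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the HTML snippet, the ".position" and "table.seasons" selectors, and the scrape_player() helper are all hypothetical and do not reflect the real page structure. The point is that data.frame() recycles a length-one name and position across all rows of the season table.

```r
library(rvest)

# Hypothetical markup standing in for one player profile. The structure and
# selectors of the real page will differ.
page <- read_html('
  <div id="content"><span class="bold">Kahn</span></div>
  <span class="position">GK</span>
  <table class="seasons">
    <tr><th>Season</th><th>Points</th></tr>
    <tr><td>2007/08</td><td>76</td></tr>
    <tr><td>2006/07</td><td>106</td></tr>
  </table>')

# Hypothetical helper: returns one row per season for a single profile.
scrape_player <- function(page) {
  name     <- page %>% html_node("#content .bold") %>% html_text()
  position <- page %>% html_node(".position") %>% html_text()
  seasons  <- page %>% html_node("table.seasons") %>% html_table(header = TRUE)

  # name and position have length one, so data.frame() repeats them
  # for every season row.
  data.frame(Name = name,
             Position = position,
             Season = seasons$Season,
             Points = seasons$Points,
             stringsAsFactors = FALSE)
}

# Collect one data frame per profile in a list, then bind them once.
pages   <- list(page)
players <- do.call(rbind, lapply(pages, scrape_player))
print(players)
```

For the real site, the list of pages would come from reading each profile URL, and profiles without a season table would need to be skipped before calling html_table().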
How can I scrape the data in the desired form? If you have any ideas, please let me know.
Thanks in advance!