Hello everyone, I hope you're having a nice day.
I need some help. I decided to start learning to scrape data with rvest instead of collecting it manually, and I've run into a problem. I want to scrape the club/team name and Pts from https://www.transfermarkt.com/premier-league/tabelle/wettbewerb/GB1?saison_id=2021
The problem is:
team_elements <- html_nodes(webpage, ".hauptlink > a")
This code scrapes not only the team names but also the information picture for Man City, Chelsea, Leicester, Brentford, Watford, and Norwich. How can I keep only the names after scraping?
-
pts_elements <- html_nodes(webpage, ".zentriert")
That code scrapes not only Pts but also W, D, L, and goals. How can I keep only Pts after scraping?
Thank you for the help! The full code is attached below.
# Load rvest (provides read_html, html_nodes, html_text)
library(rvest)
# Define the URL of the website
url <- "https://www.transfermarkt.com/premier-league/tabelle/wettbewerb/GB1?saison_id=2021"
# Read the HTML content of the webpage
webpage <- read_html(url)
# Extract Club names
team_elements <- html_nodes(webpage, ".hauptlink > a")
team <- html_text(team_elements)
# Extract Points (Pts)
pts_elements <- html_nodes(webpage, ".zentriert")
pts <- as.numeric(html_text(pts_elements))
# Combine into a data frame
premier_league_data <- data.frame(Team = team, Pts = pts)
# Print it
print(premier_league_data)
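One direction that might help is to work row by row instead of collecting every match on the whole page, so that names and points stay aligned. The sketch below is only an illustration on a small hand-written HTML snippet, not the real page: the `table.items` class, the cell order, and the assumption that Pts is the last `.zentriert` cell in each row are guesses about the markup and should be checked against the actual page source. The two sample rows are placeholder values.

```r
library(rvest)

# Hypothetical stand-in for the standings table; the real markup may differ.
sample_html <- '
<table class="items"><tbody>
<tr>
  <td class="hauptlink">1</td>
  <td class="hauptlink"><a href="#">Man City</a> <a href="#"><img src="info.png"></a></td>
  <td class="zentriert">38</td><td class="zentriert">29</td>
  <td class="zentriert">6</td><td class="zentriert">3</td>
  <td class="zentriert">99:26</td><td class="zentriert">93</td>
</tr>
<tr>
  <td class="hauptlink">2</td>
  <td class="hauptlink"><a href="#">Liverpool</a></td>
  <td class="zentriert">38</td><td class="zentriert">28</td>
  <td class="zentriert">8</td><td class="zentriert">2</td>
  <td class="zentriert">94:26</td><td class="zentriert">92</td>
</tr>
</tbody></table>'

webpage <- read_html(sample_html)

# One node per table row, so each team and its points come from the same row
rows <- html_nodes(webpage, "table.items tbody tr")

# Team name: keep the first link whose text is not empty
# (a link that only wraps a picture has empty text and is dropped)
team <- vapply(rows, function(row) {
  txt <- html_text(html_nodes(row, ".hauptlink > a"), trim = TRUE)
  txt[nzchar(txt)][1]
}, character(1))

# Pts: assumed to be the last ".zentriert" cell in the row
pts <- vapply(rows, function(row) {
  cells <- html_text(html_nodes(row, "td.zentriert"), trim = TRUE)
  as.numeric(cells[length(cells)])
}, numeric(1))

premier_league_data <- data.frame(Team = team, Pts = pts)
print(premier_league_data)
```

If the real table has the same shape, replacing `sample_html` with the URL should be enough. Calling `html_table()` on the table node and picking the needed columns may be another option worth trying.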