Long-time casual R enthusiast, first-time poster (please be gentle).
I'm having problems with a dataframe pulled from a website using the rvest read_html function.
I am trying to write a script which scrapes a table of tennis player info from tennisabstract.com. I was previously doing this without issue using the following:
df_tennis <- read.delim("C:/Users/blair/OneDrive/Desktop/ATP MENS ELO 27.06.22")
I wanted to make things more efficient with read_html (rvest) so I wouldn't need to manually copy and paste the table from the website into a csv every time I run my script.
The following code is what I am using to scrape the table. I convert it to a dataframe to make it compatible with my existing code from the earlier script using read.delim(). I then use filter to pull row data for a specific player.
library(rvest)
library(dplyr)
atp_elo <- read_html("http://tennisabstract.com/reports/atp_elo_ratings.html")
tennis <<- atp_elo %>%
  html_element("#reportable") %>%
  html_table()
# remove empty columns
df_tennis <<- as.data.frame(tennis[-c(5, 9, 13)])
player1_info <<- df_tennis %>%
  filter(Player == "Novak Djokovic")
However, this returns a dataframe of 0 obs. of 13 variables. If I filter for a specific rank, I get the information I want, but I need to be able to pull rows using a player's name. I was using the exact same method in my earlier code, so I suspect the dataframe produced using read_html is formatted differently in some way.
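One way to test that suspicion is to look at the code points of a name that prints correctly but fails to match. The sketch below uses a made-up two-row data frame rather than the real scrape, and it assumes (not confirmed against the actual page) that the scraped names might contain a non-breaking space (U+00A0) instead of a regular space. The names df_check and player1_check are hypothetical and only used for illustration.

```r
# Toy data frame standing in for the scraped table (hypothetical values).
# The first name contains a non-breaking space (U+00A0), not a plain space.
df_check <- data.frame(
  Rank = c(1, 2),
  Player = c("Novak\u00a0Djokovic", "Carlos Alcaraz"),
  stringsAsFactors = FALSE
)

# The name prints like "Novak Djokovic", yet comparing it with a
# plain-space literal does not match.
matches_before <- df_check$Player[1] == "Novak Djokovic"

# Inspect the code points: 32 is an ordinary space, 160 is U+00A0.
code_points <- utf8ToInt(df_check$Player[1])

# Replace non-breaking spaces with plain spaces, then filter by name.
df_check$Player <- gsub("\u00a0", " ", df_check$Player, fixed = TRUE)
player1_check <- df_check[df_check$Player == "Novak Djokovic", ]
```

If the real Player column shows 160 where a space is expected, the same gsub() call on df_tennis$Player before filtering would be worth trying; if it shows only 32, the cause is something else.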
For your reference, the earlier version of my script that works:
df_tennis <<- read.delim("C:/Users/blair/OneDrive/Desktop/ATP MENS ELO 27.06.22")
player1_info <<- df_tennis %>%
  filter(Player == "Novak Djokovic")
Note that this returns a dataframe of 1 obs. of 16 variables (because I never had to remove the 3 empty columns when using read.delim). The overall length of the dataframes is also different because the above code uses an older version of tennisabstract data (I was having the same problem when this data was current).
I would appreciate any help on how to fix this issue and on understanding why it occurs.
Cheers!