web scraping html_attr

bernatmallen · October 28, 2021, 11:40am

Hi everyone!

I started web scraping a year ago and have never run into a problem like this one:

library(rvest)

sensacine_web <- read_html("https://www.sensacine.com/")

# Link target (href) of the first matching title
sensacine_link <- sensacine_web %>%
  html_node(".titlebar-title-lg .titlebar-link") %>%
  html_attr("href")

# Visible text of the same node
sensacine_text <- sensacine_web %>%
  html_node(".titlebar-title-lg .titlebar-link") %>%
  html_text()

html_text() works fine on this website, but html_attr() is unable to extract the hyperlink. It seems to be a problem with this particular website, because the same approach has worked on other ones. The site doesn't appear to be protected against web scraping, so why does this happen?
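One way to narrow this down is to check what kind of element the selector actually matches, since html_attr() returns NA whenever the matched element has no attribute of that name. The sketch below uses a small hypothetical HTML snippet (invented for illustration, not copied from sensacine.com) in which the same classes sit on a real link and on a plain span. It may or may not reflect what the site does.

```r
library(rvest)

# Hypothetical markup, NOT taken from sensacine.com: the first title is a real
# <a> link, the second carries the same classes on a <span> with no href.
page <- read_html('
  <div class="titlebar-title-lg"><a class="titlebar-link" href="/peliculas/1/">Film A</a></div>
  <div class="titlebar-title-lg"><span class="titlebar-link">Film B</span></div>
')

nodes <- html_nodes(page, ".titlebar-title-lg .titlebar-link")

html_name(nodes)          # tag name of each match: "a" "span"
html_text(nodes)          # text exists for both: "Film A" "Film B"
html_attr(nodes, "href")  # "/peliculas/1/" NA (the span has no href)
html_attrs(nodes)         # every attribute each matched element really has
```

Running html_name() and html_attrs() on the node from the real page would show whether it is an `<a>` element with an href at all, or some other element that only looks like a link in the browser.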
