Unable to identify the selector when scraping Google News search results

ogarciaM998 · July 19, 2023, 9:45pm

Hello everyone. I am new to data scraping and am trying to figure out how to scrape the results of a Google search in the news section. My example search is for Silvio Berlusconi. I've been using Chrome developer tools and SelectorGadget as assistance tools.

I have tried this approach:

First, using the selector for the headline as in the following image:

Then, applying the code as follows:

rm(list=ls())
set.seed(1234)
options(scipen=10000)
select <- dplyr::select

# loading the packages:
library(dplyr) # for pipes and the data_frame function
library(rvest) # webscraping
# extracting the whole website
google <- read_html("https://www.google.com/search?q=silvio+berlusconi&rlz=1C1ONGR_enDE1067DE1067&biw=1366&bih=652&tbm=nws&ei=Mi-4ZNGqN8iE9u8P_dqW-AQ&ved=0ahUKEwjRi8rxtpuAAxVIgv0HHX2tBU8Q4dUDCA0&oq=silvio+berlusconi&gs_lp=Egxnd3Mtd2l6LW5ld3MiEXNpbHZpbyBiZXJsdXNjb25pMgoQABiKBRixAxhDMgcQABiKBRhDMgcQABiKBRhDMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABEj3BlAAWABwAHgAkAEAmAFFoAFFqgEBMbgBDMgBAIgGAQ&sclient=gws-wiz-news")

# extracting the headlines
headlines <- google %>% 
  html_nodes("div.n0jPhd.ynAwRc.MBeuO.nDgy9d") %>%
  html_text()

headlines

character(0)

So my question is: why is the output character(0), and how can I fix it?
I have tried alternative combinations, such as using each of the div class names on its own, or

# extracting the headlines
headlines <- google %>% 
  html_nodes("div.n0jPhd") %>% 
  html_nodes("div.ynAwRc") %>% 
  html_nodes("div.MBeuO") %>% 
  html_nodes("div.nDgy9d") %>%
  html_text()

View(headlines)

, but the output has remained the same.
What am I missing?
Thank you in advance for any guidance; it will be very much appreciated.
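
For readers comparing the two selector attempts above, here is a minimal sketch of how they differ. It runs on a small hand-written HTML string, not on the live Google page: the `page` object and its headlines are hypothetical stand-ins, and the class names are simply copied from the post. The live page may serve different markup or class names to `read_html()` than Chrome shows in developer tools, so this is an illustration of selector behaviour under those assumptions, not a confirmed fix.

```r
library(rvest)

# Hypothetical stand-in for a results page (an assumption, not Google's markup):
# each headline is a single <div> carrying all four classes from the post.
page <- read_html('
  <html><body>
    <div class="n0jPhd ynAwRc MBeuO nDgy9d">Example headline one</div>
    <div class="n0jPhd ynAwRc MBeuO nDgy9d">Example headline two</div>
    <div class="other">Not a headline</div>
  </body></html>
')

# Compound selector: matches elements that carry all four classes at once.
compound <- page %>%
  html_nodes("div.n0jPhd.ynAwRc.MBeuO.nDgy9d") %>%
  html_text()

# Chained calls: each step searches the descendants of the previous matches,
# so this looks for a div.ynAwRc nested inside a div.n0jPhd. No such nesting
# exists here, so the result is empty.
chained <- page %>%
  html_nodes("div.n0jPhd") %>%
  html_nodes("div.ynAwRc") %>%
  html_text()

# Diagnostic: list the class attributes present in the parsed document.
# Running the same three lines on the object returned by read_html() for the
# real URL shows whether the expected class names were downloaded at all.
classes_present <- page %>%
  html_nodes("div") %>%
  html_attr("class") %>%
  unique()

print(compound)
print(chained)
print(classes_present)
```

If the diagnostic step, run on the downloaded `google` object, does not list the class names seen in Chrome, then the selector is not the problem: the document that was fetched does not contain those elements.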
