I'm new to R and need to do web scraping, but I don't know how.

guilherme.rezende · June 9, 2022, 4:36pm

I need to get information from the website www.doctoralia.com.br, but I don't know how to do it.

I loaded the rvest and dplyr libraries and connected to the page, but I don't know which selector I should pass to html_nodes().

I will leave the link to the doctors in São Paulo below.

Can you tell me whether it is possible to scrape this information?

nirgrahamuk · June 9, 2022, 4:46pm

I may have missed it but I don't think you've told us what information from the webpage you want ...

guilherme.rezende · June 10, 2022, 2:02pm

I need to extract all the data from the doctors registered on the site.

Name, Specialty, City, Telephone, Cellular, Address, among others.

I'm not an expert in R, and I've been trying to extract this for days without success.

Dissipation123 · June 10, 2022, 7:24pm

This article has a lot of information on this: "Harvesting the web with rvest" (rvest documentation)

Scraping all of this data might take a bit of effort to gather and then clean, but here is some example code to get the information of the first profile:

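library(rvest)
library(dplyr)

# Download and parse the search results page for São Paulo
URL_test <- read_html('https://www.doctoralia.com.br/pesquisa?q=&loc=S%C3%A3o%20Paulo')

# Extract the text of the first element with class "media-body"
CSS_pull2 <-
  html_node(URL_test, '.media-body') %>%
  html_text()

# Replace newlines and tabs with spaces
gsub('\t', ' ', gsub('\n', ' ', CSS_pull2))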

The following gets all the nodes matching the .media-body selector. Again, the data is not clean.

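# Select every element with class "media-body"
CSS_pull <- html_nodes(URL_test, '.media-body')
# Extract the text of each node, trimming leading and trailing whitespace
CSS_text <- html_text(CSS_pull, trim = TRUE)

# Replace newlines and tabs with spaces
gsub('\t', ' ', gsub('\n', ' ', CSS_text))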
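
As a possible next step, the raw .media-body text could be split into separate columns by selecting child elements inside each result card. The sketch below is only an illustration: it runs against a small inline HTML snippet instead of the live site, the names and values in it are placeholders, and the class names (.doctor-name, .specialty, .address) are hypothetical. The real selectors would have to be looked up with the browser's developer tools.

```r
library(rvest)

# Stand-in HTML for a results page; class names and values are made up
page <- minimal_html('
  <div class="media-body">
    <span class="doctor-name">Example Doctor A</span>
    <span class="specialty">Example Specialty A</span>
    <span class="address">Example Address A</span>
  </div>
  <div class="media-body">
    <span class="doctor-name">Example Doctor B</span>
    <span class="specialty">Example Specialty B</span>
  </div>
')

# One node per result card
cards <- html_elements(page, ".media-body")

# html_element() returns one result per card (missing when there is no match),
# so the columns stay aligned even if a field is absent from a card
doctors <- data.frame(
  name      = html_text2(html_element(cards, ".doctor-name")),
  specialty = html_text2(html_element(cards, ".specialty")),
  address   = html_text2(html_element(cards, ".address")),
  stringsAsFactors = FALSE
)

print(doctors)
```

On the real page the same pattern should apply after swapping `page` for the result of read_html() and the placeholder selectors for the actual ones. Selecting the cards first and then the fields inside each card is what keeps a missing phone number or address from shifting values into the wrong row.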
