isaid
February 2, 2023, 9:49am
1
Hi, I'm trying to get data from this website:
UPMC Hospitals
I only want the names of the hospitals in Southwest Pa. (12 in total).
SelectorGadget identifies this node: #accordion_920f1b98-2bd1-4afa-b9b2-4b1838a4ee91 .panel-title
I'm using this code:
library(rvest)
link <- 'https://www.upmc.com/locations/hospitals'
pg <- read_html(url(link))
# select the hospital titles inside the accordion that SelectorGadget reported
node <- html_nodes(pg, "#accordion_920f1b98-2bd1-4afa-b9b2-4b1838a4ee91 .panel-title")
html_text(node)
And it returns {xml_nodeset (0)}.
I'm not sure why the SelectorGadget value is not working. Below is another (longer) route to get to the desired data.
library(rvest)
library(tidyverse)
my_region = "Southwest Pa."
link <- 'https://www.upmc.com/locations/hospitals'
pg <- read_html(url(link))
# scrape regions first
regions = html_nodes(pg, 'h2') %>% html_text()
# scrape regions and hospitals
node <- html_nodes(pg, "h2, .panel-title") %>% html_text()
# create data frame to filter to desired list
data.frame(node = node) %>%
  mutate(region = ifelse(node %in% regions, node, NA)) %>%
  fill(region) %>%
  filter(region == my_region & region != node) %>%
  pull(node)
#> [1] "UPMC Children's Hospital of Pittsburgh: Pittsburgh, Pa. (Lawrenceville)"
#> [2] "UPMC East: Monroeville, Pa."
#> [3] "UPMC Magee-Womens Hospital: Pittsburgh, Pa. (Oakland)"
#> [4] "UPMC McKeesport: McKeesport, Pa."
#> [5] "UPMC Mercy: Pittsburgh, Pa. (Uptown)"
#> [6] "UPMC Montefiore: Pittsburgh, Pa. (Oakland)"
#> [7] "UPMC Passavant – Cranberry: Cranberry Township, Pa."
#> [8] "UPMC Passavant – McCandless: Pittsburgh, Pa. (McCandless Township)"
#> [9] "UPMC Presbyterian: Pittsburgh, Pa. (Oakland)"
#> [10] "UPMC Shadyside: Pittsburgh, Pa. (Shadyside)"
#> [11] "UPMC St. Margaret: Pittsburgh, Pa. (Aspinwall)"
#> [12] "UPMC Western Psychiatric Hospital: Pittsburgh, Pa. (Oakland)"
Created on 2023-02-02 with reprex v2.0.2.9000
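To make the label-and-fill step above easier to follow, here is a minimal sketch on a small, invented HTML snippet rather than the live page. The region and hospital names, and the `toy`, `labelled`, and `hospitals_a` objects, are hypothetical; only the technique (mark heading rows, fill the region downwards, then drop the heading rows) mirrors the code above.

```r
library(rvest)
library(dplyr)
library(tidyr)

# Invented stand-in for the real page: each h2 region is followed by its hospitals
toy <- read_html('
  <h2>Region A</h2>
  <div class="panel-title">Hospital 1</div>
  <div class="panel-title">Hospital 2</div>
  <h2>Region B</h2>
  <div class="panel-title">Hospital 3</div>
')

# Region headings on their own, then headings and hospitals together in page order
regions <- html_nodes(toy, "h2") %>% html_text(trim = TRUE)
nodes <- html_nodes(toy, "h2, .panel-title") %>% html_text(trim = TRUE)

labelled <- tibble(node = nodes) %>%
  # a row gets a region only if it is itself a heading
  mutate(region = if_else(node %in% regions, node, NA_character_)) %>%
  # carry each heading down to the hospital rows beneath it
  fill(region) %>%
  # drop the heading rows, keeping only hospitals
  filter(region != node)

hospitals_a <- labelled %>% filter(region == "Region A") %>% pull(node)
print(hospitals_a)
```

This relies on the combined selector returning nodes in the order they appear on the page, so each hospital inherits the nearest heading above it.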
isaid
February 3, 2023, 2:50am
3
Can you please tell me how you got the .panel-title selector used in the html_nodes function?
Using the SelectorGadget, I clicked on "Southwest PA." and the first hospital underneath. The tool returned "h2, .panel-title".
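One way to investigate why the ID-based selector came back empty is to compare it with a class-only selector against the HTML that R actually parsed. The sketch below uses invented HTML, and the id `accordion_aaaa1111` is hypothetical. It assumes, without confirmation from the real site, that the accordion id in the downloaded page may not match the one shown in the browser.

```r
library(rvest)

# Invented HTML: the accordion id here deliberately differs from the
# one SelectorGadget reported in the browser (an assumption, not a finding)
toy <- read_html('
  <div id="accordion_aaaa1111">
    <div class="panel-title">Hospital 1</div>
    <div class="panel-title">Hospital 2</div>
  </div>
')

browser_selector <- "#accordion_920f1b98-2bd1-4afa-b9b2-4b1838a4ee91 .panel-title"

# Count matches for the id-based selector and for the class alone
n_id <- length(html_nodes(toy, browser_selector))
n_class <- length(html_nodes(toy, ".panel-title"))

# List the accordion ids that are present in the parsed document
ids <- html_nodes(toy, "[id^='accordion_']") %>% html_attr("id")

print(c(id_matches = n_id, class_matches = n_class))
print(ids)
```

Running the same three checks on `pg` from the original code would show whether the reported id exists in the downloaded page at all. If it does not, a selector that avoids the id, such as "h2, .panel-title", is the safer choice.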