html_nodes fails to get data, returns {xml_nodeset (0)}

Hi, I'm trying to get data from this website:
UPMC Hospitals

I only want the names of the hospitals in the Southwest Pa. region (12 in total).
SelectorGadget identifies this node: #accordion_920f1b98-2bd1-4afa-b9b2-4b1838a4ee91 .panel-title

I'm using this code

library(rvest)

link <- 'https://www.upmc.com/locations/hospitals'
pg <- read_html(url(link))
node <- html_nodes(pg, "#accordion_920f1b98-2bd1-4afa-b9b2-4b1838a4ee91 .panel-title")
html_text(node)

And it returns {xml_nodeset (0)}
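
One way to narrow down an empty result like this is to list the ids that are actually present in the parsed HTML and compare them with the selector. The sketch below uses a small made-up page built with minimal_html(), not the real UPMC page, and both id suffixes in it are hypothetical. It rests on an assumption that is not confirmed anywhere in this thread: that the long suffix in the accordion id is generated and may differ between the browser session and the HTML that read_html() downloads.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page; the id suffix is made up
pg <- minimal_html('
  <div id="accordion_aaaa-1111">
    <h3 class="panel-title">Hospital 1</h3>
  </div>
')

# Selector copied from a browser session, with a different (made-up) suffix
stale <- html_nodes(pg, "#accordion_bbbb-2222 .panel-title")
length(stale)

# List the accordion ids that are actually present in the parsed HTML
ids <- html_nodes(pg, "[id^='accordion_']") %>% html_attr("id")
print(ids)

# Matching on the id prefix avoids depending on the exact suffix
titles <- html_nodes(pg, "[id^='accordion_'] .panel-title") %>% html_text()
print(titles)
```

If the id from SelectorGadget does not appear in the printed list for the real page, the selector cannot match, whatever the underlying cause turns out to be.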

I'm not sure why the SelectorGadget value is not working. Below is another (longer) route to get to the desired data.

library(rvest)
library(tidyverse)

my_region = "Southwest Pa."
link <- 'https://www.upmc.com/locations/hospitals'
pg <- read_html(url(link))

# scrape regions first
regions = html_nodes(pg, 'h2') %>% html_text()

# scrape regions and hospitals
node <- html_nodes(pg, "h2, .panel-title") %>% html_text()

# create data frame to filter to desired list
data.frame(node = node) %>%
  mutate(region = ifelse(node %in% regions, node, NA)) %>%
  fill(region) %>%
  filter(region == my_region & region != node) %>%
  pull(node)
#>  [1] "UPMC Children's Hospital of Pittsburgh: Pittsburgh, Pa. (Lawrenceville)"
#>  [2] "UPMC East: Monroeville, Pa."                                            
#>  [3] "UPMC Magee-Womens Hospital: Pittsburgh, Pa. (Oakland)"                  
#>  [4] "UPMC McKeesport: McKeesport, Pa."                                       
#>  [5] "UPMC Mercy: Pittsburgh, Pa. (Uptown)"                                   
#>  [6] "UPMC Montefiore: Pittsburgh, Pa. (Oakland)"                             
#>  [7] "UPMC Passavant – Cranberry: Cranberry Township, Pa."                    
#>  [8] "UPMC Passavant – McCandless: Pittsburgh, Pa. (McCandless Township)"     
#>  [9] "UPMC Presbyterian: Pittsburgh, Pa. (Oakland)"                           
#> [10] "UPMC Shadyside: Pittsburgh, Pa. (Shadyside)"                            
#> [11] "UPMC St. Margaret: Pittsburgh, Pa. (Aspinwall)"                         
#> [12] "UPMC Western Psychiatric Hospital: Pittsburgh, Pa. (Oakland)"

Created on 2023-02-02 with reprex v2.0.2.9000
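
The fill-down step above can be tried offline on a small made-up page. This is only a sketch: the HTML, region names, and hospital names below are hypothetical, and the real page may be structured differently. It relies on html_nodes() returning the headings and panel titles in document order, so each hospital sits below its region heading before fill() carries the region down.

```r
library(rvest)
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real page: two regions, three hospitals
pg <- minimal_html('
  <h2>Region A</h2>
  <div><h3 class="panel-title">Hospital 1</h3>
  <h3 class="panel-title">Hospital 2</h3></div>
  <h2>Region B</h2>
  <div><h3 class="panel-title">Hospital 3</h3></div>
')

# Region headings on their own, then headings and hospitals together
regions <- html_nodes(pg, "h2") %>% html_text()
node <- html_nodes(pg, "h2, .panel-title") %>% html_text()

hospitals <- data.frame(node = node) %>%
  # Mark heading rows with their own text, leave hospital rows as NA
  mutate(region = ifelse(node %in% regions, node, NA)) %>%
  # Carry each region name down to the hospital rows beneath it
  fill(region) %>%
  # Keep the chosen region and drop the heading row itself
  filter(region == "Region A" & region != node) %>%
  pull(node)

print(hospitals)
```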

Can you please tell me how you got the .panel-title selector used in the html_nodes function?

Using the SelectorGadget, I clicked on "Southwest PA." and the first hospital underneath. The tool returned "h2, .panel-title".

Aah I see, thank you!