Can anyone help me figure out how to use rvest to scrape the details of the events listed in this web page and return them in a data frame with one row per event?

After using SelectorGadget to find what I thought was the right CSS selector, I tried the following to drill down to the individual events, but it returns an empty node set.

library(tidyverse)
library(rvest)

marches <- read_html("https://map.womensmarch.com/?source=website")

events <- marches %>% html_nodes("event-list-item")

I also asked this question on Stack Overflow (see here). The solution I got there takes advantage of the fact that the event feed on that page is loaded via JavaScript from a JSON file, so the events are not in the static HTML that read_html() retrieves. A link to the JSON file can be found by inspecting the page source. So:

library(jsonlite)

feed <- fromJSON("https://zen-hypatia-739ed6.netlify.app/feed")
dat <- feed$events

str(dat)

'data.frame':   313 obs. of  22 variables:
 $ id                      : int  78 404 260 224 286 108 187 265 326 334 ...
 $ public_description      : chr  "Meet up with signs for:\r\nVote  Biden, protect rights of people with disabilities,  protect Roe VS Wade, prote"| __truncated__ "The womxn of the Oceti Sakowin, the Seven Sacred Council Fires of the Great Sioux Nation are marching to the po"| __truncated__ "As part of Worcester County's regular Blue Honk and Wave sign holding event (every Friday until the election), "| __truncated__ "Standout for Social Justice \r\nWear Mask \r\nMaintain physical distance of at least 6 feet\r\nBring your signs"| __truncated__ ...
 $ campaign                : chr  "oct-17-march" "oct-17-march" "oct-17-march" "oct-17-march" ...
 $ lat                     : num  42.4 44.1 38.3 42.3 40.8 ...
 $ lng                     : num  -71.1 -103.2 -75.1 -71.4 -111.9 ...
 $ title                   : chr  "Get Up, Stand Up - Stand Up for Your Rights!" "Oceti Sakwin Womxn’s March 2020" "Honor RBG and Stand for Democracy" "Social Justice" ...
 $ event_doors_open_at     : logi  NA NA NA NA NA NA ...
 $ venue                   : chr  "Public island at a major 4 way stop. Intersection of North Harvard St and Western Ave Boston MA 02134" "Zoom webinar. https://aclu.zoom.us/j/5351676736 Rapid City SD 57701" "West Ocean City Park and Ride. 12940 Inlet Isle Lane Ocean City MD 21842" "Rt126 x Rt135. Rt126 x Rt135 Framingham MA 01702" ...
 $ hasCapacity             : int  1 1 1 1 1 1 1 1 1 1 ...
 $ city                    : chr  "Boston" "Rapid City" "Ocean City" "Framingham" ...
 $ state                   : chr  "MA" "SD" "MD" "MA" ...
 $ zip                     : chr  "02134" "57701" "21842" "01702" ...
 $ start_datetime          : chr  "2020-10-16 11:00:00.000000" "2020-10-16 10:00:00.000000" "2020-10-16 15:00:00.000000" "2020-10-16 17:00:00.000000" ...
 $ starts_at_utc           : chr  "2020-10-16 15:00:00.000000" "2020-10-16 16:00:00.000000" "2020-10-16 19:00:00.000000" "2020-10-16 21:00:00.000000" ...
 $ end_datetime            : logi  NA NA NA NA NA NA ...
 $ categories              : chr  "oct-17-march" "oct-17-march" "oct-17-march" "oct-17-march" ...
 $ event_is_virtual        : int  0 1 0 0 0 0 0 0 0 0 ...
 $ is_official             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ is_team                 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ url                     : chr  "https://act.womensmarch.org/event/oct-17-march/78/" "https://act.womensmarch.org/event/oct-17-march/404/" "https://act.womensmarch.org/event/oct-17-march/260/" "https://act.womensmarch.org/event/oct-17-march/224/" ...
 $ start_datetime_formatted: chr  "Friday Oct 16 11:00 AM" "Friday Oct 16 10:00 AM" "Friday Oct 16 3:00 PM" "Friday Oct 16 5:00 PM" ...
 $ end_datetime_formatted  : logi  NA NA NA NA NA NA ...
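
From there, getting to one tidy row per event is mostly a matter of picking columns and fixing types. Here is a minimal sketch in base R. To keep it runnable without hitting the feed, `dat` below is a hypothetical two-row stand-in built from values in the str() output above, and the choice of columns and the `is_virtual` name are my own assumptions rather than anything the feed prescribes.

```r
# Hypothetical stand-in for feed$events, using two rows copied from the
# str() output above so the example runs without a network request.
dat <- data.frame(
  id = c(78L, 404L),
  title = c("Get Up, Stand Up - Stand Up for Your Rights!",
            "Oceti Sakwin Womxn’s March 2020"),
  city = c("Boston", "Rapid City"),
  state = c("MA", "SD"),
  starts_at_utc = c("2020-10-16 15:00:00.000000",
                    "2020-10-16 16:00:00.000000"),
  event_is_virtual = c(0L, 1L),
  url = c("https://act.womensmarch.org/event/oct-17-march/78/",
          "https://act.womensmarch.org/event/oct-17-march/404/"),
  stringsAsFactors = FALSE
)

# Keep a handful of columns, parse the UTC timestamp into POSIXct
# (%OS accepts the fractional seconds), and turn the 0/1 flag into a logical.
events <- data.frame(
  id = dat$id,
  title = dat$title,
  city = dat$city,
  state = dat$state,
  starts_at_utc = as.POSIXct(dat$starts_at_utc,
                             format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
  is_virtual = dat$event_is_virtual == 1L,
  url = dat$url,
  stringsAsFactors = FALSE
)

str(events)
```

With the real feed, the same second step should work on `feed$events` directly, assuming the column names still match the ones shown above.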
