Scrape info embedded in table buttons

ericpgreen · July 24, 2021, 3:48pm

I'm trying to use {rvest} to scrape a table that has a button embedded in each cell of the last column. The button uses JavaScript to open more information about each record (row) in the main table.

It's pretty straightforward to scrape the main table with

library(rvest)

page <- read_html(url)
page %>%
  html_table(fill = TRUE)

But this does not get the data for each record that becomes visible on click. For instance, the data for record 1 is defined in the HTML as attributes on the button:

data-id="1" data-var3 = "Yes" data-var4 = "No"

What's a good approach here? I don't have the HTML/CSS skills to create a full toy example, so I know I might be limited in the help I can get. I'm hoping someone has had a similar experience and could give me some ideas. I'd also be happy to get suggestions for similar webpages that we could use as a toy example.

Here's a toy example of the HTML only (not the JavaScript button mapping). ID in the main table maps to data-id on the button that opens the pop-up.

<div>  
	<table id="main">
		<thead>
			<tr>
				<th>ID</th>
				<th>Var1</th>
				<th>Var2</th>
                <th>Details</th>
            </tr>
        </thead>
    	<tbody>
    		<tr>
    			<td>1</td>
    			<td>Something</td>
    			<td>Else</td>
    			<td>
    				<a href="#">
						<button type="button" data-toggle="embed" 
							data-target="#the_details"
					    	data-id="1" data-var3 = "Yes" data-var4 = "No" >
                        </button>
                    </a> 
                            
                </td>
            </tr>
            <tr>
            	<td>2</td>
    			<td>Another</td>
    			<td>One</td>
    			<td>
    				<a href="#">
						<button type="button" data-toggle="embed" data-target="#the_details"
					    	data-id="2" data-var3 = "No" data-var4 = "Yes" >
                        </button>
                    </a>
                </td>
            </tr>
        </tbody>
    </table>
</div>

ericpgreen · July 26, 2021, 3:37pm

What ultimately worked for me was to open developer tools in Chrome, find the last column of the table in the Elements view, and copy the XPath.

library(rvest)

# XPath to the fourth (Details) cell of every row in the main table
xpath <- '//*[@id="main"]/tbody/tr/td[4]'

scrape <- function(url) {
  read_html(url) %>%
    html_elements(xpath = xpath) %>%
    html_elements("a")
  # more processing here as needed
}
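
For anyone landing here later, the "more processing" step can be sketched against the toy HTML from the first post. This is only a minimal sketch, assuming the data-* attributes are already present in the static HTML (as in the toy example) rather than injected by JavaScript after the page loads. The object names `details` and `combined` are illustrative, and the attribute names are the ones from the toy example, not from a real site.

```r
library(rvest)

# Toy HTML adapted from the question, trimmed to what the example needs
html <- '
<table id="main">
  <thead><tr><th>ID</th><th>Var1</th><th>Var2</th><th>Details</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Something</td><td>Else</td>
      <td><a href="#"><button type="button" data-id="1" data-var3="Yes" data-var4="No"></button></a></td></tr>
    <tr><td>2</td><td>Another</td><td>One</td>
      <td><a href="#"><button type="button" data-id="2" data-var3="No" data-var4="Yes"></button></a></td></tr>
  </tbody>
</table>'

page <- read_html(html)

# Visible cells of the main table
main <- page %>%
  html_element("#main") %>%
  html_table()

# One <button> per row; the hidden values live in its data-* attributes
buttons <- page %>%
  html_elements(xpath = '//*[@id="main"]/tbody/tr/td[4]') %>%
  html_elements("button")

details <- data.frame(
  ID   = as.integer(html_attr(buttons, "data-id")),
  Var3 = html_attr(buttons, "data-var3"),
  Var4 = html_attr(buttons, "data-var4"),
  stringsAsFactors = FALSE
)

# ID in the main table maps to data-id on the button
combined <- merge(main, details, by = "ID")
```

If the attribute names vary from page to page, `html_attrs(buttons)` returns every attribute of each button as a named character vector, which may be easier to work with than listing each one by hand.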
