Scraping SSRS Hosted Site with Query

I'm trying to query an SSRS-hosted website that publishes publicly available oil and gas volumes. At a minimum, the site requires a date filter to be set before a request is submitted, so I'm using the rvest package to fill in and submit the query form. Whether I use httr or rvest, the response doesn't include the table I'm looking for. I don't know enough about how AJAX or asynchronous posts work to theorize about why the table is missing, so any guidance would be most appreciated.

## Pennsylvania State Well Data
library(rvest)
#> Loading required package: xml2
library(httr)

# establish session 's'
s <- html_session(
    "http://cedatareporting.pa.gov/Reportserver/Pages/ReportViewer.aspx?/Public/DEP/OG/SSRS/Oil_Gas_Well_Production",
    user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36")
)

# get unfilled html form
query_form <- html_form(s)[[1]]

# apply date filter to August 2020
filled_form <- set_values(
    query_form, 
    `ReportViewerControl$ctl04$ctl03$txtValue` = "Aug 2020 (PRODUCTION: Unconventional wells)"
)

# submit the form (the report viewer normally posts back asynchronously)
resp <- submit_form(
    session = s,
    form = filled_form
)
#> Submitting with 'ReportViewerControl$ctl04$ctl00'

# parse the response from the submitted form
parsed_resp <- content(resp$response, "parsed")

# the target table is identifiable by its 31 columns; it is not in the response
html_nodes(parsed_resp, "table")
#> {xml_nodeset (9)}
#> [1] <table cellspacing="0" cellpadding="0" width="100%" height="100%"><tr hei ...
#> [2] <table height="100%" width="100%"><tr>\n<td><div class="spinnie">    <div ...
#> [3] <table cellpadding="0" cellspacing="0" id="ReportViewerControl_fixedTable ...
#> [4] <table cellpadding="0" cellspacing="0" width="100%" id="ParameterTable_Re ...
#> [5] <table id="ParametersGridReportViewerControl_ctl04">\n<tr isparameterrow= ...
#> [6] <table><tr>\n<td><input name="ReportViewerControl$ctl04$ctl03$txtValue" t ...
#> [7] <table><tr>\n<td><input name="ReportViewerControl$ctl04$ctl09$txtValue" t ...
#> [8] <table><tr>\n<td><input name="ReportViewerControl$ctl04$ctl11$txtValue" t ...
#> [9] <table cellpadding="0" cellspacing="0" style="background-color:window;">\ ...

# Bonus credit: once the query is submitted, an array of download options should
# become available. Downloading the CSV would avoid the headache of paginating.
html_nodes(parsed_resp, "div.DisabledButton")
#> {xml_nodeset (0)}

Created on 2020-10-21 by the reprex package (v0.3.0)
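
One avenue that might sidestep the form postback entirely is SSRS "URL access", where render directives such as rs:Format=CSV are appended to the report URL. The following is only a sketch under assumptions, not a confirmed solution for this site: build_ssrs_export_url() is a helper made up for illustration, HYPOTHETICAL_PERIOD_PARAM stands in for the report's real parameter name (which is not known from the form field name above), and the server may expect the parameter's underlying value rather than the display label used here. The code only builds the URL string and does not contact the server.

```r
# Sketch only: assemble an SSRS "URL access" export link (no request is made).
# build_ssrs_export_url() is a hypothetical helper, not part of rvest or httr.
build_ssrs_export_url <- function(base_url, report_path, params = list(), format = "CSV") {
  # rs:-prefixed arguments are directives to the report server itself
  query <- c(
    report_path,
    "rs:Command=Render",
    paste0("rs:Format=", format)
  )
  if (length(params) > 0) {
    # percent-encode each report parameter value
    encoded <- vapply(
      params,
      function(v) utils::URLencode(as.character(v), reserved = TRUE),
      character(1)
    )
    query <- c(query, paste0(names(params), "=", encoded))
  }
  paste0(base_url, "?", paste(query, collapse = "&"))
}

csv_url <- build_ssrs_export_url(
  base_url    = "http://cedatareporting.pa.gov/Reportserver/Pages/ReportViewer.aspx",
  report_path = "/Public/DEP/OG/SSRS/Oil_Gas_Well_Production",
  # ASSUMPTION: placeholder parameter name; the value may need to be an internal ID
  params      = list(HYPOTHETICAL_PERIOD_PARAM = "Aug 2020 (PRODUCTION: Unconventional wells)")
)
csv_url
```

If the server allows URL access for this report, the resulting link could be passed to httr::GET() and the body read as CSV. Whether it does, and what the real parameter names and values are, would have to be checked against the report itself.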
