trigger click on download button with httr?

zoowalk · August 19, 2019, 8:19pm

Hello,

I am trying to set up a script which automatically downloads Facebook's daily reports on political advertisements every day. The button to download the report is at the bottom of this page.

I am able to trigger the download with RSelenium:

library(RSelenium)

#open server
rd <- rsDriver(browser = "firefox", port = 4444L)

#open browser
ffd <- rd$client

#navigate to target url
url <- 'https://www.facebook.com/ads/library/report/'
ffd$navigate(url)

css_selector <- "._7vio"
download_btn <- ffd$findElement(using = "css selector", css_selector)
download_btn$clickElement()

I am also able to set up an RStudio Server instance running on AWS. However, I didn't manage to get RSelenium running on that cloud instance.

Hence, I was wondering whether there is another way to trigger the 'click' on the download button, e.g. via httr. I saw a few related posts on Stack Overflow, but they deal with forms whose results are downloaded as .csv files. Here I have to trigger the 'click' itself, and I haven't figured out how to do this. In fact, I am not even sure whether it is possible. I would hence be grateful for any hint on how to proceed.

I had already posted this question on Stack Overflow, but didn't get any reply. Should an answer come up, I will post it here. Many thanks.

ConnorKirk · August 21, 2019, 8:48am

Hi Zoowalk,

I had a quick look at the page you linked. Facebook have made it difficult to download the report programmatically through the page you provided. Instead of a standard HTML button or a link to a resource, the page appears to use JavaScript to initiate the download. You may be able to dig deeper into the JavaScript to see where the file originates from.

However, it appears that Facebook provide an API for programmatically querying the Ad Library. This would be a far more reliable way to access the data you desire.

zoowalk · August 21, 2019, 9:36am

Hi Connor,

many thanks for looking into it. I was afraid that this might be the case. I guess I underestimated the challenge of programmatically triggering the download from a cloud instance.

I am aware of the Facebook ads API and use it. Remarkably, though, the information in the reports and the information provided by the API are not identical, and I wanted to have a closer look at the difference.

Many thanks again!

ConnorKirk · August 21, 2019, 9:46am

Sorry I couldn't be of further help. What was your problem with RSelenium on AWS?

zoowalk · August 21, 2019, 9:51am

I am a complete beginner with AWS and Docker, and it's all a bit overwhelming. I found this blog post which, I think, deals with the issues I am struggling with. I will keep trying and post any solution. Thanks again!

jdb · August 21, 2019, 3:55pm

Using the Dev Console on Mozilla, clicking the link takes you to this link: https://scontent-atl3-1.xx.fbcdn.net/v/t39.22812-6/69568581_430660710992817_4957430065416634368_n.zip/FacebookAdLibraryReport_2019-08-18_US_yesterday.zip?_nc_cat=105&_nc_oc=AQmcsB-FqcewZOzOXAzf2bJPD7Z5ACT6H0fbbof9Pt90vpL75JfswzL7YNNLdH98BsE&_nc_ht=scontent-atl3-1.xx&oh=01d5cf3aa0f37764b30deab53b7aabb2&oe=5DCACF68

Maybe you could construct the link needed for a GET request using httr. You'd need the latest date within this section of the URL: FacebookAdLibraryReport_2019-08-18_US_yesterday.zip, but you could probably get that from the HTML using rvest and just convert it to ISO format.
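A minimal sketch of that idea, assuming the file name always follows the pattern seen in the URL above. The function names `build_report_filename` and `download_report`, and the `country` argument, are made up for illustration. Note that the query-string parameters (`oh`, `oe`, and so on) look like per-request tokens, which is only a guess; if so, the file name alone would not be enough and the full URL would still have to come from the page.

```r
library(httr)

# Build the report file name for a given date, following the pattern in the
# URL above: FacebookAdLibraryReport_<YYYY-MM-DD>_<country>_yesterday.zip
# (assumption: the pattern is stable and the date is in ISO format).
build_report_filename <- function(report_date, country = "US") {
  iso_date <- format(as.Date(report_date), "%Y-%m-%d")
  sprintf("FacebookAdLibraryReport_%s_%s_yesterday.zip", iso_date, country)
}

# Download a report from a full URL that has already been obtained
# (e.g. copied from the browser's network tab) and save it to dest_dir.
download_report <- function(report_url, dest_dir = ".") {
  # Take the file name from the last segment of the URL path,
  # which drops the query string
  file_name <- basename(httr::parse_url(report_url)$path)
  dest_file <- file.path(dest_dir, file_name)

  # write_disk() streams the response body straight to the file
  resp <- httr::GET(report_url, httr::write_disk(dest_file, overwrite = TRUE))
  httr::stop_for_status(resp)  # raise an error on 4xx/5xx responses
  dest_file
}
```

For example, `build_report_filename(Sys.Date() - 1)` gives the expected file name for yesterday's report, and `download_report(full_url)` saves the zip to the working directory, where `full_url` is a complete link like the one above.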
