rvest::html_session -- Load pages that "initiate" with a loading circle

Hello RStudio Community!

I have a simple web scraping script that logs into bill.com and downloads a few reports for me. However, it appears they have updated their login page. It used to load right away, and I was able to extract the form. Now, when I go to the login page, it shows a loading spinner first and THEN the website loads. My apologies for not knowing what this is called; I would love to learn! When I take the session created by html_session() and look at its HTML, all I get is the spinning wheel, which disappears after a few seconds, and then nothing loads.

I believe the page is loading or calling something in the background. Does anyone know a way to get the session past this? I just need to log in, that's all! So any other method for logging in that lets me navigate the site is also fine with me :grin:

RSelenium is not a possibility sadly, my work environment does not allow the dependencies needed to get it up and running.

library(rvest)

url <- "https://app.bill.com/neo/login"

(bill_session <- html_session(url))
#> <session> https://app.bill.com/neo/login
#>   Status: 200
#>   Type:   text/html
#>   Size:   35411

(bill_form <- html_form(bill_session))
#> list()

Created on 2020-05-27 by the reprex package (v0.3.0)
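
For context on what the spinner most likely is: pages like this are usually client-side rendered (often a "single-page application"). The server sends an almost empty HTML shell plus JavaScript, and the browser builds the login form only after running that script. rvest downloads the HTML but does not execute JavaScript, so html_form() finds no <form> element and returns list(). The sketch below reproduces that offline with two hypothetical, simplified pages (the page contents and variable names are made up for illustration, not copied from bill.com):

```r
library(rvest)

# Hypothetical stand-in for a server-rendered page:
# the <form> element is shipped inside the HTML itself.
static_page <- '<html><body>
  <form action="/login" method="post">
    <input type="text" name="username">
    <input type="password" name="password">
  </form>
</body></html>'

# Hypothetical stand-in for a client-side rendered page: only an empty
# container (here holding a spinner) plus a script. The form would appear
# only after a real browser runs the JavaScript.
js_page <- '<html><body>
  <div id="app"><div class="spinner"></div></div>
  <script src="/static/app.js"></script>
</body></html>'

static_forms <- html_form(read_html(static_page))
js_forms <- html_form(read_html(js_page))

length(static_forms)  # 1
length(js_forms)      # 0, the same empty list() as in the reprex above

# The <script> tag is still present, a hint that the content is built client-side.
length(html_nodes(read_html(js_page), "script"))  # 1
```

In other words, no rvest setting will make the form appear; something has to execute the page's JavaScript first, or the data has to come from another channel such as an API.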

Thanks in advance to any resources and/or solutions to move past this hurdle!

Kyle

As always with scraping, I first check whether there is an API. It seems there is one.

Did you try it?

To access data from a program, an API is better suited to machine-to-machine (M2M) exchange than scraping.
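As a rough illustration of the API route, here is a minimal sketch of logging in to a JSON REST API with httr instead of driving the web UI. Everything service-specific is an assumption: the function name api_login, the login_path, the example.com URL, and the credential field names are hypothetical placeholders, and the real endpoint and fields have to be taken from the provider's API documentation.

```r
# Hypothetical sketch: authenticate against a JSON REST API with httr instead
# of scraping the web UI. `login_path` and the credential field names are
# placeholders; the real ones come from the provider's API documentation.
api_login <- function(base_url, credentials, login_path = "/login") {
  resp <- httr::POST(
    url = paste0(base_url, login_path),
    body = credentials,  # named list, sent as application/x-www-form-urlencoded
    encode = "form"
  )
  # Turn HTTP errors (401, 500, ...) into R errors instead of continuing silently.
  httr::stop_for_status(resp)
  # Parse the JSON body into an R list; where the session token sits inside it
  # depends on the API.
  httr::content(resp, as = "parsed", type = "application/json")
}

# Usage (placeholder URL and field names, needs network access):
# session_info <- api_login(
#   "https://api.example.com",
#   list(userName = "me@example.com", password = Sys.getenv("MY_API_PASSWORD"))
# )
```

Keeping the password in an environment variable rather than in the script is a deliberate choice, since scripts like this tend to get shared.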

As for headless browsing, there are now solutions for using the Chrome DevTools Protocol from R, so you can browse headlessly and control your browser from R. You just need a browser that supports the DevTools protocol (a Chromium-based browser).

See these packages:

They are new, but they work and can help you get around this. They are also rather low-level, so you need to dig into the documentation of the DevTools protocol to know what to tell your browser to do.
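The package links did not survive in this copy of the thread. As one example of the approach, the sketch below assumes the chromote package (not necessarily one of the packages linked above), which drives a local Chrome/Chromium over the DevTools protocol. The helper render_page_html and its selector and timeout parameters are hypothetical names chosen for illustration. It navigates to the page, polls until a <form> exists (i.e. the spinner has been replaced), and returns the rendered DOM so rvest can parse it like a static page.

```r
# Hypothetical helper built on the chromote package (an assumption: it needs
# chromote installed plus a local Chrome/Chromium). It renders a JavaScript-heavy
# page headlessly and returns the DOM as it looks after the client-side code ran.
render_page_html <- function(url, selector = "form", timeout = 20) {
  b <- chromote::ChromoteSession$new()
  on.exit(b$close(), add = TRUE)

  b$Page$navigate(url)

  # Poll until an element matching `selector` exists (the spinner has been
  # replaced by real content) or until `timeout` seconds have passed.
  # Note: `selector` must not contain single quotes.
  js_check <- sprintf("document.querySelector('%s') !== null", selector)
  deadline <- Sys.time() + timeout
  repeat {
    found <- tryCatch(
      isTRUE(b$Runtime$evaluate(js_check)$result$value),
      error = function(e) FALSE  # e.g. the page is still navigating
    )
    if (found || Sys.time() > deadline) break
    Sys.sleep(0.5)
  }

  # Serialize the rendered DOM so rvest can parse it like any static page.
  b$Runtime$evaluate("document.documentElement.outerHTML")$result$value
}

# Usage (needs Chrome and network access):
# html <- render_page_html("https://app.bill.com/neo/login")
# rvest::html_form(rvest::read_html(html))
```

This only gets you the rendered HTML. Submitting that form with rvest would not carry over the browser's state, so actually logging in would most likely mean filling in and submitting the fields inside the same headless session (for example with further Runtime$evaluate calls), which this sketch does not attempt.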

Hope it helps

Hey @cderv! Thanks for the response! Yes, I do use their API for a few tasks, such as extracting bills; however, there are quite a few things that bill.com provides through its UI that are not available through its API.

Thank you for bringing those packages to light! I will definitely look into those today!

Looks like this was also posted on Stack Overflow. Please take a look at the guidelines for cross-posting below.

From: FAQ: Is it OK if I cross-post?

Posting the same question to multiple forums at the same time is often considered impolite. We don't completely ban such cross-posting, but we ask you to think hard before you do it and to follow some rules.

:mantelpiece_clock: Cross-post sparingly
Rather than post the same thing here and elsewhere from the get-go, post in one place at a time. Let enough time go by (think days, not hours) before you take your question somewhere else. Sometimes people at another site may suggest you post here if your question doesn't fit within the scope of the other site.

:link: Always link to your other posts, and update everywhere with any solutions
No matter what your reason for cross-posting is, when you post here please be sure to link to your related post on the other site and keep both ends updated with any solution.

:recycle: Don't just dump a link to your post on another help site
If you posted elsewhere but didn't find a solution, please don't just drop a link to that original post here. A bare link post is missing the details that make it useful and discoverable to people with similar problems in the future. It is also unlikely to entice any potential helpers to click through.

My apologies for the cross-post; I wanted to reach out to as many people as possible. I should have read the guidelines on this topic first. I have deleted the SO post so there is no conflict. Thanks @mfherman!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.