How to become more accomplished at rvest?

Does anyone know of any resources to help me become more proficient in using rvest?

I've used rvest on and off ever since it was first released.
But I don't have a clue what I am doing, and whenever I manage to get what I want from a webpage it feels like it's as much by luck as by skill.

I usually just want to grab a table from Wikipedia or all of the links from a page. Simple stuff.
But I don't really know where to start. I fiddle around with the SelectorGadget add-in for Chrome, and I have done it enough times that I eventually get what I want.

And when I do get what I want, I get frustrated with the data structures - what methods are useful here?

Assume I know nothing except tidyverse. Do I need to learn some HTML? Do I need to know what an XPath is? The rvest documentation is useful, but I feel like I'm missing step 0, where I learn what everything is and how best to apply it.

Thanks


@jspncr,
I would say you need to know how to read HTML, CSS, etc. You will have to be comfortable using the Web Developer tools to navigate and understand the site in question, like so:
[screenshot]
Each website is different; once you understand its structure, you can target the tag you need.
https://developers.whatismybrowser.com/useragents/explore/software_name/firefox/
In this case I would pull down the data inside div class="corset", because the table I need is in there.

I tried rvest, but I ended up using Python, which I find way easier and better suited to networking. That is just my opinion and preference; I would have loved to stay in the R ecosystem, but now I use R and Python together most of the time, thanks to reticulate.
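
As a rough illustration of the div class="corset" case, here is a minimal Python sketch that uses only the standard library. The HTML snippet, the class `DivTableParser` and the helper `table_rows_in_div` are all made up for the example; it is not the actual code used on that site, and a real page would first have to be downloaded.

```python
from html.parser import HTMLParser

# Made-up HTML standing in for a downloaded page. Only the class name
# "corset" comes from the post above; the rest is invented.
SAMPLE_HTML = """
<html><body>
  <div class="header"><table><tr><td>ignore me</td></tr></table></div>
  <div class="corset">
    <table>
      <tr><th>Browser</th><th>Version</th></tr>
      <tr><td>Firefox</td><td>67.0</td></tr>
      <tr><td>Firefox</td><td>66.0</td></tr>
    </table>
  </div>
</body></html>
"""


class DivTableParser(HTMLParser):
    """Collect table rows found inside a <div> carrying a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.div_depth = 0        # > 0 while we are inside the target div
        self.in_cell = False      # True while inside a <td> or <th>
        self.current_row = None   # cells of the row being read
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Count nested divs so we know when the target div closes.
            if self.div_depth > 0 or self.target_class in classes:
                self.div_depth += 1
        elif self.div_depth > 0:
            if tag == "tr":
                self.current_row = []
            elif tag in ("td", "th") and self.current_row is not None:
                self.in_cell = True
                self.current_row.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.current_row is not None:
            self.rows.append(self.current_row)
            self.current_row = None
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and self.current_row is not None:
            self.current_row[-1] += data.strip()


def table_rows_in_div(html, target_class):
    """Return the table rows (lists of cell text) inside the target div."""
    parser = DivTableParser(target_class)
    parser.feed(html)
    return parser.rows


rows = table_rows_in_div(SAMPLE_HTML, "corset")
print(rows)
```

The table in the "header" div is skipped because the parser only starts collecting rows once it has entered a div whose class list contains the target name. That is the same idea as targeting a tag with a CSS selector after inspecting the page structure.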


Thanks @Kill3rbee. Can you recommend any resources that I can use to help me understand some web structures?

That is a tough one. What is your objective in learning HTML? Web scraping or building Shiny apps?
There are so many books and internet resources. I would just say start doing it. Get familiar with the web developer tools. As you do it more, you will learn that some websites will block your IP. I had to learn throttling, that is, making my scrapers look human-like to avoid being flagged as a robot.
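
To give an idea of what throttling can look like, here is a minimal Python sketch using only the standard library. The names `fetch_page` and `fetch_all`, the `USER_AGENT` string and the default delays are assumptions made up for the example, not settings known to work for any particular site.

```python
import random
import time
import urllib.request

# Placeholder User-Agent string; an assumption for this example.
USER_AGENT = "my-scraper/0.1 (contact: me@example.com)"


def fetch_page(url):
    """Download one page as text, sending an explicit User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch=fetch_page, sleep=time.sleep,
              base_seconds=2.0, jitter_seconds=1.0):
    """Fetch each URL in turn, pausing between consecutive requests.

    The pause is a fixed base plus a random extra, so the requests do not
    arrive at perfectly regular intervals.
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(base_seconds + random.uniform(0, jitter_seconds))
        pages.append(fetch(url))
    return pages


# Example call (not run here because it needs network access):
# pages = fetch_all(["https://example.com/a", "https://example.com/b"])
```

Passing `fetch` and `sleep` in as arguments is just a convenience so the loop can be tried without hitting a real site. Checking a site's terms and robots.txt before scraping is still up to you.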
Pick your site of interest, open the web developer tools and navigate the site. Once you know what to look for, write the code and try to pull the data down.
I know JavaScript, but trying to integrate it into Shiny can be a challenge. I have been learning jQuery. I honestly do not know what to recommend. -> https://www.youtube.com/watch?v=pbvK2t6fFGA
