Scraping the results page of a site's search engine

Hi everyone! I'm struggling to scrape some data from this web page. It's a results page from the site's search section, where I've set some parameters.

To clarify, here is the data (highlighted) that I want to extract from the 305 results I found.
To find the right CSS selector, I used SelectorGadget.

So I started with these lines of code, and I keep receiving 0 (zero) nodes in response.

library(httr)
library(xml2)
#> Warning: package 'xml2' was built under R version 3.5.2
library(rvest)
library(reprex)
# Data scraping
site <- "https://jurisprudencia.trf4.jus.br/pesquisa/resultado_pesquisa.php"
ht <- read_html(site)
html_nodes(x = ht, css = ".tr_resultado_par:nth-child(5) .td_resultado , .td_resultado tr:nth-child(3) td , .td_resultado tr:nth-child(2) td , .tr_resultado_par:nth-child(6) .td_resultado , .tr_resultado_par:nth-child(4) .td_resultado , .td_resultado tr:nth-child(1) td")
#> {xml_nodeset (0)}

Created on 2019-01-16 by the reprex package (v0.2.0).

My question is: am I using the right URL to get the data? I mean, do I need to send my parameters to the search engine URL, instead of just using the URL of the results page? If so, how do I do this? If not, what am I doing wrong?

Thanks for your patience. If I misspelled something or mangled the English grammar, please forgive me.

Hi,
I propose another solution for getting the data you are looking for.
You can use the code below.


# Load the rvest package
library('rvest')

# Specify the URL of the website to be scraped
url <- "put your URL here"


# Read the HTML code from the website
webpage <- read_html(url)

# Use a CSS selector to pick out the data you want
wanted_data <- html_nodes(webpage, 'put the CSS selector for the data you want here')

# Convert the selected nodes to text
rank_data <- html_text(wanted_data)

Here are some examples of what you can use as the URL and as the CSS selector:
URL: 'https://www.imdb.com/search/title?title_type=feature&release_date=2016-01-01,2016-12-31&view=simple&count=100&start=100'
CSS selector: '.lister-item-header a'

Thank you

I think you need to get each element one by one and not with all selectors at the same time.
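
A minimal sketch of that check, using a made-up HTML fragment instead of the real results page (the class name `tr_resultado_impar` and the table contents are assumptions for illustration only). It counts how many nodes each selector matches on its own, so a selector that matches nothing is easy to spot:

```r
library(rvest)

# Hypothetical miniature of a results table; the real page's markup may differ.
html <- '<table>
  <tr class="tr_resultado_par"><td class="td_resultado">A</td></tr>
  <tr class="tr_resultado_impar"><td class="td_resultado">B</td></tr>
</table>'
page <- read_html(html)

# Selectors to test individually instead of as one comma-separated string
selectors <- c(
  ".tr_resultado_par .td_resultado",
  ".td_resultado",
  ".does_not_exist"
)

# Number of nodes matched by each selector on its own
counts <- vapply(
  selectors,
  function(sel) length(html_nodes(page, sel)),
  integer(1)
)
print(counts)
```

If every selector returns zero on the real page, the problem is probably the downloaded HTML and not the selectors.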

Hi!
Thanks for your time. I think I did that. I tried many CSS selectors, for example

".tr_resultado_par:nth-child(3) tr:nth-child(1) td"

which returns the same result I posted before.

It has to be something else. Is there another way to scrape a page besides using these packages?

Hi! Thanks for replying.

I didn't get your idea. Isn't it the same as what I was doing so far? Except for this

rank_data <- html_text(wanted_data)

which returns character(0) in the console.

I think the problem might be the URL, as you pointed out. Is there another way to do the scraping?

It's PHP, and you're interacting with a database. You need to send a POST request with the form data or you'll end up with an empty page. I've written a more thorough answer in the DataCamp Slack channel.
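
As a rough sketch of what sending the form data could look like with httr: the helper name `fetch_results` and the field name in the commented example are hypothetical, since the real input names of the search form are not given in this thread. They can usually be read from the form's HTML or from the request shown in the browser's developer tools when you submit the search.

```r
library(httr)
library(rvest)

# Hypothetical helper: submit the search form via POST and return the parsed page.
# `form_fields` is a named list holding the form's real input names and values,
# which are NOT known from this thread.
fetch_results <- function(url, form_fields) {
  # encode = "form" sends the fields as application/x-www-form-urlencoded
  resp <- POST(url, body = form_fields, encode = "form")
  # Stop with an error on HTTP 4xx/5xx instead of parsing an error page
  stop_for_status(resp)
  # The encoding may need to be set explicitly if the site does not declare it
  read_html(content(resp, as = "text"))
}

# Example call (the field name below is a placeholder, not the site's real one):
# page <- fetch_results(
#   "https://jurisprudencia.trf4.jus.br/pesquisa/resultado_pesquisa.php",
#   list(campo_hipotetico = "valor")
# )
# html_nodes(page, ".td_resultado")
```

Once the POST returns a page that actually contains the results, the selectors from SelectorGadget can be applied to it as before.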
