Web scraping data from a webpage and saving it in a data frame

```
#install.packages("reprex")
library(reprex)
#install.packages("rvest")
library(rvest)
#install.packages("dplyr")
library(dplyr)
#install.packages("qdapRegex")
library(qdapRegex)

# Read the page and take the text of every ".block" node
page <- read_html("https://bidplus.gem.gov.in/bidresultlists")
x <- page %>%
  html_nodes(".block") %>%
  html_text()
x
class(x)

# Strip the layout whitespace: remove pairs of spaces, then newlines
x6 <- gsub("\n", "", gsub("  ", "", x, fixed = TRUE), fixed = TRUE)
x6

# Return the first piece of text found between two markers, or NA if none
extract_between <- function(text, left, right) {
  unlist(rm_between(text, left, right, extract = TRUE))[1]
}

# Build one row per block, then bind the rows into a single data frame
rows <- vector("list", length(x6))
for (i in seq_along(x6)) {
  rows[[i]] <- data.frame(
    BIDNO = extract_between(x6[i], " BID NO: ", "Status"),
    Status = extract_between(x6[i], " Status: ", "Quantity Required"),
    Quantity_Required = extract_between(x6[i], " Quantity Required: ", "Department Name And Address"),
    Department_Name_And_Address = extract_between(x6[i], "Department Name And Address: ", "Start Date"),
    Start_Date = extract_between(x6[i], "Start Date: ", "End Date"),
    # End_Date = extract_between(x6[i], "End Date: ", "Technical Evaluation"),
    stringsAsFactors = FALSE
  )
}
df1 <- do.call(rbind, rows)

df1
View(df1)
```

Hi @niti_28! Welcome to RStudio Community!

It looks like you posted your code but didn't post a question to go along with it. You are much more likely to get the help you are looking for if you are specific about what your question is and what issues/errors you are running into.

I also noticed that you loaded the reprex package but didn't actually use it. You can check out this FAQ on how to run your code in a reprex. One thing I will suggest is that you don't call install.packages() in your scripts, as that reinstalls the package every time the script runs, which is not necessary.
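
One way to follow that advice, as a minimal sketch: install a package only when it is missing, so rerunning the script does not reinstall anything. `ensure_package` is a hypothetical helper name used here for illustration; it is not part of any package.

```r
# Install a package only if it is not already available.
# `ensure_package` is a hypothetical helper, not an existing function.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call installs nothing
ensure_package("stats")
```

After a call like this, `library()` can be used as usual for the packages the script needs.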


It also looks like your code was not formatted in a way that makes it easy to read for people trying to help you. Formatting code lets people identify more easily where issues may be occurring, and makes it easier to read in general. I have edited your post to format the code properly.

In the future, please put inline code (such as a function name, like mutate or filter) inside backticks (`mutate`), and put chunks of code (including error messages and code copied from the console) between sets of three backticks:

```
example <- foo %>%
  filter(a == 1)
```

This process can be done automatically by highlighting your code, either inline or in a chunk, and clicking the </> button on the toolbar of the reply window!

This will help keep our community tidy and help you get the help you are looking for!

For more information, please take a look at the community's FAQ on formatting code.

I want to extract the data from all the pages of the website I mentioned in the code above, save it on my local system, and keep my local file up to date as the data on the website changes.
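
The "keep the local file updated" part could be sketched roughly as below. This is only an illustration under stated assumptions: each scrape produces a data frame of character columns that includes a `BIDNO` column (as in the code above), and `BIDNO` identifies a bid uniquely. `update_local_file` and the file path are hypothetical names. How the site paginates is not shown in this thread, so fetching the remaining pages is left out.

```r
# Merge freshly scraped rows into a local CSV file, keyed on BIDNO.
# `update_local_file` is a hypothetical helper; BIDNO is assumed to be unique per bid.
update_local_file <- function(new_rows, path) {
  if (file.exists(path)) {
    # Read everything as text so values are not converted to other types
    old_rows <- read.csv(path, stringsAsFactors = FALSE, colClasses = "character")
    combined <- rbind(new_rows, old_rows[names(new_rows)])
  } else {
    combined <- new_rows
  }
  # New rows come first, so duplicated() keeps the latest version of each bid
  combined <- combined[!duplicated(combined$BIDNO), , drop = FALSE]
  write.csv(combined, path, row.names = FALSE)
  invisible(combined)
}

# Made-up example data, for illustration only
path <- tempfile(fileext = ".csv")
first <- data.frame(BIDNO = c("A1", "A2"), Status = c("Open", "Open"),
                    stringsAsFactors = FALSE)
second <- data.frame(BIDNO = c("A2", "A3"), Status = c("Closed", "Open"),
                     stringsAsFactors = FALSE)
update_local_file(first, path)
result <- update_local_file(second, path)
```

Running the scraping script on a schedule (for example with cron or Windows Task Scheduler) and calling a function like this at the end would refresh the file each time.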