Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows

I am running a script in RStudio (a wonderful RStudio community member helped me with it) to scrape Goodreads reviews. Recently, I started getting an error message for some of the pages I'm trying to scrape. I don't wish to keep disturbing the person who helped me, so I've been trying (and failing) to solve it myself for the past few days. I worked on it again earlier but still can't get it right, so I thought I might ask here.

This is the error I get:

Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 31, 30

The numbers at the end may change, e.g. 30, 33. What seems strange to me is that the error is not consistent: it only occurs for some of the pages I'm trying to scrape, although the script itself remains the same. Example: scraping the reviews of The Handmaid's Tale (https://www.goodreads.com/book/show/38447.The_Handmaid_s_Tale?ac=1&from_search=true&qid=ZGrzc7AfLN&rank=1) causes an error (32, 30), but scraping the reviews of Typhoon Kingdom (https://www.goodreads.com/book/show/52391186-typhoon-kingdom) causes no problems.

I've removed some parts of the code to find out where the problem comes from and it seems to me that it must be caused by either this piece of code that extracts the review-IDs:

# Get the review IDs from all the links
  reviewId = reviews.html %>% str_extract("/review/show/\\d+")
  reviewId = reviewId[!is.na(reviewId)] %>% str_extract("\\d+")

or by the line finalData = rbind(finalData, cbind(reviewId, rbind(fullReviews, partialReviews))). When I remove the first piece of code and change finalData = rbind(finalData, cbind(reviewId, rbind(fullReviews, partialReviews))) back to finalData = rbind(finalData, fullReviews, partialReviews) (the review IDs weren't extracted at all originally), the script runs without any errors. However, I really need to be able to extract these review IDs to properly anonymise my data, so simply leaving them out is not really an option.
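For reference, the error can be reproduced without any scraping. This is a minimal sketch, assuming that `fullReviews` and `partialReviews` are data frames (the post does not show them, and the toy data below is invented). When one of its arguments is a data frame, `cbind()` goes through `data.frame(..., check.names = FALSE)`, which refuses to combine 31 IDs with 30 review rows.

```r
# Toy stand-ins for the scraped objects (assumed shapes, not real Goodreads data)
fullReviews    <- data.frame(review = paste("full review", 1:20))
partialReviews <- data.frame(review = paste("partial review", 1:10))
reviewId       <- as.character(1001:1031)   # 31 IDs for 30 reviews

reviews <- rbind(fullReviews, partialReviews)

# cbind() with a data frame dispatches to data.frame(..., check.names = FALSE)
msg <- tryCatch(
  {
    cbind(reviewId, reviews)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A cheap check to run before binding, to see which page is off and by how much
cat("IDs:", length(reviewId), " reviews:", nrow(reviews), "\n")
```

Printing the two counts for each page before the `cbind()` call would show whether the ID vector or the review table is the one that varies.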

I've tried replacing that part of the code with the following, which should also be able to extract the review ID (but please correct me if I'm wrong):

# Get the review IDs from all the links
  reviewId = reviews.html %>% str_extract("review_\\d+")
  reviewId = reviewId[!is.na(reviewId)] %>% str_extract("\\d+")

This did not solve the problem and caused the same error, though with two differences: 1. the error has completely different numbers: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 30, and 2. the error now occurs for every single URL instead of only for some, so I actually managed to make it worse.
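One possible reading of the `0, 30` result, sketched with invented link strings: `str_extract()` returns `NA` for every element in which the pattern does not occur, so if none of the strings contains `review_` followed by digits, the `!is.na()` filter leaves an empty vector. The `hrefs` values below are assumptions, not actual Goodreads markup.

```r
library(stringr)

# Hypothetical hrefs; the real contents of reviews.html are not shown in the post
hrefs <- c("/review/show/111", "/user/show/42", "/review/show/222")

# Original pattern: matches two of the three links
ids_show <- str_extract(hrefs, "/review/show/\\d+")
ids_show <- str_extract(ids_show[!is.na(ids_show)], "\\d+")

# Replacement pattern: matches none of these links
ids_under <- str_extract(hrefs, "review_\\d+")
ids_under <- str_extract(ids_under[!is.na(ids_under)], "\\d+")

length(ids_show)   # one ID per matching link
length(ids_under)  # nothing matched, so the vector is empty
```

If that is what happens, running `head(reviews.html)` would show which of the two patterns the strings actually contain.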

I've googled the error message, and apparently the problem arises when the pieces being combined into a data frame don't all have the same number of rows. Some say using rbind.fill and cbind.fill could work as a solution, but apparently you can't install rowr in R 4.0.1, and using rbind.fill alone didn't solve the problem.
There are a lot of online questions about this error message and just as many different solutions, but so far I haven't found one that works for this script.
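For what it is worth, padding does not need an extra package. Below is a minimal base R sketch in which `pad_to()` is a hypothetical helper and the data is made up: assigning a larger `length()` fills a vector with `NA`. Note that padding or truncating only silences the error; it does not guarantee that each ID ends up next to the review it belongs to.

```r
# Hypothetical helper: pad (or truncate) a vector to exactly n elements
pad_to <- function(x, n) {
  length(x) <- n   # base R pads with NA when n > length(x)
  x
}

reviews  <- data.frame(review = paste("review", 1:5))
reviewId <- c("11", "22", "33")   # fewer IDs than reviews

combined <- cbind(reviewId = pad_to(reviewId, nrow(reviews)), reviews)
print(combined)
```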

Does anyone know how this problem might be solved? Concrete steps would be much appreciated. Thank you!

Please provide a reprex. See the FAQ: How to do a minimal reproducible example (reprex) for beginners.
You probably just need some simple logic around the conversion-to-data.frame step to account for skipping missing elements, but we should see the data that you are attempting to combine.
Please share one or more examples of reviewId, fullReviews and partialReviews.
It's not clear whether you run this as a single step or as a loop.
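As an illustration of the kind of logic hinted at here, the sketch below extracts the ID from each review's own HTML instead of from all links on the page, so that a review without a match yields `NA` rather than shifting the rows. The `review_html` vector is invented and assumes one string per review; how the actual script stores each review is not shown in the thread.

```r
library(stringr)

# Assumed input: one HTML snippet per review (made-up markup)
review_html <- c(
  '<div><a href="/review/show/111">see review</a> Great book</div>',
  '<div>No link in this one</div>',
  '<div><a href="/review/show/333">see review</a> Not for me</div>'
)

# str_extract() is vectorised: one result per input, NA where nothing matches
reviewId <- str_extract(review_html, "(?<=/review/show/)\\d+")

reviews <- data.frame(html = review_html, stringsAsFactors = FALSE)

# Lengths match by construction, but check anyway before binding
stopifnot(length(reviewId) == nrow(reviews))
finalData <- cbind(reviewId, reviews)
print(finalData$reviewId)
```

Because the IDs and the review rows come from the same per-review input, they stay aligned even when a review has no link.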
