R bigrquery returns the following error: Exceeded rate limits

Hello everyone,

I am trying to download a BigQuery dataset from Google Cloud Platform into my R workspace in order to analyze it, using the following code:

library(bigrquery)
library(DBI)
library(tidyverse)
library(dplyr)


# Connect to the public Citi Bike dataset, billing queries to my own project
con = dbConnect(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "new_york_citibike",
  billing = "maanan-bigquery-in-r"
)

# Authenticate with BigQuery
bigrquery::bq_auth()

# Lazy reference to the remote table (no data downloaded yet)
my_db_pointer = tbl(con, "citibike_trips")

glimpse(my_db_pointer)

count(my_db_pointer)

# Download the full table into R
selected = select(my_db_pointer, everything()) %>% collect()

However, when I try to run the last line in order to download the data, it returns the following error:

Complete
Billed: 0 B
Downloading first chunk of data.
Received 55,308 rows in the first chunk.
Downloading the remaining 58,882,407 rows in 1420 chunks of (up to) 41,481 rows.
Downloading data [=====>--------------------------------------------------------------------------------------------------]   6% ETA: 19m
Error in `signal_reason()`:
! Exceeded rate limits: Your project:453562790213 exceeded quota for tabledata.list bytes per second per project. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors [rateLimitExceeded] 
ℹ Try increasing the `page_size` value of `bq_table_download()`
Run `rlang::last_error()` to see where the error occurred.

I don't use Google Cloud, so I can't give a definitive answer, but here are some possibilities. The first thing is to identify the actual method being called: it's collect.tbl_BigQueryConnection(), whose source code you can find here. To start, note its arguments:

collect.tbl_BigQueryConnection <- function(x, ...,
                                           page_size = NULL,
                                           max_connections = 6L,
                                           n = Inf,
                                           warn_incomplete = TRUE)

So you could try decreasing max_connections and increasing page_size (which is what the error message suggested).
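For example, something like this should work, since those arguments are forwarded through the dots of collect() to the method shown above (the values here are just guesses to experiment with, not tested recommendations):

# A minimal sketch: tune page_size and max_connections when collecting.
# The values below are illustrative guesses, not tested recommendations.
selected <- select(my_db_pointer, everything()) %>%
  collect(
    page_size = 50000,      # bigger pages: fewer tabledata.list requests
    max_connections = 1L    # a single connection: fewer bytes per second
  )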

Then, if you look at the function source code, you see there is a bit of boilerplate, and at the end this call:

out <- bq_table_download(tb,
    n_max = n,
    page_size = page_size,
    quiet = quiet,
    max_connections = max_connections,
    bigint = bigint
  )

So the actual work of downloading the data is done by bq_table_download(), with the arguments passed down to it (which explains why the error message mentions it). Let's look at the manual of that function:

This retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (<100 MB, say). For larger queries, it is better to export the results to a CSV file stored on google cloud and use the bq command line tool to download locally.
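Before going the export route, you could also try calling bq_table_download() directly with those arguments tuned, and perhaps grab a subset first. A minimal sketch, assuming the same table as in your question (the n_max and page_size values are again just guesses):

# Sketch: download directly with bq_table_download(), tuning the same knobs.
tb <- bq_table("bigquery-public-data", "new_york_citibike", "citibike_trips")

trips <- bq_table_download(
  tb,
  n_max = 1e6,            # e.g. start with a subset instead of all ~59M rows
  page_size = 50000,      # larger pages -> fewer list requests
  max_connections = 1L    # fewer parallel requests -> lower bytes per second
)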

So, I don't know Google Cloud well enough to walk you through that export, but considering ~60M rows, if each has 5 numeric columns (8 bytes per value) that's about 6e7 * 5 * 8 ≈ 2.4 GB, so you're well into the second situation.
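For completeness, bigrquery does have a helper for that export route, bq_table_save(), which runs a BigQuery extract job to Cloud Storage. Purely as a sketch, assuming you have a bucket (gs://my-bucket is hypothetical) and the permissions to run an extract job against the public table; you may need to copy the data into a table in your own project first:

# Sketch only: export the table to compressed CSV shards on Cloud Storage,
# then download them with gsutil (or read them back with readr).
tb <- bq_table("bigquery-public-data", "new_york_citibike", "citibike_trips")

bq_table_save(
  tb,
  destination_uris = "gs://my-bucket/citibike_trips_*.csv.gz",  # wildcard: the export is sharded
  destination_format = "CSV",
  compression = "GZIP",
  billing = "maanan-bigquery-in-r"  # assumption: extra arguments are forwarded to the extract job
)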
