Slow table creation in BigQuery with bigrquery

Hello,

I seem to be having trouble uploading a data frame to a new table in BigQuery. It takes an inordinate amount of time for a table of this size (209 MB): the wait is often 10 minutes or longer, then I usually get a timeout error (408), and only occasionally does it actually work. What am I doing wrong? I've tried the DBI interface as well as bq_perform_upload, and both have the same outcome.

A minimal reproducible example:

src<-"path/to/csv"
new_tbl <- read_csv(src)
#making sure it's a dataframe
new_tbl<-as.data.frame(new_tbl)

billing<-"###############"
dataset<-"###############"
project<-"###############"

con <- dbConnect(
bigrquery::bigquery(),
project = project,
dataset = dataset,
billing = billing
)

DBI::dbWriteTable(con, "temp", new_tbl)

I also tried the following:

bq_Temp <- bq_table(project, dataset, "temp")
bq_perform_upload(
  bq_Temp,
  new_tbl,
  fields = NULL,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  priority = "INTERACTIVE",
  billing = billing
)

When it does work, it seems to hang for a long time before progressing with the actual upload. I'm wondering whether there is any configuration I need to do on the BigQuery side, or whether any elements of the data frame need to change before executing the upload.

The docs indicate

The bigrquery package provides three levels of abstraction on top of BigQuery:

  • The low-level API provides thin wrappers over the underlying REST API. All the low-level functions start with bq_, and mostly have the form bq_noun_verb(). This level of abstraction is most appropriate if you're familiar with the REST API and you want to do something not supported in the higher-level APIs.
  • The DBI interface wraps the low-level API and makes working with BigQuery like working with any other database system. This is the most convenient layer if you want to execute SQL queries in BigQuery or upload smaller amounts (i.e. <100 MB) [emphasis added] of data.
  • The dplyr interface lets you treat BigQuery tables as if they are in-memory data frames. This is the most convenient layer if you don’t want to write SQL, but instead want dbplyr to write it for you.

That suggests DBI::dbWriteTable is unsuitable for an object the size of new_tbl.
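
If the low-level API is an option, a load job from Google Cloud Storage tends to be more robust at that size than pushing the data frame through the upload endpoint. A minimal sketch, assuming the CSV has first been staged in a bucket (the gs:// path below is hypothetical, and project, dataset, billing and new_tbl are the objects from your example):

library(bigrquery)

bq_Temp <- bq_table(project, dataset, "temp")

# Issue a load job that reads the CSV directly from Cloud Storage
bq_perform_load(
  bq_Temp,
  source_uris = "gs://your-bucket/path/to/file.csv",  # hypothetical staging location
  source_format = "CSV",
  nskip = 1,                       # skip the header row
  fields = as_bq_fields(new_tbl),  # reuse the schema inferred from the data frame
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  billing = billing
)

If the call returns before the job finishes, the returned job can be polled with bq_job_wait().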

As for the bq_ functions, setting quiet = FALSE may provide some insight into the choke points.
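
For example, a minimal sketch reusing the bq_Temp and new_tbl objects from your second attempt:

bq_perform_upload(
  bq_Temp,
  new_tbl,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  quiet = FALSE  # print progress so the slow step is visible
)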
