I am new to databases and getting up to speed with dbplyr and the idea of conducting data transformation inside a cloud database and just pulling down the results. It makes sense to me in theory, but I'm getting stuck in cases where dbplyr does not (yet) translate some R code to an SQL statement. Here's a simple example:
library(tidyverse)
library(dbplyr)
library(RPostgres)
library(DBI)
# to make a reproducible example database
# does not work for me because postgresHasDefault() == FALSE
# con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")
# copy_to(con, rownames_to_column(mtcars), name = "mtcars")
# works outside of a database
mtcars %>%
tibble::rownames_to_column() %>%
mutate(nWords = lengths(strsplit(rowname, "\\W+"))) %>% # problem for dbplyr
filter(nWords < 2)
As I understand it, dbplyr does not translate strsplit():
Failed to prepare query: ERROR: function strsplit(text, unknown) does not exist
This is a specific example of a general problem for me. I want to do some manipulation on a large table before I collect the results, but sometimes the manipulation code I write in R does not translate to SQL via dbplyr.
I could use some advice about how to approach these problems.
Any function that dplyr doesn’t know how to convert is left as is.
Is the answer to figure out the right SQL equivalent and use translate_sql()?