How to solve the problems of dplyr with connected databases?

I have connected to a database using pool package. I need to transmute a column and I get an error.

library (pool) 
library (dplyr) 
library (RMySQL) 

my_db <- dbPool(
   MySQL(), 
   donate = "aaa", 
   host =" localhost", 
   username = "root"
) 

my_db %>% tbl("bbb") %>%
     mutate(Date = paste(Years, Q)) %>%
     transmute(Date = factor(Date), 
                             Product) 

The error message I get:

#Error in  . local(connect, statement, ...) 
#   could not run statement: FUNCTION aaa. factor does not exist

EDIT ONE
I have found multiple answer that requiere that I create a function to apply the require SQL querry. But I have not knowledge of SQL(that's a problem I have been avoiding). After to much looking and trying I came up with this:

library (pool) 
library (dplyr) 
library (RMySQL) 

my_db <- dbPool(
   MySQL(), 
   donate = "aaa", 
   host =" localhost", 
   username = "root"
) 

my_db %>% tbl("bbb") %>% as.data.frame() %>%
     mutate(Date = paste(Years, Q)) %>%
     transmute(Date = factor(Date), 
                             Product) 

I'm not sure if this is a solution. The goal of using the pool package is to do not need to load the table in my memory and I think that is exactly what as.data.frame() do.

Thanks in advance for helping

See the FAQ: How to do a minimal reproducible example reprex for beginners.

The idea behind an R database connection is to allow bringing variables and records into memory selectively, perform some operation and then doing something with the result. That something may include writing it back to the database but doing so can be unwise.

The database libraries

  1. Open a connection
  2. Allow the database to be queried
  3. Close the connection

The {pool} library deals with 1 & 3. It is primarily intended for use within shiny apps and can be replaced with more direct approaches. For example, see the example in help(dbPool).

Other libraries deal with all three aspects. A script would look like

library(DBI)
library(RMariaDB)

con <- dbConnect(RMariaDB::MariaDB(), 
                 username="the_user", 
                 password="the_password", 
                 dbname ="r")

dbListTables(con)
#> [1] "jgr"
dbListFields(con, "jgr")
#>  [1] "recid" "id"    "v1"    "v2"    "v3"    "v4"    "v5"    "v6"    "v7"   
#> [10] "v8"    "v9"    "v10"   "v11"   "v12"   "v13"   "v14"   "v15"   "v16"  
#> [19] "v17"   "v18"   "v19"   "v20"   "v21"   "v22"   "v23"   "v24"   "v25"  
#> [28] "v26"   "v27"   "v28"   "v29"   "v30"

x <- dbGetQuery(con, "SELECT v4 FROM jgr WHERE id = 'NS_000'")
y <- dbGetQuery(con, "SELECT v6 FROM jgr WHERE id = 'NS_022'")

dbDisconnect(con)


dat <- cbind(x,y)
fit <- lm(v6 ~ v4, data = dat)
summary(fit)
#> 
#> Call:
#> lm(formula = v6 ~ v4, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -618.21 -228.26   -6.68   13.13 1950.54 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -3.18906   13.50270  -0.236    0.813    
#> v4           0.59126    0.01733  34.125   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 391.6 on 999 degrees of freedom
#> Multiple R-squared:  0.5383, Adjusted R-squared:  0.5378 
#> F-statistic:  1165 on 1 and 999 DF,  p-value: < 2.2e-16

This does use two SQL statements, like SELECT v4 FROM jgr WHERE id = 'NS_000'. An alternative is provided by {dbplyr}, which relieves even that slight degree of complexity by allowing queries using {dplyr} syntax.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.