Dbplyr lazy_ops: nrow returns NULL for a simulated source?


#1

I’m trying to understand tbl_lazy and op_* behaviour.

I expected nrow() to return NA for a simulated source, but I get NULL:

library(dbplyr)
library(dplyr)
## example data 
df <- tibble::tibble(apples = 1:3, oranges = c("a", "b", "c"))
## example SQLite db
db <- src_sqlite(tempfile(), create = TRUE)

## a real SQLite tbl
real_tbl <- copy_to(db, df)
## a simulated SQLite tbl
sim_tbl <- tbl_lazy(df, simulate_sqlite())

## NA, as expected
nrow(real_tbl)
## NULL, not expected
nrow(sim_tbl)

The problem shows up when printing:

## this causes printing to fail in trunc_mat
print(sim_tbl)
#Error in if (is.na(rows) || rows > tibble_opt("print_max")) { : 
#    missing value where TRUE/FALSE needed
#  In addition: Warning message:
#    In is.na(rows) : is.na() applied to non-(list or vector) of type 'NULL'

## though other ops do work
str(op_base(real_tbl, "apples"))
str(op_base(sim_tbl, "apples"))

My question is: should nrow(real_tbl) return NA? And how does that work? I’m confused about how the op_* machinery controls this.

Are there other examples that use lazy_ops but that don’t use a real database? I’m trying to wrap the src/tbl/collect idioms to provide lazy exploration as a proof of concept.


#2

I think I found the answer: dbplyr::dim.tbl_sql overrides the nrow() behaviour, and no equivalent method is defined for the simulated tables. It’s trivial to create my own class and define dim() for it.
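
For anyone landing here later, a minimal base-R sketch of that idea. It does not touch dbplyr internals: the class name `my_lazy` and its `vars` field are hypothetical stand-ins for a lazy table, chosen only for illustration. It shows why nrow() is NULL when a classed list has no dim() method (nrow() is just dim(x)[1L]), and how defining one restores the NA that printing expects.

```r
## Hypothetical stand-in for a lazy table: a classed list that only
## knows its column names (`vars` is an assumed field name).
new_my_lazy <- function(vars) {
  structure(list(vars = vars), class = "my_lazy")
}

x <- new_my_lazy(c("apples", "oranges"))

## No dim() method yet: the list has no "dim" attribute, so dim(x) is NULL
## and nrow(x), which is dim(x)[1L], is NULL as well.
before <- nrow(x)
is.null(before)

## S3 dim() method: row count unknown (NA), column count known.
dim.my_lazy <- function(x) {
  c(NA_integer_, length(x$vars))
}

nrow(x)  # NA
ncol(x)  # 2
```

With a method like this in place, a check such as is.na(rows) sees NA rather than NULL. For a real wrapper class, the column count would have to come from whatever the wrapped object exposes; that part is left open here.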

Thanks!