I'm unclear about how R works "under the hood", and that makes me wonder whether I should really be writing some of the code that I do.
For example, take the following. The intention is to use a data frame as the source of data for a processing chain if it exists, and otherwise to load the data from a database. For this, I've written:
library(DBI)       # dbConnect()
library(RSQLite)
library(dplyr)
library(dbplyr)    # tbl() on a connection, explain()
library(magrittr)  # %T>% tee pipe

MessagesSummaryData =
  (function() {
    if (exists("pcapData2")) {
      print("Using dataframe")
      pcapData2 %>%
        select( LocalCt.Bin, ToTpf, RemAdr, serverPort ) %>%
        filter( !(serverPort %in% c(20, 21, 23, 25, 26, 53, 69)) )
    }
    else {
      print("Using database")
      dbcon <- dbConnect( RSQLite::SQLite(), dbName )
      tbl(dbcon, "pcapData2") %>%
        select( LocalCt.Bin, ToTpf, RemAdr, serverPort ) %>%
        filter( !(serverPort %in% c(20, 21, 23, 25, 26, 53, 69)) | is.na(serverPort) ) %T>%
        explain() %>%
        collect() %>%
        # convert/restore variables mangled by the db
        mutate( ToTpf = as.logical(ToTpf) )
    }
  })() %>%
  mutate( ... ) %>%
  ...
But does this build the processing chain efficiently, or am I introducing inefficiencies by doing it this way?
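I can at least inspect the SQL that dbplyr generates before collecting, which is what the %T>% explain() is for; show_query() would show much the same thing, e.g. (same connection, table and column names as above):

tbl(dbcon, "pcapData2") %>%
  select( LocalCt.Bin, ToTpf, RemAdr, serverPort ) %>%
  filter( !(serverPort %in% c(20, 21, 23, 25, 26, 53, 69)) | is.na(serverPort) ) %>%
  show_query()

But that only tells me about the database branch, not whether the overall construction is sensible.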
Also, there seems to be no way of closing the database connection when it's written like this. That gives me some pause, though it doesn't seem to have caused any issues (yet).
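One idea I'm considering, though I don't know whether it's the idiomatic fix: put on.exit(dbDisconnect(dbcon)) right after dbConnect() inside the anonymous function, so the connection is closed when the function returns. Since collect() has already pulled the data into memory by that point, I think it should be safe. Roughly (database branch only; the select()/filter() steps are the same as above):

dbcon <- dbConnect( RSQLite::SQLite(), dbName )
on.exit( dbDisconnect(dbcon), add = TRUE )   # closes the connection when the anonymous function returns
tbl(dbcon, "pcapData2") %>%
  select( LocalCt.Bin, ToTpf, RemAdr, serverPort ) %>%
  filter( !(serverPort %in% c(20, 21, 23, 25, 26, 53, 69)) | is.na(serverPort) ) %>%
  collect() %>%                              # materialise the rows before the connection is closed
  mutate( ToTpf = as.logical(ToTpf) )

Is that a reasonable pattern, or is there a better way to handle the connection here?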