Sqldf questions in R studio

I am facing with two questions of sqldf in R studio:

1- I want to control one variable's value in sqldf's where clause. However, I tried several ways: it doesn't work well. I want that control[i] can change where condition in each run. Can anyone help on this?

control = c(16,18)
for (i in 1:2){
  label <- paste("data",i,sep="_")
  assign(label,fn$sqldf("select * 
                        where time between '20170502' and '20170504' or                  
                        sales between 'control[i]' and '23' ))
}

Secondly, The following code didn't work as well in sqldf.

test1 <- sqldf("select *, (case when time between '20170530 00:00:00' and '20170530 03:00:00' then 1 else 0 end) as newdata  from olddata")

The data type show in R is [1] "POSIXct" "POSIXt" . Attached example data as below:

time sales

2017/5/01 01:00:00 1249
2017/5/03 12:00:00 1683
2017/5/04 13:00:00 1909
2017/5/06 06:00:00 1686
2017/5/08 18:00:00 1500
2017/5/25 04:00:00 11586
2017/5/30 04:00:00 11834
2017/5/28 04:00:00 11909
2017/5/30 03:00:00 11962
2017/5/30 01:00:00 12500

I think this note applies to both of your problems (from sqldf FAQ #3):

The SQL statement passed to sqldf must be a valid SQL statement understood by the database. The functions that are understood include simple SQLite functions and aggregate SQLite functions and functions in the RSQLite.extfuns package.

In the first case, SQLite has no idea what's going on in your for loop, and can't understand an R extraction operator. So you need to interpolate the iteration-specific value you want into the string you pass to sqldf — for instance, using sprintf (though personally, I prefer glue).

In the second case, I think there are two issues:

  1. SQLite only recognizes time strings in the following formats:

    YYYY-MM-DD
    YYYY-MM-DD HH:MM
    YYYY-MM-DD HH:MM:SS
    YYYY-MM-DD HH:MM:SS.SSS
    YYYY-MM-DDTHH:MM
    YYYY-MM-DDTHH:MM:SS
    YYYY-MM-DDTHH:MM:SS.SSS
    HH:MM
    HH:MM:SS
    HH:MM:SS.SSS
    now
    DDDDDDDDDD

    So your SQL string needs to be:
    "select *, (case when time between '2017-05-30 00:00:00' and '2017-05-30 03:00:00' then 1 else 0 end) as newdata from olddata")

  2. Even more importantly, if you want to compare against SQLite timestrings using sqldf, the dates in your data frame need to be stored as character values. See sqldf FAQ #4:
    https://github.com/ggrothendieck/sqldf#4-how-does-sqldf-work-with-date-class-variables

(if date processing is a big part of what you're doing, you might consider using one of the other database engines sqldf can use that actually has full date support, such as H2)

1 Like

Thanks for the answer jcblum. To question 1, it doesn't work as well. That time is created from the following code: as.POSIXct(mydata$time,format="%Y-%m-%d %H:%M:%S",tz="America/New_York") The original format is factor.

when I check the new column by class(), it was shown as [1] "POSIXct" "POSIXt"

Interestingly, I found one other strange thing from the as.POSIXct transfer: all date with 00, time will be removed. For example, 2017-05-21 00:00:00 will become as 2017-05-21... 2017-05-21 13:00:00 will be still 2017-05-21 13:00:00

I'm afraid I'm not following what you mean. Can you post a reproducible example, so I can see what you're talking about?