Working With MSSQL Database

Hi All,

I am very new R and I am working with a database. I have been able to connect with the database using the below code.

install.packages("odbc")
library(odbc)
con <- dbConnect(odbc::odbc(),
                 Server = "Serverlink",
                 Database = "name",
                 UID = "username", 
                 PWD = "password",
                 Driver="SQL Server"
)

then saved one of the views of the db as a table in R. using the below code.

install.packages("dbplyr")
library(dbplyr)
#to save the table in R 
view_scoreinfo <-tbl(con,"view_name")

To get an understanding of datatypes I started to list down the class of all variables. It was showing "NULL" as a result for all of them.

So I started converting all of them one by to the required datatype.

view_scoreinfo$GroupId <- as.factor(view_scoreinfo$GroupId)
class(view_scoreinfo$GroupId)
view_scoreinfo$block <- as.factor(view_scoreinfo$block)
class(view_scoreinfo$block)

as I keep on doing it, in the "environment" the view_scoreinfo, which is supposed to be a table, is listed down as a "list", the number of the list under it keeps on increasing. Why is this happening ??? Can someone help ??

Best,
Kumar Ashwarya

what you're running into is view_scoreinfo is not a local data frame. It's a lazy dataframe which is a pointer to a table on the server but no data has been returned to your computer. You can read about that here: https://dbplyr.tidyverse.org/articles/dbplyr.html

it's really worth reading the intro to dbplyr so you have a mental model of what's going on. The high level view is that you can set up your operations using dplyr and then when you call collect dbplyr actually writes sql and returns the desired results to R. So you could do something like this:

view_scoreinfo <-tbl(con,"view_name")
view_scoreinfo %>%
   group_by(GroupId) %>%
   mutate(avg_score = mean(score)) %>%
   collect ->
local_avg_score

If you do that then you have a local data frame but all the grouping and averaging happens on the sql server.

2 Likes

Also when I ran the below code to find out the variable type, these were the results.

sapply(view_scoreinfo, class)
$src
[1] "src_dbi" "src_sql" "src"

$ops
[1] "op_base_remote" "op_base" "op"

$DateAdded
[1] "POSIXct" "POSIXt"

$GroupId
[1] "factor"

$block
[1] "factor"

why is this happening???

Hey @jdlong, reading through it.

I am facing another issue. I was using tbl function to save the view as a table, it is in the "dplyr" package. I am getting another error in this command!!!!

view_scoreinfo <-tbl(Postab,"VIEW_SCOREINFO_ALL")

I am getting the following error,

Error in UseMethod("tbl") :
no applicable method for 'tbl' applied to an object of class "Microsoft SQL Server"

Yesterday this function was working absolutely fine. Not able to find a function in dbplyr which works the same way as tbl.

What should I do?

Need Help!

Best,
Kumar Ashwarya

how is Postab created? It needs to be a valid DBI connection to your database.

using the above I created the connection. Is that correct?

FY reference. I am looking at this webinar available at R studio website to work on the data.

https://www.rstudio.com/resources/webinars/working-with-big-data-in-r/

Thanks in advance

@jdlong

Hey, I found the error,
I had to install both dplyr as well dbplyr before running the code that I had shared earlier.

view_scoreinfo <-tbl(Postab,"VIEW_SCOREINFO_ALL")

But I am running in another problem.

Error: 'SELECT TOP 6 "rnk"
FROM "VIEW_SCOREINFO_ALL"'
nanodbc/nanodbc.cpp:1587: 08S01: [Microsoft][ODBC SQL Server Driver]Communication link failure

I am getting the above error when I running the below code:

head(select(view_scoreinfo,rnk))

rnk is a column in the view_scoreinfo in which I have saved the view from the database.

I also ran a different command and got the same error.

view_scoreinfo %>% filter(view_scoreinfo$rnk==1)

above is the command.

Error: 'SELECT TOP 10 *
FROM "VIEW_SCOREINFO_ALL"
WHERE ( = 1.0)'
nanodbc/nanodbc.cpp:1587: 08S01: [Microsoft][ODBC SQL Server Driver]Communication link failure

Why is there a communication link failure?

Best,

the connection has dropped. Just reconnect by running the code that creates the connection.

@jdlong
It worked with a stable connection.