Connecting to an online database (Within an organization): How to

Hello All,
I will connect to my organization's online database and my manager mentions something about ODBC Connection.

Is there a tutorial on how to use SQL in R to connect to the online database to get the data?

I hope I phrase the question clear enough. I just don't know how to do this.

Thank you.

1 Like

This should get you started!
http://db.rstudio.com/getting-started/

1 Like

@mara
Hello Mara,
In the link you sent me, I am having trouble understanding what DSN is.
For instance, I am looking at step 3 in "Connect to a Database"

Determine if a DSN is going to be used to connect to the database. This would be typically something that the Database Administrator, or the technical owner of the database, would setup and provide the R developer a name (known as an alias). Use dbConnect() to open a database connection in this manner:

con <- dbConnect(odbc(), "DSN name")

I ran that code in R and the result was:

Error: nanodbc/nanodbc.cpp:950: IM002: [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified 

This means my organization does not use DSN (If I am understanding DSN correctly, it is like a channel/tunnel you go through to get the data????^^), so I needed to proceed to step 4 in the same article. I had an error:

con <- dbConnect(odbc(), 
                 Driver = "SQL Server", 
                 Server = "localhost\\SQLEXPRESS", 
                 Database = "datawarehouse", 
                 Trusted_Connection = "True")

Error: nanodbc/nanodbc.cpp:950: 08001: [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. 

What can I do next?

Thank you, Mara!

If you’re trying to connect to a specific existing database and you don’t know the correct credentials, you need to talk to whoever is in charge of administering the database to find out what they are. The code on the tutorial page is just an example — it’s not going to work as-is (unless by some wild coincidence you happened to have a local database with precisely that configuration!).

Here’s a basic definition of DSNs:
https://support.microsoft.com/en-us/help/966849/what-is-a-dsn-data-source-name

2 Likes

Thank you everyone!
My manager helped me out.

@jcblum
I need more help in this topic.

Now, I am using R Notebook because R script does not seem to work for me.
So, let's take a look at this example

SELECT "month_idx", "year", "month", SUM(CASE WHEN ("term_deposit" = 'yes') THEN (1.0) ELSE (0.0) END) AS "subscribe",
COUNT(*) AS "total"
FROM ("bank") 
GROUP BY "month_idx", "year", "month"

I noticed that the example put "..." around the variables such as "month_idx". I used SQL Studio Management and the SQL command we use in there does not need to put quotation mark around the variables.

I am wondering in R Notebook or R script , do I have to put quotation mark around the variables?
Because there are A LOT OF corrections for me to do that, I want to make sure whether there is a way around it.

Right now, I have:

con <- dbConnect(odbc::odbc(), "a server name will go into here")

You know like when you import a data set and store it into a variable name called "df" or something. Then you attach(df), which gives you access to variables in df without referring to "df" when you need to do some operations on the variables and you don't need a quotation mark around these variables when calling these variables. For instance, df is a dataframe with "grades" and "GPA" fields. After calling "attach(df)", you can just do some operations such as sum(GPA) instead of sum(df$GPA). I think you all know what I mean. If not, let me know.

I am thinking whether there is a way for me in SQL command in R notebook to call up these variables without putting ".." around the variables.

What do you think? Impossible?

@jcblum

SELECT  "a13.ACCTG_YEAR_KEY  ACCTG_YEAR_KEY",                       
               max("a114.ACCTG_YEAR_DESCR")  "ACCTG_YEAR_DESCR",  
.......                        

Also, my friend at work informed me that in the SQL command above, the second field next to the first one is called alias. The reason why the command is set up like that is when I run the SQL command in Microsoft SQL Server Management Studio, without the second alias/field, the column, for example "a13.ACCTG_YEAR_KEY", with all the observations will show up but there is not Header. So I would think the second field such as ACCTG_YEAR_KEY is the header for the column "a13.ACCTG_YEAR_KEY".
Like I mention above, in MS SQL studio, there is not need for quotation "..."

Anyhow, I still cannot run the code. Here is the error

Error in sqlParseVariablesImpl(sql, list(sqlQuoteSpec("\"", "\""), sqlQuoteSpec("'",  : 
  Unterminated literal
Failed to execute SQL chunk

Thanks team!

I'm assuming you're talking about SQL inside a SQL chunk? Did you try running a query on your actual database without quoting the column names?

The style with all the column names identified (in this case with quotes) is defensive, since it prevents conflicts between column names and any reserved words. But different DBMSes impose different requirements, people don't always write their SQL that defensively, and column name identifiers differ between DBMSes. For instance:

Anyway, again you should not take the examples as necessarily being applicable to your database, since details of syntax vary between DBMSes. The SQL inside a SQL chunk is passed on to the database by DBI (via the knitr SQL engine), so theoretically you shouldn't have to alter how you would otherwise write SQL for your database — but honestly, the best way to sort out what will work best with your database is to test with your database.

@jcblum
Hmm...I am unclear...

Update:
I got it. Thanks all.

@jcblum
Now, in R notebook, I have this command:

```{sql, connection = con, output.var = "mydf"}

Am I able to output "mydf" into a .csc or .xlsx file?
I am loading 1 year of dataset and it has been running for 8 minutes.
Not that I am not patient but I want to run a csv file on my new machine.
It runs like light...
Thanks!