Combining 2 pointers to tables in SQL server

I have acces to a SQL server database where every year is stored in its own view. I can load data for every year into R and do all calculations locally, but sometimes I would be a lot faster if I could combine several years with dbplyr and do the calculations in the database. But I'm not a SQL wizard and I don't know how to do that. Can anybode help me?

library(tidyverse)
library(dbplyr)

con <- DBI::dbConnect(odbc::odbc(), 
                      driver = "SQL Server", 
                      server = "XXXX", 
                      database = "XXXX", 
                      bigint = "numeric", 
                      encoding = "windows-1252")


dfstativ_first_year <- tbl(con, paste0("STATIV", "1997", "_tot"))
dfstativ_last_year <- tbl(con, paste0("STATIV", "2006", "_tot"))

dfstativ_first_year <- dfstativ_first_year %>% 
  select(Kommun) %>% 
  filter(Kommun == "1280") %>% 
  tally() %>% 
  mutate(year = 1997L)

# dfstativ_first_year

# Database: Microsoft SQL XXXXX \B/P0XXX]
# n  year
# <int> <int>
#   1 251539  1997

dfstativ_last_year <- dfstativ_last_year %>%
  select(Kommun) %>%
  filter(Kommun == "1280") %>%
  tally() %>%
  mutate(year = 2006L)

# dfstativ_last_year

# Database: Microsoft SQL XXXXX \B/P0XXX]
# n  year
# <int> <int>
#   1 276244  2006

# This is how I would combine tables in dplyr if the data was stored in R's memory
dfstativ_first_year %>% 
  bind_rows(dfstativ_last_year)


# But that gives me an error:
# Error: Argument 1 must be a data frame or a named atomic vector.

# Desired result is:
  
  #      n      year
  #    <int>   <int>
  # 1  251539  1997
  # 2  276244  2006

Can I solve this without having to download all data into R?

Solved thanks to this question on Stackoverflow.

I can do:

df <- purrr::reduce(list( dfstativ_first_year, dfstativ_last_year), union_all)

df  %>% collcect()

  #      n      year
  #    <int>   <int>
  # 1  251539  1997
  # 2  276244  2006

You can also use dplyr::union() since it gets translated into SQL's UNION hince runs on the database but I think a better approach would be to ask your DBA what is the source table of the view and query that directly, or if they are very sticky to the ANSI standard, create a specific view for that, or, if the table is actually a partitioned table you could simply query the parent table and let the RDBMS do the query planning.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.