Uploading large dataframe to SQL database

,

I'm using DBI::dbWriteTable to upload a table with ~11M rows to a database in Azure SQL.

However, I keep running into the following error:

Error in result_insert_dataframe(rs@ptr, values, batch_rows) : 
  nanodbc/nanodbc.cpp:1752: 08S01: [Microsoft][ODBC Driver 18 for SQL Server]TCP Provider: An existing connection was forcibly closed by the remote host.
  [Microsoft][ODBC Driver 18 for SQL Server]Communication link failure 

I've tried pushing the data by slicing it into smaller subsets like so:

row_slices <- seq(1, nrow(bseg_output_data), by = 100000)

con <- connect_azure_sql()

dbWriteTable(conn = con, 
             name = "behavioural_segmentation_data", 
             value = bseg_output_data %>% slice(1:row_slices[2]), 
             overwrite = TRUE)

dbDisconnect(con)

for(i in 2:length(row_slices)) {
  con <- connect_azure_sql()
  print(i)
  dbWriteTable(conn = con, 
               name = "behavioural_segmentation_data", 
               value = bseg_output_data %>% slice(i:row_slices[i+1]), 
               append = TRUE)
  dbDisconnect(con)
}

(I think there's some faulty logic with the row_slices[i+1] part but ignore that.)

When I do it this way, i gets up to about 4 but the same error occurs. I kept opening and closing the connection inside the for loop hoping that that might solve the issue. But really just fumbling about in the dark here.

Is there something that I'm doing wrong, or a better way to upload a large table to an SQL database?

Any help would be appreciated!

I think this might be the cause actually:

suppose row_slices = c(1, 100000, 200000, 300000) then

i i:row_slices[i+1] length (approximate)
1 1:100000 100,000
2 2:200000 200,000
3 3:300000 300,000

So in your loop, you're uploading bigger and bigger datasets, essentially ending up submitting the whole dataset at once. Don't forget to index the start too:

value = bseg_output_data %>% slice(row_slices[i]:row_slices[i+1]),

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.