Windows doesn’t allow mclapply with number of cores > 1

Hello, this is my first posted question, so apologies for any newbie behavior. The ask is: how can I use multiple cores in RStudio on a Windows machine? My current blocker is that a number of cores > 1 (mc.cores > 1) is not allowed for the mclapply function.

mclapply() relies on forking, which Windows doesn't support, so it can't use more than one core there. You can use parLapply() instead.

library(parallel)
cl <- makeCluster(4) # Number of worker processes (cores) to use
result <- parLapply(cl, ...) # Complete the rest of the arguments as needed
stopCluster(cl) # Shut the workers down when finished
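
To make the placeholder above concrete, here is a minimal sketch of a complete parLapply() call. The `square` function and the input `1:8` are made up for illustration and are not part of the original question; two workers are used only to keep the example light.

```r
library(parallel)

# Hypothetical workload: square each number.
square <- function(x) x^2

cl <- makeCluster(2)                  # start two worker processes
result <- parLapply(cl, 1:8, square)  # same shape of result as lapply(1:8, square)
stopCluster(cl)                       # release the workers when done

print(unlist(result))
```

Unlike mclapply(), the workers here are fresh R sessions, so any packages or objects the function needs have to be loaded or created on them explicitly.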

I get an error about an inherited method for function dbGetQuery for signature “Impala”, “character”. Could this be related to something I read on a forum, that I need to load the libraries on each cluster worker in order to run on Windows?

I was thinking of multicore parallelization on a single machine, but it seems your setup is more complex than you have told us so far. I think you need to be more specific, and ideally provide a reproducible example, if you want to get any meaningful advice.


I’m on a single machine, but Windows doesn’t allow forking. I created the cluster as you suggested, but for some reason it gives me the above error when invoking the dbGetQuery function of the DBI package. I’ll provide more context once I'm at my PC.

I think I've narrowed down the issue. Your suggestion, @andresrcs, to use parLapply was great! It got me this far:



cl <- makeCluster(detectCores(logical = FALSE))
clusterEvalQ(cl = cl, {
  require(tidyverse)
  require(parallel)
  require(DBI)
  require(odbc)
  con <- dbConnect(odbc::odbc(),
                   Driver = "Cloudera ODBC Driver for Impala",
                   Host = "myserver", Port = 1234)
})

clusterExport(cl = cl, "QueryList")
clusterExport(cl = cl, "con")
QryResults <- parLapply(cl = cl, QueryList, dbGetQuery, conn = con)
stopCluster(cl = cl); print("Cluster stopped.")

My current issue is that the connection doesn't seem to be getting established on the cluster workers.
I get this error: Error in connection_info(dbObj@ptr) : external pointer is not valid. Any thoughts?
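
One possible explanation, offered as a guess rather than a confirmed diagnosis: a connection object wraps an external pointer that is only valid in the R process that created it. Exporting `con` from the main session, or passing it as the `conn` argument of parLapply(), would send each worker a copy whose pointer is no longer valid. A common pattern is to create the object on each worker and have the worker function look it up by name there. The sketch below shows only that pattern. To stay runnable without a database, `con` is a hypothetical stand-in (a plain list), and `run_query` and the query strings are made up for illustration.

```r
library(parallel)

cl <- makeCluster(2)  # two workers keep the sketch light

# Build the per-worker object on each worker; nothing is exported from the
# main session. `con` is a hypothetical stand-in for the real
# dbConnect(odbc::odbc(), ...) call from the post above.
invisible(clusterEvalQ(cl, {
  con <- list(worker_pid = Sys.getpid())
  NULL  # return NULL so the object itself is never sent back
}))

# Hypothetical queries, for illustration only.
QueryList <- list("SELECT 1", "SELECT 2", "SELECT 3", "SELECT 4")

# The function names `con` but does not receive it as an argument, so the
# lookup happens in the worker's global environment when the function runs.
# With a real connection the body would be: DBI::dbGetQuery(con, qry)
run_query <- function(qry) {
  paste(qry, "handled by worker", con$worker_pid)
}

QryResults <- parLapply(cl, QueryList, run_query)
stopCluster(cl)
```

With real connections it would also make sense to close them on the workers, for example with clusterEvalQ(cl, DBI::dbDisconnect(con)), before calling stopCluster().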

I think I'm going to mark this question as solved, since technically @andresrcs's suggestion does answer the question in the title, and then create a new one called

"exporting connections to your clusters etc...."

Created New Topic for connection export: Parallel / Multicore processing - Exporting connection(s) to clusters
