Reading XML nodes in parallel

I downloaded the zip file at https://www.imf.org/~/media/Files/Publications/WEO/WEO-Database/2020/02/weooct2020-sdmxdata.ashx,
extracted it, and then read the file "WEO_PUB_OCT2020.xml":

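library(tidyverse)
library(magrittr)
library(pbapply)
library(xml2)      # xml_find_all(), xml_attrs(); not attached by library(tidyverse)
library(parallel)  # detectCores(), clusterExport()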

datac <- xml2::read_xml("WEO_PUB_OCT2020.xml") %>%
  xml_find_all("//Series")
cl <- parallel::makeCluster(detectCores() - 1)
parallel::clusterEvalQ(cl, {
  .libPaths()
  library(magrittr)
  library(tidyverse)
  library(xml2)
})
clusterExport(cl, "datac")

dataset <- pbapply::pblapply(datac, function(e) {
  bind_cols(
    xml_attrs(e) %>%
      enframe() %>%
      pivot_wider(everything()) %>%
      select(economy = REF_AREA, code = CONCEPT),
    xml_find_all(e, ".//Obs") %>%
      xml_attrs(c("TIME_PERIOD", "OBS_VALUE")) %>%
      lapply(function(x) x %>%
        tibble::enframe() %>%
        pivot_wider(everything()) %>%
        set_colnames(c("year", "value"))) %>%
      bind_rows()
  )
}, cl = cl) %>%
  bind_rows()

Then I get this error: "Error in checkForRemoteErrors(val) : 3 nodes produced errors; first error: external pointer is not valid"
Please advise.

This is not possible. xml2 objects, such as your datac, cannot be exported to another R process: they hold external pointers into the memory of the session that created them, so the copies that arrive on the workers are not valid. This is true regardless of which parallel framework you use. The problem is described in https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html
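
One possible workaround, offered as a minimal sketch rather than a tested solution for the full WEO file, is to convert each Series node to plain text in the main session and let every worker re-parse its own piece, since character vectors export without trouble. The inline XML below is a made-up stand-in for "WEO_PUB_OCT2020.xml" (the values are illustrative, not real WEO data), and parse_series() is a hypothetical helper name. Whether the cost of serialising and re-parsing is actually repaid by the parallel speed-up on the real file is not something this sketch establishes.

```r
library(xml2)
library(parallel)

# Made-up stand-in for the real file; replace with read_xml("WEO_PUB_OCT2020.xml").
doc <- read_xml('<DataSet>
  <Series REF_AREA="111" CONCEPT="NGDP_RPCH">
    <Obs TIME_PERIOD="2019" OBS_VALUE="2.2"/>
    <Obs TIME_PERIOD="2020" OBS_VALUE="-4.3"/>
  </Series>
  <Series REF_AREA="112" CONCEPT="NGDP_RPCH">
    <Obs TIME_PERIOD="2019" OBS_VALUE="1.5"/>
  </Series>
</DataSet>')

series <- xml_find_all(doc, "//Series")

# Serialise each node to a string in the main session.
# Plain character vectors, unlike xml2 nodes, survive the trip to a worker.
series_chr <- as.character(series)

# Hypothetical helper: re-parse one serialised Series on the worker
# and return one row per Obs child.
parse_series <- function(txt) {
  node <- xml2::read_xml(txt)
  obs  <- xml2::xml_find_all(node, ".//Obs")
  n    <- length(obs)
  data.frame(
    economy = rep(xml2::xml_attr(node, "REF_AREA"), n),
    code    = rep(xml2::xml_attr(node, "CONCEPT"), n),
    year    = xml2::xml_attr(obs, "TIME_PERIOD"),
    value   = xml2::xml_attr(obs, "OBS_VALUE"),
    stringsAsFactors = FALSE
  )
}

cl <- makeCluster(2)  # two workers is an arbitrary choice for the sketch
pieces <- parLapply(cl, series_chr, parse_series)
stopCluster(cl)

dataset <- do.call(rbind, pieces)
```

All four columns come back as character, so year and value would still need converting (for example with as.numeric()) before analysis. To keep the progress bar, pbapply::pblapply(series_chr, parse_series, cl = cl) should be usable in place of parLapply().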

What if I use the XML package instead? Will I get the same error?
