I'm trying to parallelize some parsing of an {xml2} XML document. I know that xml2 objects are backed by external pointers, and it looks like that doesn't work with the {furrr} package as I am using it. Is there a way to parallelize processing of a document that respects the pointers (i.e. doesn't collapse everything to R objects with as_list())? Below is a reprex that shows what I want to do: it works in the non-parallelized version and fails in my attempt at a parallelized version.
library(xml2)
library(purrr)
library(furrr)
#> Loading required package: future
# WORKS: local: mapping to elements
xmldoc <- read_xml("<root><child>a</child><child>b</child></root>")
elt_paths <- xml_find_all(xmldoc, "//child") |> map_chr(xml_path)
map(elt_paths, ~{xml_text(xml_find_first(xmldoc, .x))})
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "b"
# DOES NOT WORK: how do I make the xml object visible to the function that runs on the workers?
tryCatch({
  xmldoc <- read_xml("<root><child>a</child><child>b</child></root>")
  elt_paths <- xml_find_all(xmldoc, "//child") |> map_chr(xml_path)
  plan(multisession, workers = 2)
  res <- future_map(elt_paths, ~{xml_text(xml_find_first(xmldoc, .x))})
  plan(sequential)
  print(res)
}, error = function(e) print(e))
#> <error/purrr_error_indexed>
#> Error:
#> ℹ In index: 1.
#> Caused by error in `xml_ns.xml_document()`:
#> ! external pointer is not valid
#> ---
#> Backtrace:
#> ▆
#> 1. ├─parallel (local) workRSOCK()
#> 2. │ └─parallel:::workLoop(...)
#> 3. │ └─parallel:::workCommand(master)
#> 4. │ ├─base::tryCatch(...)
#> 5. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 6. │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 7. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 8. │ ├─base::tryCatch(...)
#> 9. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 10. │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 11. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 12. │ ├─base::do.call(msg$data$fun, msg$data$args, quote = TRUE)
#> 13. │ └─future (local) `<fn>`(...)
#> 14. │ └─base::eval(expr, envir = envir, enclos = enclos)
#> 15. │ └─base::eval(expr, envir = envir, enclos = enclos)
#> 16. ├─base::tryCatch(...)
#> 17. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 18. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 19. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 20. ├─base::withCallingHandlers(...)
#> 21. ├─base::withVisible(...)
#> 22. ├─base::local(...)
#> 23. │ └─base::eval.parent(substitute(eval(quote(expr), envir)))
#> 24. │ └─base::eval(expr, p)
#> 25. │ └─base::eval(expr, p)
#> 26. └─base::eval(...)
#> 27. └─base::eval(...)
#> 28. ├─base::withCallingHandlers(...)
#> 29. ├─base::do.call(...furrr_map_fn, args)
#> 30. └─purrr (local) `<fn>`(.x = "/root/child[1]", .f = `<fn>`)
#> 31. └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
#> 32. ├─purrr:::with_indexed_errors(...)
#> 33. │ └─base::withCallingHandlers(...)
#> 34. ├─purrr:::call_with_cleanup(...)
#> 35. └─.f(.x[[i]], ...)
#> 36. └─global ...furrr_fn(...)
#> 37. ├─xml2::xml_text(xml_find_first(xmldoc, .x))
#> 38. ├─xml2::xml_find_first(xmldoc, .x)
#> 39. └─xml2:::xml_find_first.xml_node(xmldoc, .x)
#> 40. ├─xml2::xml_ns(x)
#> 41. └─xml2:::xml_ns.xml_document(x)
Created on 2024-05-21 with reprex v2.1.0
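For what it's worth, here is a minimal sketch of what I think is going on, without any workers involved. It assumes that sending an object to a multisession worker behaves roughly like a serialize()/unserialize() round trip, and that re-parsing the document from text is acceptable. The names copy, pointer_survives, doc_txt and reparsed are mine, purely for illustration.

```r
library(xml2)

xmldoc <- read_xml("<root><child>a</child><child>b</child></root>")

# Assumption: shipping an object to a worker is similar to this round trip.
# The external pointers inside the xml_document are not restored by unserialize().
copy <- unserialize(serialize(xmldoc, NULL))
pointer_survives <- tryCatch(
  {
    xml_text(copy)
    TRUE
  },
  error = function(e) FALSE
)

# Possible workaround: send the document as plain text and re-parse it
# on the receiving side, so a fresh pointer is created there.
doc_txt <- as.character(xmldoc)
reparsed <- read_xml(doc_txt)
texts <- xml_text(xml_find_all(reparsed, "//child"))
```

With {furrr}, the same idea would presumably mean referring to doc_txt inside the mapped function and calling read_xml(doc_txt) there, at the cost of re-parsing the document in every call. That still isn't really "respecting the pointers", so I'd be interested in anything better.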