Hi, Peter,
I've been away from XML for a good while, so I can only speak in generalities about this situation. A definite FWIW.

`xml2` is built atop the `libxml2` C library, which does the heavy lifting. It's also a relatively mature package (v1.3.0), so if this kind of scaling issue were intrinsic to it, there has presumably been time for it to surface. It also appears, from a thread last year, that the choice of functions and arguments can greatly affect performance in general.
That thread discusses using a single XPath expression, assembled with `paste`, which speeds things up for the case discussed there.
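As a rough sketch of that idea (not the code from that thread): rather than issuing one query per element name, paste the names into a single XPath union so the tree is walked by one `xml_find_all` call. The toy document and the `tags` vector here are hypothetical stand-ins for the real data.

```r
library(xml2)

# Hypothetical toy document standing in for the real file
doc <- read_xml("<root><a id='1'/><b id='2'/><c id='3'/><a id='4'/></root>")

# Hypothetical element names of interest
tags <- c("a", "b")

# One query per tag: each call searches the whole tree again
per_tag <- lapply(tags, function(t) xml_find_all(doc, paste0("//", t)))

# One combined query: paste the paths into a single XPath union, "//a | //b"
xpath <- paste0("//", tags, collapse = " | ")
combined <- xml_find_all(doc, xpath)

ids <- xml_attr(combined, "id")
```

Both approaches find the same nodes; whether the single query is meaningfully faster depends on the size and shape of the actual document, so it's worth timing on a real sample.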
Looking at it from 30,000 feet, the chain of function calls (`xml_children`, `pbmclapply`, `write_tsv`, `safe_getAttr`, `safe_getAttr`, `paste`, `paste`) just feels like read/write/read/write/read/write, and that interleaving might be where the slowdown happens, especially on a non-SSD drive.
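If that hunch is right, one way to test it is to keep everything in memory and write once at the end. The sketch below assumes the goal is to pull a few attributes from each child node into a table, and that `safe_getAttr` is essentially a guarded attribute lookup (an assumption; I haven't seen its definition). The element and attribute names are hypothetical, and the parallel `pbmclapply` step is deliberately dropped, since vectorized `xml_attr` over the whole nodeset may make it unnecessary.

```r
library(xml2)
library(readr)

# Hypothetical records; the real input would come from read_xml(<file>)
doc <- read_xml("<records><rec id='1' name='x'/><rec id='2'/><rec id='3' name='z'/></records>")

recs <- xml_children(doc)

# Vectorized attribute extraction over the whole nodeset.
# Missing attributes come back as NA, so no per-node wrapper is needed.
out <- data.frame(
  id   = xml_attr(recs, "id"),
  name = xml_attr(recs, "name"),
  stringsAsFactors = FALSE
)

# A single write at the end instead of one write per element
path <- tempfile(fileext = ".tsv")
write_tsv(out, path)
```

Comparing `system.time()` for this against the existing pipeline on the same subset should show fairly quickly whether disk I/O is the bottleneck.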