XML2 library: Question about correct usage of library, or have I found a bug related to performance

Hi, Peter,

I've been away from XML for a good while, so I can only speak in generalities about this situation. Treat this as a definite FWIW.

xml2 is built atop the libxml2 C library, which does the heavy lifting. It's also a relatively mature package (v1.3.0), so there's presumably been time for this kind of scaling issue to surface if it were intrinsic. It also appears, from a thread last year, that the choice of functions and arguments can greatly affect run time in general.

That thread discusses using a single XPath expression, assembled with paste, which sped things up for the case discussed there.
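
To make the single-expression idea concrete, here is a minimal sketch. It is not the code from that thread; the toy document, the `rec` and `name` elements, and the `id` attribute are all made up for illustration. The point is only that one `xml_find_all()` call with a pasted-together predicate replaces one query per id.

```r
library(xml2)

# Toy document; element and attribute names are invented for this sketch.
doc <- read_xml('<records>
  <rec id="a" type="x"><name>one</name></rec>
  <rec id="b" type="y"><name>two</name></rec>
  <rec id="c" type="x"><name>three</name></rec>
</records>')

wanted <- c("a", "c")

# Assemble one predicate with paste instead of querying once per id.
predicate <- paste("@id='", wanted, "'", sep = "", collapse = " or ")
xpath <- paste("//rec[", predicate, "]", sep = "")

# A single pass over the document returns every matching node.
hits <- xml_find_all(doc, xpath)
ids <- xml_attr(hits, "id")
rec_names <- xml_text(xml_find_first(hits, "./name"))
```

Whether this helps in your case depends on how your lookups are structured, so I'd time it against your current approach on a small file first.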

Looking at it from 30,000 feet, the chain of function calls

xml_children
pbmclapply
write_tsv
safe_getAttr
safe_getAttr
paste
paste

makes me suspect that the repeated read/write/read/write/read/write cycle is where the slowdown happens, especially on a non-SSD drive.
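
One way to test that hunch is to pull everything into memory first and write only once at the end. The sketch below is a guess at the general shape, not your actual pipeline: the toy document and its `id` and `type` attributes are invented, and I use `xml_attr()` directly (it returns NA for a missing attribute) in place of your `safe_getAttr` helper, which I haven't seen.

```r
library(xml2)
library(readr)

# Toy document; the second record deliberately lacks a "type" attribute.
doc <- read_xml('<records>
  <rec id="a" type="x"/>
  <rec id="b"/>
  <rec id="c" type="y"/>
</records>')

recs <- xml_children(doc)

# xml_attr() is vectorised over the node set, so no per-node loop is needed.
rows <- data.frame(
  id = xml_attr(recs, "id"),
  type = xml_attr(recs, "type"),
  stringsAsFactors = FALSE
)

# One write at the end instead of one write per record.
out <- tempfile(fileext = ".tsv")
write_tsv(rows, out)
```

If the run time drops sharply with a single write, the disk is the likely culprit; if it doesn't, the bottleneck is probably elsewhere in the chain.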
