Good afternoon,
This nested XML is giving me a hard time.:
<DOCUMENT>
<TEST_TYPE NAME="Test, no warmup, Mar 13, 2019 Build 006">
<RESULT_SET>
<DATE>2021:8:10 11:22:40 (GMT-05:00)</DATE>
<OPERATOR>User</OPERATOR>
<INSTRUMENT SERIALNUM="ABC1234">ERROR</INSTRUMENT>
<INSTRUMENT_EXTRA>
<BOARDTEMP LABEL="Board Temp:">58.042717</BOARDTEMP>
<LASERX LABEL="Laser X:">5.539117</LASERX>
<LASERY LABEL="Laser Y:">5.370719</LASERY>
<LASERR LABEL="Laser R:">4.868718</LASERR>
</INSTRUMENT_EXTRA>
<SPECTRUM NAME="Resolution Half Small" INDEX="1">
<TEST NAME="Loc 2176" HI="2176.334" LO="2176.234"> <RESULT>2176.794</RESULT> </TEST>
<TEST NAME="Width 2176" HI="0.60" LO=""> <RESULT>0.386</RESULT> </TEST>
<TEST NAME="Loc 2179" HI="2179.822" LO="2179.722"> <RESULT>2179.254</RESULT> </TEST>
<TEST NAME="Width 2179" HI="0.60" LO=""> <RESULT>0.354</RESULT> </TEST>
</SPECTRUM>
<SPECTRUM NAME="Resolution Half Small" INDEX="2">
<TEST NAME="Loc 2176" HI="2176.334" LO="2176.234"> <RESULT>2176.794</RESULT> </TEST>
<TEST NAME="Width 2176" HI="0.60" LO=""> <RESULT>0.386</RESULT> </TEST>
<TEST NAME="Loc 2179" HI="2179.822" LO="2179.722"> <RESULT>2179.254</RESULT> </TEST>
<TEST NAME="Width 2179" HI="0.60" LO=""> <RESULT>0.354</RESULT> </TEST>
</SPECTRUM>
<SPECTRUM NAME="Wavelength Precision Half Small" INDEX="1">
<TEST NAME="STD 2176" HI="0.0010" LO=""> <RESULT>0.11426</RESULT> </TEST>
<TEST NAME="STD 2179" HI="0.0010" LO=""> <RESULT>0.42696</RESULT> </TEST>
<TEST NAME="Min 2176" HI="" LO=""> <RESULT>2176.71582</RESULT> </TEST>
<TEST NAME="Max 2176" HI="" LO=""> <RESULT>2177.05127</RESULT> </TEST>
</SPECTRUM>
<SPECTRUM NAME="Spectrometer Zero" INDEX="1">
<TEST NAME="Zero 3025" HI="0.50" LO="-0.50"> <RESULT>-0.076</RESULT> </TEST>
<TEST NAME="Zero 2920" HI="0.50" LO="-0.50"> <RESULT>-0.083</RESULT> </TEST>
</SPECTRUM>
<ROWCOMMENT> 2021_08_12 16_00 BDN2110033 15 35 </ROWCOMMENT>
</RESULT_SET>
</TEST_TYPE>
</DOCUMENT>
I can pull out each piece, but for each of the 14 child nodes (TEST) I need the 4 parents (SPECTRUM) and also other "elder relatives" of size 1 (LASERX, BOARDTEMP, DATE, etc) to be repeated on the corresponding 14 lines of the table.
I'm using this code to pull out the data:
xml_file<-read_xml("SumSnippet2.xml")
group <- xml_find_all(xml_file, ".//SPECTRUM")
group_name <- xml_attr(group, "NAME")
group_idx <- xml_attr(group, "INDEX")
test <- xml_find_all(group, ".//TEST")
test_name <- xml_attr(test, "NAME")
test_hi <- xml_attr(test, "HI")
test_low <- xml_attr(test, "LO")
test_value <- xml_text(test, "RESULT")
df <- data.frame(test_name, test_hi, test_low, test_value)
output$uiTable <- renderTable( df )
But the TEST data is size 14 and the SPECTRUM (group) data is size 4, so if I pull it out in this manner I lose the relationship between spectrum and the test nodes they encompass.
Excel knows exactly what I want to do with the XML:
How do I get my code to do something similar, where parents and "elder relatives" are repeated in the table for the TEST nodes they encompass?
Also, this will get huge -- I'll open dozens of XML documents like this one and merge into a single dataframe. Is xml_find_all() a good choice from a performance standpoint, considering the nodepaths of each document will always be similar?