Trouble converting nested XML to dataframe using xml2

Good afternoon,

This nested XML is giving me a hard time.:

<DOCUMENT>
 <TEST_TYPE NAME="Test, no warmup, Mar 13, 2019 Build 006">
  <RESULT_SET>
   <DATE>2021:8:10 11:22:40 (GMT-05:00)</DATE>
   <OPERATOR>User</OPERATOR>
   <INSTRUMENT SERIALNUM="ABC1234">ERROR</INSTRUMENT>
   <INSTRUMENT_EXTRA>
    <BOARDTEMP LABEL="Board Temp:">58.042717</BOARDTEMP>
    <LASERX LABEL="Laser X:">5.539117</LASERX>
    <LASERY LABEL="Laser Y:">5.370719</LASERY>
    <LASERR LABEL="Laser R:">4.868718</LASERR>
   </INSTRUMENT_EXTRA>
   <SPECTRUM NAME="Resolution Half Small" INDEX="1">
    <TEST NAME="Loc 2176" HI="2176.334" LO="2176.234"> <RESULT>2176.794</RESULT> </TEST>
    <TEST NAME="Width 2176" HI="0.60" LO=""> <RESULT>0.386</RESULT> </TEST>
    <TEST NAME="Loc 2179" HI="2179.822" LO="2179.722"> <RESULT>2179.254</RESULT> </TEST>
    <TEST NAME="Width 2179" HI="0.60" LO=""> <RESULT>0.354</RESULT> </TEST>
   </SPECTRUM>
   <SPECTRUM NAME="Resolution Half Small" INDEX="2">
    <TEST NAME="Loc 2176" HI="2176.334" LO="2176.234"> <RESULT>2176.794</RESULT> </TEST>
    <TEST NAME="Width 2176" HI="0.60" LO=""> <RESULT>0.386</RESULT> </TEST>
    <TEST NAME="Loc 2179" HI="2179.822" LO="2179.722"> <RESULT>2179.254</RESULT> </TEST>
    <TEST NAME="Width 2179" HI="0.60" LO=""> <RESULT>0.354</RESULT> </TEST>
   </SPECTRUM>
   <SPECTRUM NAME="Wavelength Precision Half Small" INDEX="1">
    <TEST NAME="STD 2176" HI="0.0010" LO=""> <RESULT>0.11426</RESULT> </TEST>
    <TEST NAME="STD 2179" HI="0.0010" LO=""> <RESULT>0.42696</RESULT> </TEST>
    <TEST NAME="Min 2176" HI="" LO=""> <RESULT>2176.71582</RESULT> </TEST>
    <TEST NAME="Max 2176" HI="" LO=""> <RESULT>2177.05127</RESULT> </TEST>
   </SPECTRUM>
   <SPECTRUM NAME="Spectrometer Zero" INDEX="1">
    <TEST NAME="Zero 3025" HI="0.50" LO="-0.50"> <RESULT>-0.076</RESULT> </TEST>
    <TEST NAME="Zero 2920" HI="0.50" LO="-0.50"> <RESULT>-0.083</RESULT> </TEST>
   </SPECTRUM>
   <ROWCOMMENT> 2021_08_12 16_00 BDN2110033 15 35 </ROWCOMMENT>
  </RESULT_SET>
 </TEST_TYPE>
</DOCUMENT>

I can pull out each piece, but for each of the 14 child nodes (TEST) I need the 4 parents (SPECTRUM) and also other "elder relatives" of size 1 (LASERX, BOARDTEMP, DATE, etc) to be repeated on the corresponding 14 lines of the table.

I'm using this code to pull out the data:

  xml_file<-read_xml("SumSnippet2.xml")
  
  group <- xml_find_all(xml_file, ".//SPECTRUM")
  group_name <- xml_attr(group, "NAME")
  group_idx <- xml_attr(group, "INDEX")
  
  test <- xml_find_all(group, ".//TEST")
  test_name <- xml_attr(test, "NAME")
  test_hi <- xml_attr(test, "HI")
  test_low <- xml_attr(test, "LO")
  test_value <- xml_text(test, "RESULT")
  
  df <- data.frame(test_name, test_hi, test_low, test_value)

  output$uiTable <- renderTable( df )

But the TEST data is size 14 and the SPECTRUM (group) data is size 4, so if I pull it out in this manner I lose the relationship between spectrum and the test nodes they encompass.

Excel knows exactly what I want to do with the XML:

How do I get my code to do something similar, where parents and "elder relatives" are repeated in the table for the TEST nodes they encompass?

Also, this will get huge -- I'll open dozens of XML documents like this one and merge into a single dataframe. Is xml_find_all() a good choice from a performance standpoint, considering the nodepaths of each document will always be similar?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.