As of the most recent releases of tidyselect (1.0.0) and vctrs (0.2.2), whether an XML document with missing fields can be read depends on node order: if the nodes lacking those fields appear first, the data is read successfully, but if they appear after nodes that have the fields, the code throws an error. (This was not an issue in prior versions.)
In my specific case, all fields are necessary and will be at least partially populated. Is there any advice for reading in such data regardless of where the nodes with missing fields appear?
library(dplyr, warn.conflicts = FALSE)
# Also uses tidyr, xml2
# In this order, it successfully reads the data:
xmlstring_ok <-
'<?xml version="1.0" encoding="utf-8"?>
<Study StudyName="Sandbox" StudyAlias="Sahara">
<Procedure>
<Series>
<InternalSeriesID></InternalSeriesID>
<WFStep>Not Done</WFStep>
</Series>
</Procedure>
<Procedure>
<Series>
<InternalSeriesID>104646</InternalSeriesID>
<Technician>Tech_Test, Tech Test</Technician>
<Equipment>
<EquipmentSerial>123</EquipmentSerial>
<EquipmentType>Fundus Camera</EquipmentType>
<EquipmentModel>50DX</EquipmentModel>
<EquipmentManufacturer>Topcon Corporation</EquipmentManufacturer>
</Equipment>
<StudyDate>2019-09-09</StudyDate>
<WFStep>Verify</WFStep>
</Series>
</Procedure>
</Study>'
indat <- xml2::as_list(xml2::read_xml(xmlstring_ok))
work <- tidyr::tibble(inv = indat)
out <- work %>%
  tidyr::unnest_longer(inv, indices_include = FALSE) %>%
  tidyr::unnest_wider(inv) %>%
  tidyr::unnest_wider(Series) %>%
  tidyr::unnest_wider(Equipment) %>%
  dplyr::select(-...1) %>%
  apply(MARGIN = c(1, 2), FUN = unlist) %>%
  as.data.frame(stringsAsFactors = FALSE)
#> New names:
#> * `` -> ...1
out
#>     WFStep InternalSeriesID           Technician EquipmentSerial EquipmentType
#> 1 Not Done             <NA>                 <NA>            <NA>          <NA>
#> 2   Verify           104646 Tech_Test, Tech Test             123 Fundus Camera
#>   EquipmentModel EquipmentManufacturer  StudyDate
#> 1           <NA>                  <NA>       <NA>
#> 2           50DX    Topcon Corporation 2019-09-09
# In this order, it breaks:
xmlstring_bad <-
'<?xml version="1.0" encoding="utf-8"?>
<Study StudyName="Sandbox" StudyAlias="Sahara">
<Procedure>
<Series>
<InternalSeriesID>104646</InternalSeriesID>
<Technician>Tech_Test, Tech Test</Technician>
<Equipment>
<EquipmentSerial>123</EquipmentSerial>
<EquipmentType>Fundus Camera</EquipmentType>
<EquipmentModel>50DX</EquipmentModel>
<EquipmentManufacturer>Topcon Corporation</EquipmentManufacturer>
</Equipment>
<StudyDate>2019-09-09</StudyDate>
<WFStep>Verify</WFStep>
</Series>
</Procedure>
<Procedure>
<Series>
<InternalSeriesID></InternalSeriesID>
<WFStep>Not Done</WFStep>
</Series>
</Procedure>
</Study>'
indat <- xml2::as_list(xml2::read_xml(xmlstring_bad))
work <- tidyr::tibble(inv = indat)
out <- work %>%
  tidyr::unnest_longer(inv, indices_include = FALSE) %>%
  tidyr::unnest_wider(inv) %>%
  tidyr::unnest_wider(Series) %>%
  tidyr::unnest_wider(Equipment)
#> New names:
#> * `` -> ...1
#> Error: Can't cast `Equipment$...1` <logical> to `Equipment$...1` <vctrs_unspecified>.
# dplyr::select(-...1) %>%
# apply(MARGIN = c(1,2), FUN = unlist) %>%
# as.data.frame(stringsAsFactors = FALSE)
Created on 2020-01-28 by the reprex package (v0.3.0)
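For comparison, here is a workaround sketch that sidesteps `unnest_wider()` by pulling each field out with xml2 XPath queries. It is a stopgap under stated assumptions, not an explanation of the error: the `fields` vector and the `get_field()` helper are names made up for illustration, the XML is a trimmed-down version of the failing document above, and it assumes each `Series` holds at most one of each field.

```r
library(xml2)

# Trimmed-down version of the failing document: populated node first.
xmlstring <- '<?xml version="1.0" encoding="utf-8"?>
<Study StudyName="Sandbox" StudyAlias="Sahara">
  <Procedure>
    <Series>
      <InternalSeriesID>104646</InternalSeriesID>
      <Equipment>
        <EquipmentSerial>123</EquipmentSerial>
      </Equipment>
      <WFStep>Verify</WFStep>
    </Series>
  </Procedure>
  <Procedure>
    <Series>
      <InternalSeriesID></InternalSeriesID>
      <WFStep>Not Done</WFStep>
    </Series>
  </Procedure>
</Study>'

doc <- read_xml(xmlstring)
series <- xml_find_all(doc, ".//Series")

# Hypothetical, hand-written list of the fields to extract.
fields <- c("InternalSeriesID", "EquipmentSerial", "WFStep")

# Hypothetical helper: one value per Series node, NA where the field is absent.
get_field <- function(name) {
  # xml_find_first() returns one (possibly missing) node per input node
  vals <- xml_text(xml_find_first(series, paste0(".//", name)))
  vals[which(vals == "")] <- NA_character_  # treat empty elements as missing
  vals
}

out <- as.data.frame(
  setNames(lapply(fields, get_field), fields),
  stringsAsFactors = FALSE
)
out
```

Because `xml_find_first()` returns one result per `Series` node, the order of populated and empty nodes should not matter here, at the cost of having to list the fields by hand.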