Hi everyone,
I'm facing a complex XML file, and have found some awesome resources and examples from, @richardo , @cderv , @danton , to get started, but am running into difficulties with returning multiple node attributes, from various levels, and subsequent child elements. In addition, I need the output in a tidy data frame format.
My code below builds off of @richardo's post, Generate a data frame from many xml files , and from there, I'm trying to incorporate @danton's xmltools package for element values.
Please see the attached screenshot of the output I'm trying to get.
I've included a reprex, please see below, and am looking for the following,
# need attributes from
# /Package/PackageBody/InvestmentVehicle
# /Package/PackageBody/InvestmentVehicle/FundShareClass
# /Package/PackageBody/InvestmentVehicle/PortfolioList/Portfolio
# /Package/PackageBody/InvestmentVehicle/PortfolioList/Portfolio/Holding/HoldingDetail
# need text/values from
# /Package/PackageBody/InvestmentVehicle/FundShareClass/Operation/ShareClassBasics/Name
# /Package/PackageBody/InvestmentVehicle/FundShareClass/Operation/ShareClassBasics/LegalType
# /Package/PackageBody/InvestmentVehicle/PortfolioList/Portfolio/Holding/HoldingDetail/Symbol
# including all through
# /Package/PackageBody/InvestmentVehicle/PortfolioList/Portfolio/Holding/HoldingDetail/FirstBoughtDate
# do not need
# /Package/PackageBody/InvestmentVehicle/PortfolioList/Portfolio/PortfolioSummary
# /Package/PackageBody/InvestmentVehicle/PortfolioList/Portfolio/PortfolioBreakdown
Essentially, I'm not able to return the element values for the HoldingDetails, while also repeating the InvestmentVehicle, FundShareClass, and Portfolio attributes for respective HoldingDetail rows. I also need the InvestmentVehicle child elements of Name and LegalType to repeat for each respective HoldingDetail row.
library(xmltools)
library(xml2)
library(dplyr)
library(tidyr)
library(purrr)
library(xmltools)
install.packages("reprex")
###====XML====###
text <- '<Package>
<PackageHeader>
<PackageName>AllHoldings</PackageName>
<Universe>FO</Universe>
<AsOfDate>2018-11-09</AsOfDate>
<Version>2.5</Version>
</PackageHeader>
<PackageBody>
<InvestmentVehicle _Id="F00000ANDY">
<FundShareClass _Id="F00000ANDY">
<Operation>
<ShareClassBasics>
<Name>Andy Heritage Fund A</Name>
<LegalType _Id="FO">FO</LegalType>
</ShareClassBasics>
</Operation>
</FundShareClass>
<PortfolioList>
<Portfolio xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" _CurrencyId="EUR" _ExternalId="ANDY01111111">
<PortfolioSummary>
<Date>2015-01-31</Date>
<HoldingAggregate _SalePosition="L">
<NumberOfHolding>4</NumberOfHolding>
<NumberOfStockHolding>2</NumberOfStockHolding>
<NumberOfBondHolding>0</NumberOfBondHolding>
<TotalMarketValue>1000000</TotalMarketValue>
</HoldingAggregate>
<HoldingAggregate _SalePosition="S">
<NumberOfHolding>1</NumberOfHolding>
<NumberOfStockHolding>0</NumberOfStockHolding>
<NumberOfBondHolding>0</NumberOfBondHolding>
<TotalMarketValue>10</TotalMarketValue>
</HoldingAggregate>
</PortfolioSummary>
<PortfolioBreakdown _SalePosition="L">
<AssetAllocation>
<BreakdownValue Type="1">50.00</BreakdownValue>
<BreakdownValue Type="2">45.00</BreakdownValue>
<BreakdownValue Type="3">0.00</BreakdownValue>
<BreakdownValue Type="4">0.00</BreakdownValue>
<BreakdownValue Type="5">0.00</BreakdownValue>
<BreakdownValue Type="6">0.00</BreakdownValue>
<BreakdownValue Type="7">5.00</BreakdownValue>
<BreakdownValue Type="8">0.00</BreakdownValue>
</AssetAllocation>
</PortfolioBreakdown>
<Holding>
<HoldingDetail _Id="XYZ123456">
<Symbol>ANDY</Symbol>
<Country _Id="USA">United States</Country>
<Currency _Id="USD">US Dollar</Currency>
<SecurityName>ANDY Co</SecurityName>
<LegalType>E</LegalType>
<Weighting>7.89</Weighting>
<NumberOfShare>1000</NumberOfShare>
<ShareChange>0</ShareChange>
<MarketValue>500000</MarketValue>
<Sector>10</Sector>
<HoldingYTDReturn>10</HoldingYTDReturn>
<Region>1</Region>
<StyleBox>1</StyleBox>
<FirstBoughtDate>2010-01-31</FirstBoughtDate>
</HoldingDetail>
<HoldingDetail _Id="NMO123456">
<Symbol>YDNA</Symbol>
<Country _Id="CHE">Switzerland</Country>
<Currency _Id="CHF">Swiss Franc</Currency>
<SecurityName>Andy Shares</SecurityName>
<LegalType>E</LegalType>
<Weighting>1.23</Weighting>
<NumberOfShare>10000</NumberOfShare>
<ShareChange>0</ShareChange>
<MarketValue>50000</MarketValue>
<Sector>1</Sector>
<Region>9</Region>
<StyleBox>4</StyleBox>
<FirstBoughtDate>2010-10-31</FirstBoughtDate>
</HoldingDetail>
<HoldingDetail>
<SecurityName>Us Dollar Spot</SecurityName>
<LegalType>CH</LegalType>
<Weighting>0.5</Weighting>
<NumberOfShare>100</NumberOfShare>
<ShareChange>10</ShareChange>
<MarketValue>100000</MarketValue>
</HoldingDetail>
<HoldingDetail>
<SecurityName>British Pound Spot</SecurityName>
<LegalType>CH</LegalType>
<Weighting>0.75</Weighting>
<NumberOfShare>200</NumberOfShare>
<ShareChange>20</ShareChange>
<MarketValue>50000</MarketValue>
</HoldingDetail>
</Holding>
</Portfolio>
</PortfolioList>
</InvestmentVehicle>
<InvestmentVehicle _Id="F00000YDNA">
<FundShareClass _Id="F00000YDNA">
<Operation>
<ShareClassBasics>
<Name>Andy Heritage Fund B</Name>
<LegalType _Id="FO">FO</LegalType>
</ShareClassBasics>
</Operation>
</FundShareClass>
<PortfolioList>
<Portfolio xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" _CurrencyId="EUR" _ExternalId="ANDY02222222">
<PortfolioSummary>
<Date>2015-06-30</Date>
<HoldingAggregate _SalePosition="L">
<NumberOfHolding>2</NumberOfHolding>
<NumberOfStockHolding>10</NumberOfStockHolding>
<NumberOfBondHolding>0</NumberOfBondHolding>
<TotalMarketValue>700000</TotalMarketValue>
</HoldingAggregate>
<HoldingAggregate _SalePosition="S">
<NumberOfHolding>1</NumberOfHolding>
<NumberOfStockHolding>0</NumberOfStockHolding>
<NumberOfBondHolding>0</NumberOfBondHolding>
<TotalMarketValue>100</TotalMarketValue>
</HoldingAggregate>
</PortfolioSummary>
<PortfolioBreakdown _SalePosition="L">
<AssetAllocation>
<BreakdownValue Type="1">53.00</BreakdownValue>
<BreakdownValue Type="2">40.00</BreakdownValue>
<BreakdownValue Type="3">0.00</BreakdownValue>
<BreakdownValue Type="4">0.00</BreakdownValue>
<BreakdownValue Type="5">0.00</BreakdownValue>
<BreakdownValue Type="6">0.00</BreakdownValue>
<BreakdownValue Type="7">7.00</BreakdownValue>
<BreakdownValue Type="8">0.00</BreakdownValue>
</AssetAllocation>
</PortfolioBreakdown>
<Holding>
<HoldingDetail _Id="DEF0123456">
<Symbol>DEF</Symbol>
<Country _Id="USA">United States</Country>
<Currency _Id="USD">US Dollar</Currency>
<SecurityName>DEF Inc</SecurityName>
<LegalType>E</LegalType>
<Weighting>12.0</Weighting>
<NumberOfShare>50000</NumberOfShare>
<ShareChange>0</ShareChange>
<MarketValue>1000000</MarketValue>
<Sector>10</Sector>
<HoldingYTDReturn>25.00</HoldingYTDReturn>
<Region>7</Region>
<StyleBox>3</StyleBox>
<FirstBoughtDate>2015-09-30</FirstBoughtDate>
</HoldingDetail>
<HoldingDetail _Id="GHI0123456">
<Symbol>GFRT</Symbol>
<Country _Id="CHE">Switzerland</Country>
<Currency _Id="CHF">Swiss Franc</Currency>
<SecurityName>GFRT Shares</SecurityName>
<LegalType>E</LegalType>
<Weighting>5.87</Weighting>
<NumberOfShare>100</NumberOfShare>
<ShareChange>0</ShareChange>
<MarketValue>900000</MarketValue>
<Sector>3</Sector>
<Region>2</Region>
<StyleBox>1</StyleBox>
<FirstBoughtDate>2015-12-31</FirstBoughtDate>
</HoldingDetail>
</Holding>
</Portfolio>
</PortfolioList>
</InvestmentVehicle>
</PackageBody>
</Package>'
###====XML====###
###=====================================progress===============================###
# xml2
docReprex <- text %>%
xml2::read_xml()
df <- xml2::xml_find_all(docReprex, "//InvestmentVehicle") %>%
map_dfr(~ {
parent <- xml_attrs(.) %>% enframe() %>% spread(name, value)
kids <- xml_children(.) %>% map_dfr(~as.list(xml_attrs(.)))
grandkids <- xml_children(xml_children(.)) %>% map_dfr(~as.list(xml_attrs(.)))
greatgrandkids <- xml_children(xml_children(xml_children(.))) %>% map_dfr(~as.list(xml_attrs(.)))
cbind.data.frame(parent, kids, grandkids, greatgrandkids) %>% set_tidy_names() %>% as_tibble()
})
# xmltools
# get all xpaths to parents of parent node
termParentReprex <- docReprex %>%
xml_get_paths(only_terminal_parent = TRUE)
termNodeReprex <- docReprex %>%
xml_get_paths()
termXPathsReprex <- termParentReprex %>%
unlist() %>%
unique()
dfReprex1 <- lapply(termXPathsReprex, xml_to_df, file = text, is_xml = FALSE, dig = FALSE) %>%
dplyr::bind_cols()
#> Error in setnames(x, value): Can't assign 1 names to a 2 column data.table
###=====================================progress===============================###
Many thanks for your time, and feel free to reach out if I can send along additional details.
Andrew