I have a long and complex multilingual XML document where I need to add a further language.
With xml2::xml_text it was quite easy to extract all the text into a data frame, which was then translated manually in a spreadsheet program.
The real problem arises when I try to put the translated items back into the XML file.
I tried the following procedure:
library(xml2); library(stringr); library(magrittr); library(tidyverse)
Dokument <- read_xml("Dokument.xml") # XML document
Übersetzung <- read_tsv("Übersetzung.tsv",na="NA") # translation tabular file
Liste_fr <- xml_find_all(Dokument,'.//*[@lang="fr"]') # retrieve all instances in a given language
Liste_it <- Liste_fr # create the nodeset for the new language based on an existing one
xml_set_text(Liste_it,Übersetzung$it) # get the text from the data.frame containing the translations
xml_set_attr(Liste_it,"lang","it")
But then all changes made to the Italian nodeset are also propagated to the French one: the French nodes are lost!
What is the reason for that and how can the two nodesets be "decoupled"?
The other option I explored, calling xml_add_sibling inside a for loop, seems less practical: even with many nested lists, it is hardly possible to create nodes from scratch that carry all the class attributes at the right level of the hierarchy.
I would be thankful for any hint.
Here is what my document looks like:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>
<title lang="de">Allgemein</title>
<title lang="en">General</title>
<title lang="fr">Général</title>
<help lang="de"/>
<help lang="en"/>
<help lang="fr"/>
</a>
<b>
...
</b>
</root>
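For illustration, here is a minimal sketch of the sibling-based route on a cut-down version of the sample above. It assumes that xml_add_sibling inserts a copy when it is given an existing node (which is how I understand its default .copy behaviour), so nothing has to be built from scratch. The vector it_text is a hypothetical stand-in for Übersetzung$it, and the copy is looked up again with XPath rather than relying on the return value.

```r
library(xml2)

# Cut-down version of the sample document (titles only)
doc <- read_xml('<root><a>
  <title lang="de">Allgemein</title>
  <title lang="en">General</title>
  <title lang="fr">Général</title>
</a></root>')

# Hypothetical translations, one per French node, in document order
it_text <- c("Generale")

fr_nodes <- xml_find_all(doc, './/*[@lang="fr"]')

for (i in seq_along(fr_nodes)) {
  fr <- fr_nodes[[i]]
  # Assumption: passing an existing node inserts a copy of it after `fr`
  xml_add_sibling(fr, fr, .where = "after")
  # The copy is now the first element sibling following the French node
  it <- xml_find_first(fr, "following-sibling::*[1]")
  xml_set_attr(it, "lang", "it")
  xml_set_text(it, it_text[i])
}

cat(as.character(doc))
```

If the assumption holds, the French nodes stay untouched and each one is followed by an Italian sibling with the same element name and the remaining attributes.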
Thanks,
Giacomo