I have a long and complex multilingual XML document where I need to add a further language.
With xml2::xml_text it was quite easy to extract all the text into a data frame, which was then translated manually in a spreadsheet application.
The serious problems arise trying to add the translated items back into the XML file.
I tried the following procedure:
    library(xml2)
    library(stringr)
    library(magrittr)
    library(tidyverse)

    Dokument <- read_xml("Dokument.xml")                    # XML document
    Übersetzung <- read_tsv("Übersetzung.tsv", na = "NA")   # tabular file with the translations

    Liste_fr <- xml_find_all(Dokument, './/*[@lang="fr"]')  # retrieve all nodes in a given language
    Liste_it <- Liste_fr                                    # create the nodeset for the new language based on an existing one
    xml_set_text(Liste_it, Übersetzung$it)                  # take the text from the data frame containing the translations
    xml_set_attr(Liste_it, "lang", "it")
But then all changes made to the Italian nodeset also propagate to the French one: the French nodes are lost!
What is the reason for that and how can the two nodesets be "decoupled"?
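The effect can be reproduced on a toy document. This is only a minimal sketch: the XML content and the names doc, fr and it are made up for illustration and are not my real file.

```r
library(xml2)

# Toy document (hypothetical content, much smaller than the real file)
doc <- read_xml('<root><a><title lang="en">General</title><title lang="fr">Titre</title></a></root>')

fr <- xml_find_all(doc, './/*[@lang="fr"]')
it <- fr   # plain assignment of the nodeset, as in the procedure above

xml_set_text(it, "Generale")
xml_set_attr(it, "lang", "it")

# Inspect the "French" nodeset and the document after modifying only `it`
xml_attr(fr, "lang")
length(xml_find_all(doc, './/*[@lang="fr"]'))
```

After running this, the node reached through fr carries the attribute value "it" and the document no longer contains any node with lang="fr", so both R objects appear to refer to the same underlying node.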
The other option I explored, calling xml_add_sibling inside a for loop, seems less practical: even with many nested lists, it is hardly possible to create nodes from scratch that have all the classes and attributes at the right hierarchy level.
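A variant of the loop idea that might avoid building nodes from scratch is to hand the existing French node itself to xml_add_sibling, so that a copy is inserted next to it. The sketch below rests on the assumption that the .copy = TRUE argument duplicates the node rather than moving it. The toy document and the vector it_text are hypothetical stand-ins for Dokument.xml and Übersetzung$it, and I have not checked it against the full document.

```r
library(xml2)

# Toy document and translations (hypothetical stand-ins for the real data)
doc <- read_xml('<root><a><title lang="en">General</title><title lang="fr">Titre</title></a><b><title lang="en">Other</title><title lang="fr">Autre</title></b></root>')
it_text <- c("Generale", "Altro")

fr_nodes <- xml_find_all(doc, './/*[@lang="fr"]')

for (i in seq_along(fr_nodes)) {
  src <- fr_nodes[[i]]
  # Insert a copy of the French node directly after the original
  xml_add_sibling(src, src, .where = "after", .copy = TRUE)
  # Fetch the freshly inserted copy: the first element following the original
  new_node <- xml_find_first(src, "following-sibling::*[1]")
  # Only the copy is modified, the original French node stays untouched
  xml_set_text(new_node, it_text[i])
  xml_set_attr(new_node, "lang", "it")
}

xml_text(xml_find_all(doc, './/*[@lang="fr"]'))
xml_text(xml_find_all(doc, './/*[@lang="it"]'))
```

Because the copy comes from an existing node, it keeps that node's name and attributes at the same hierarchy level, and only the text and the lang attribute are overwritten.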
I would be thankful for any hint.
Here is what my document looks like:
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <a>
        <title lang="de">Allgemein</title>
        <title lang="en">General</title>
        <title lang="fr">Général</title>
        <help lang="de"/>
        <help lang="en"/>
        <help lang="fr"/>
      </a>
      <b>
        ...
      </b>
    </root>