Partial translation of XML file with `xml2` overwrites existing nodes

Zack83 · October 30, 2020, 10:20am

I have a long and complex multilingual XML document where I need to add a further language.

With xml2::xml_text it was quite easy to extract all the text into a data frame, which was then translated manually in a spreadsheet.

The real problems arise when I try to add the translated items back into the XML file.

I tried the following procedure:

library(xml2); library(stringr); library(magrittr); library(tidyverse)
Dokument <- read_xml("Dokument.xml")                  # XML document
Übersetzung <- read_tsv("Übersetzung.tsv",na="NA")    # translation tabular file
Liste_fr <- xml_find_all(Dokument,'.//*[@lang="fr"]') # retrieve all instances in a given language
Liste_it <- Liste_fr                                  # create the nodeset for the new language based on an existing one
xml_set_text(Liste_it,Übersetzung$it)                 # set the text from the data frame containing the translations
xml_set_attr(Liste_it,"lang","it")

But then all changes made to the Italian nodeset are propagated also to the French one: the French nodes are lost!
What is the reason for that and how can the two nodesets be "decoupled"?
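The behaviour can be reproduced on a much smaller document. The sketch below assumes, as the xml2 documentation describes, that a nodeset holds pointers to nodes inside the parsed document rather than copies of them, so assigning a nodeset to a new name does not duplicate any nodes. The one-element document and the names `doc`, `fr` and `it` are made up for illustration.

```r
library(xml2)

# Tiny stand-in for the real document (hypothetical content)
doc <- read_xml('<root><title lang="fr">Titre</title></root>')

fr <- xml_find_all(doc, './/*[@lang="fr"]')
it <- fr                         # copies the R object, not the underlying XML nodes

xml_set_attr(it, "lang", "it")   # modifies the node inside doc in place

xml_attr(fr, "lang")             # the "French" nodeset now reports "it" as well
length(xml_find_all(doc, './/*[@lang="fr"]'))  # no French node is left in doc
```

Both names refer to the same nodes in `doc`, which would explain why editing one nodeset appears to overwrite the other.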

The other option I explored, calling xml_add_sibling inside a for loop, seems less practical: even when nesting many lists, it is hardly possible to build nodes from scratch that carry all the class attributes at the right level of the hierarchy.

I would be thankful for any hint.

Here is what my document looks like:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>
    <title lang="de">Allgemein</title>
    <title lang="en">General</title>
    <title lang="fr">Général</title>
    <help lang="de"/>
    <help lang="en"/>
    <help lang="fr"/>
</a>
<b>
    ...
</b>
</root>

Thanks,

Giacomo

Zack83 · October 30, 2020, 5:18pm

OK, I worked around the problem with a dirty trick: I created the nodeset for the new language from a separate parse of the source file, before loading the document itself and creating the nodesets for the existing languages.

Liste_it <- xml_find_all(read_xml(Dateiname),'.//*[@lang="fr"]') # parse a separate copy, independent of Dokument
Dokument <- read_xml(Dateiname)                                  # load the document
Liste_de <- xml_find_all(Dokument,'.//*[@lang="de"]')            # create the nodesets for the existing languages
Liste_en <- xml_find_all(Dokument,'.//*[@lang="en"]')
Liste_fr <- xml_find_all(Dokument,'.//*[@lang="fr"]')

In the end I did use xml_add_sibling after all:

for(.k in seq_along(Liste_fr)) xml_add_sibling( Liste_fr[[.k]],Liste_it[[.k]] ) # insert each Italian node right after its French counterpart

If anyone knows a more elegant solution, I'd be happy to hear it; otherwise I can live with the current one.
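One possible variant that avoids parsing the file twice is sketched below. It relies on the `.copy` argument of `xml_add_sibling`: with `.copy = TRUE`, a copy of the given node is inserted and the original stays where it is. The sample document is a shortened version of the one above, and `it_text` is a hypothetical vector standing in for the translated column; this is a sketch under those assumptions, not a tested solution for the full file.

```r
library(xml2)

# Shortened sample document (same structure as in the question)
doc <- read_xml('<root><a>
  <title lang="de">Allgemein</title>
  <title lang="en">General</title>
  <title lang="fr">Général</title>
  <help lang="fr"/>
</a></root>')

# Hypothetical translations, one entry per French node, in document order
it_text <- c("Generale", "Aiuto")

fr_nodes <- xml_find_all(doc, './/*[@lang="fr"]')

for (k in seq_along(fr_nodes)) {
  fr <- fr_nodes[[k]]
  # Insert a copy of the French node directly after it;
  # the original French node is left untouched
  xml_add_sibling(fr, fr, .where = "after", .copy = TRUE)
  # Fetch the freshly inserted copy: the first element following the French node
  it <- xml_find_first(fr, "following-sibling::*[1]")
  xml_set_text(it, it_text[k])
  xml_set_attr(it, "lang", "it")
}

xml_text(xml_find_all(doc, './/*[@lang="it"]'))
```

Because each Italian node starts as a copy of its French counterpart, it keeps the element name and any other attributes, and only the text and `lang` need to be changed.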
