Data wrangling: semi-structured XML to a data frame

Hi fellow R enthusiasts,

I've been trying to figure out how to tidy semi-structured XML files into neat n × m tables. I've struggled to find good tutorials, and I've found the documentation somewhat advanced, since I don't deal with these kinds of problems on a daily basis.

Below is a very simplified XML structure that I would like to tidy into a tabular format.

Starting point: the XML (for simplicity, the namespace declarations are left out of this example):

<x1:root>
      <x1:customers>
            <x1:customer>
                  <x1:name>Customer1 </x1:name>
                  <x1:address>
                        <x1:streetname>SomeStreet</x1:streetname>
                        <x1:number>1</x1:number>
                  </x1:address>
                  <x1:zip>10069</x1:zip>
            </x1:customer>
            <x1:customer>
                  <x1:name>Customer2</x1:name>
                  <x1:address>
                        <x1:streetname>SomeStreetWithoutZip&Nr</x1:streetname>
                  </x1:address>
            </x1:customer>
      </x1:customers>
</x1:root>

Desired structure (without namespaces), one row per customer:

Name      | Streetname              | Number | Zip
----------|-------------------------|--------|------
Customer1 | SomeStreet              | 1      | 10069
Customer2 | SomeStreetWithoutZip&Nr | NA     | NA

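I've tried xmlToDataFrame() from the XML package, but it's too simple for this: nested elements such as `<x1:address>` end up concatenated into a single column instead of being split into Streetname and Number. Can anyone point me in the right direction?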

See this Stack Overflow thread.
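A minimal sketch of an XPath-based route, assuming the xml2 package (swapped in here for the XML package mentioned in the question): each `<x1:customer>` node becomes one row, and each column is a relative XPath expression evaluated against every customer. The namespace URI is a made-up placeholder and `&` is escaped as `&amp;` so that the document parses; missing fields come out as `NA`.

```r
library(xml2)

# Example document. The namespace URI is a made-up placeholder, and the
# ampersand is escaped as &amp; so that the XML is well-formed.
txt <- '<x1:root xmlns:x1="http://example.com/x1">
  <x1:customers>
    <x1:customer>
      <x1:name>Customer1</x1:name>
      <x1:address>
        <x1:streetname>SomeStreet</x1:streetname>
        <x1:number>1</x1:number>
      </x1:address>
      <x1:zip>10069</x1:zip>
    </x1:customer>
    <x1:customer>
      <x1:name>Customer2</x1:name>
      <x1:address>
        <x1:streetname>SomeStreetWithoutZip&amp;Nr</x1:streetname>
      </x1:address>
    </x1:customer>
  </x1:customers>
</x1:root>'

doc <- read_xml(txt)
ns  <- xml_ns(doc)  # named vector mapping the prefix "x1" to its URI

# One node per output row.
customers <- xml_find_all(doc, "//x1:customer", ns)

# Evaluate a relative XPath against every customer. Where nothing matches,
# xml_find_first() returns a missing node and xml_text() turns it into NA.
column <- function(xpath) xml_text(xml_find_first(customers, xpath, ns))

df <- data.frame(
  Name       = column("./x1:name"),
  Streetname = column("./x1:address/x1:streetname"),
  Number     = column("./x1:address/x1:number"),
  Zip        = column("./x1:zip"),
  stringsAsFactors = FALSE
)
print(df)
```

Because every column is addressed by its own path, deeper levels need no joins; the trade-off is that the columns have to be listed by hand.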

Thanks, I've made some progress on it now.

For anyone wondering how I solved it (a slightly lazy approach):

For each subtable (in the dataset above, `<x1:customers>`):

  1. Use xmlToDataFrame to build the primary table.
  2. Use xmlToDataFrame (or XPath, see the link provided by technocrat) to create secondary table(s) for every sublevel whose values are concatenated into a single column of table 1. Go as deep as you need to.
  3. Create a unique key by concatenating all columns of each secondary table.
  4. Join the primary and secondary tables on the unique key created in the previous step. Do this recursively, from the bottom level up to the top.
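The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: it swaps in the xml2 package for the XML package, and `records_to_df()` is a hypothetical helper imitating how the post describes xmlToDataFrame (one column per child element, with nested sublevels concatenated into one string). The namespace URI is a made-up placeholder and `&` is escaped so the example parses.

```r
library(xml2)

# Same example document, made well-formed: the namespace URI is a made-up
# placeholder and the ampersand is escaped as &amp;.
txt <- '<x1:root xmlns:x1="http://example.com/x1">
  <x1:customers>
    <x1:customer>
      <x1:name>Customer1</x1:name>
      <x1:address>
        <x1:streetname>SomeStreet</x1:streetname>
        <x1:number>1</x1:number>
      </x1:address>
      <x1:zip>10069</x1:zip>
    </x1:customer>
    <x1:customer>
      <x1:name>Customer2</x1:name>
      <x1:address>
        <x1:streetname>SomeStreetWithoutZip&amp;Nr</x1:streetname>
      </x1:address>
    </x1:customer>
  </x1:customers>
</x1:root>'

# Hypothetical helper: one row per record node, one column per child element
# name. xml_text() on a child returns all the text underneath it, so nested
# levels collapse into one string; absent children become NA.
records_to_df <- function(records) {
  rows <- lapply(records, function(rec) {
    kids <- xml_children(rec)
    setNames(as.list(xml_text(kids)), xml_name(kids))
  })
  cols <- unique(unlist(lapply(rows, names)))
  out <- lapply(cols, function(col) {
    vapply(rows, function(r) if (is.null(r[[col]])) NA_character_ else r[[col]],
           character(1))
  })
  names(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}

doc <- read_xml(txt)  # default parser options drop whitespace-only text nodes
ns  <- xml_ns(doc)

# Step 1: primary table. The nested <address> collapses into one column,
# e.g. "SomeStreet1" for the first customer.
primary <- records_to_df(xml_find_all(doc, "//x1:customer", ns))

# Step 2: secondary table for the <address> sublevel.
secondary <- records_to_df(xml_find_all(doc, "//x1:customer/x1:address", ns))

# Step 3: rebuild the collapsed text as a key by pasting all secondary
# columns together (missing values count as ""), then drop duplicate rows.
blank_na <- function(v) ifelse(is.na(v), "", v)
secondary$address <- do.call(paste0, lapply(secondary, blank_na))
secondary <- unique(secondary)

# Step 4: join on the key, then drop it and order rows and columns.
result <- merge(primary, secondary, by = "address", all.x = TRUE)
result <- result[order(result$name), c("name", "streetname", "number", "zip")]
print(result)
```

Because the key is just pasted text, it is fragile: customers with identical addresses share one key (hence the `unique()` call), and different splits such as "ab" + "c" and "a" + "bc" produce the same key.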