Help parsing my XML file

Hello there,

Basically, I need to extract some information from an XML file. I know very little about this process, so I need your help. This is as far as I have gotten:

    install.packages('XML')
    library(XML)
    library(ggplot2)
    library(grid)
    library(gridExtra)
    library(methods)
    Enero2003 <- xmlParse(file = "C039_2003/G039_2003_1.xml")
    xmltop <- xmlRoot(Enero2003)
    class(xmltop)
    dfxml <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

But I guess I'm doing something wrong. I need the "Precip" values. The file covers every day of the month, with one record every 10 minutes.

Could you help me? Thank you in advance.

See the FAQ: How to do a minimal reproducible example (reprex) for beginners. All that can be made out from the screenshot is that your target is four levels deep in the XML tree.
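For a target nested several levels deep, an XPath query is one way to reach it without walking each level by hand. Here is a minimal sketch with the XML package you are already loading. The element and attribute names (month, day, slot, readings, time) and all the values are made up, since the real ones can't be read from the screenshot:

```r
library(XML)

# Hypothetical stand-in for the real file: names and values are invented.
txt <- paste0(
  '<month><day date="2003-01-01">',
  '<slot time="00:00"><readings><Precip>0.0</Precip><Temp>6.4</Temp></readings></slot>',
  '<slot time="00:10"><readings><Precip>0.2</Precip><Temp>6.3</Temp></readings></slot>',
  '</day></month>'
)

doc <- xmlParse(txt, asText = TRUE)

# "//" matches at any depth, so the query does not depend on the exact nesting.
precip <- as.numeric(xpathSApply(doc, "//Precip", xmlValue))

# Read the "time" attribute of every slot to label each reading.
time <- xpathSApply(doc, "//slot", xmlGetAttr, "time")

result <- data.frame(time = time, precip = precip)
result
```

The XPath strings would need to be adjusted to the real element names once those are known.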

Thank you so much for your answer.

I should have uploaded the whole file because, as you said, this is just a screenshot and it might not be clear. What I am trying to do is organize the data into a data frame, but I do not know how to adjust my code to get there.

What size is the whole file? It may not be possible to post all of it, and that shouldn't be necessary, since we only need to figure out how to extract down to the level of the one variable.

The whole file has 71489 rows.

The root element is "mes".
Below it is "dia", which appears 31 times, once for each day of the month.
Below that are "hora" and "meteoros", one for each 10-minute time slot, so they appear 144 times per day.

I hope this makes sense; if not, I'll provide any information you need.

Thank you.

I recommend switching from the XML library to xml2.
Use xml2's read_xml and then its as_list; after that, other toolsets can be used, such as purrr's map family of functions.
I think that in your case, once you turn it into a list, it will probably be a list of Dia entries. You could therefore use head() to cut the list down to something like 10 entries and use dput() to share that on the forum, if you want help extracting the Precip values, etc.
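To give an idea of where this leads, here is a rough sketch on a tiny made-up document. The element names follow the layout you described; the "Dia" and "Hora" attributes, the plain "Precip" element name, and all the values are assumptions, so the real file will need adjustments:

```r
library(xml2)
library(purrr)

# Hypothetical stand-in for the real file (names and values are assumptions).
txt <- paste0(
  '<mes>',
  '<dia Dia="2003-1-01">',
  '<hora Hora="00:00"><Meteoros><Precip>0.0</Precip></Meteoros></hora>',
  '<hora Hora="00:10"><Meteoros><Precip>0.2</Precip></Meteoros></hora>',
  '</dia>',
  '<dia Dia="2003-1-02">',
  '<hora Hora="00:00"><Meteoros><Precip>1.4</Precip></Meteoros></hora>',
  '</dia>',
  '</mes>'
)

# as_list() returns a nested list; the root element is the first level.
dias <- as_list(read_xml(txt))$mes

# Build one small data frame per day: XML attributes become R attributes,
# and each leaf value is stored as a one-element list holding a string.
one_day <- function(dia) {
  data.frame(
    dia    = attr(dia, "Dia"),
    hora   = unname(map_chr(dia, ~ attr(.x, "Hora"))),
    precip = unname(map_dbl(dia, ~ as.numeric(.x$Meteoros$Precip[[1]])))
  )
}

# Stack the per-day data frames into a single one.
precip <- do.call(rbind, unname(map(dias, one_day)))
precip
```

The result has one row per 10-minute slot, which is the data frame shape you said you are after.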

Hi @nirgrahamuk .

First of all, thank you. I tried what you suggested and got as far as making "read_xml" and "as_list" work, as shown in the screenshot.

The specific code I used is this:

    Enero2003 <- as_list(read_xml("C039_2003/G039_2003_1.xml"))

Any idea how to apply head() and dput() to it?

Thank you!

Try:

    dput(head(Enero2003$mes))

Yeah, it looks much better.

Now I only need to take the highlighted data and that would be a huge improvement.

Thank you so much.

A screenshot isn't very helpful on this forum.
We can't copy the text from it.
Can you paste the text?

Sorry, here it goes.

    hora = structure(list(Meteoros = list(Cub.Vto._a_3050cm = list(
        "0.0"), Dir.Med._a_3050cm = list("197.0"), Humedad._a_3050cm = list(
        "80.0"), Irradia.._a_800cm = list("3.0"), Precip.._a_174cm = list(
        "0.0"), Presión._a_60cm = list("800.1"), Sig.Dir._a_3050cm = list(
        "17.0"), Sig.Vel._a_3050cm = list("3.0"), Tem.Sue._a_0cm = list(
        "5.9"), Tem.Aire._a_164cm = list("6.4"), Vel.Max._a_3050cm = list(
        "1.3"), Vel.Med._a_3050cm = list("0.8"))), Hora = "23:50")), Dia = "2003-1-06"))

This is only the last piece of the output, so as not to paste too much.

I think you may have manually edited this dput output, and in doing so made it non-functional...
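One way to check whether a pasted dput() output is functional is to parse it back into an object. Here is a minimal base R sketch on a made-up miniature of one "hora" entry (the values and the shortened element names are invented for illustration):

```r
# Hypothetical miniature of one entry, shaped like the as_list() output.
entry <- list(hora = structure(
  list(Meteoros = list(Precip = list("0.0"), Humedad = list("80.0"))),
  Hora = "23:50"
))

# Capture what dput() prints, as it would be pasted into a post.
pasted <- paste(capture.output(dput(entry)), collapse = "\n")

# A functional paste parses and evaluates back to the same object.
restored <- eval(parse(text = pasted))
round_trip_ok <- identical(entry, restored)

# Dropping the opening "list(" leaves unbalanced parentheses,
# so the fragment can no longer be parsed.
fragment <- sub("^list\\(", "", pasted)
fragment_fails <- inherits(try(parse(text = fragment), silent = TRUE), "try-error")

round_trip_ok
fragment_fails
```

A piece copied from the middle or the end of a longer output behaves like the fragment here, which is why pasting a complete dput() of a smaller object works better.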

I just copied and pasted it from the console, without editing it at all.

I'll try another way of pasting it here so that it is functional.
