Help parsing my XML file

Dom90 · November 21, 2021, 10:00am

Hello there,

Basically, I need to get that information out of the file. I know very little about this process, so I need your help. This is as far as I got:

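    install.packages("XML")  # only needed once
    library(XML)
    library(ggplot2)
    library(grid)
    library(gridExtra)
    library(methods)

    # Parse the January 2003 file and get its root node (<mes>)
    Enero2003 <- xmlParse(file = "C039_2003/G039_2003_1.xml")
    xmltop <- xmlRoot(Enero2003)
    class(xmltop)
    # For each <dia>, take xmlValue() of each child; this pastes together all
    # the text under that child rather than picking out Precip
    dfxml <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))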

But I suspect I'm doing something wrong. I need the "Precip" values. The file basically covers every day of the month in 10-minute intervals.

Could you help me? Thank you in advance.
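Since the code above already loads the XML package, one way to pull out just the Precip readings with it is an XPath query per `<hora>` node. This is a minimal sketch, not the file's confirmed layout: the miniature document, the element name `Precip.._a_174cm`, and the `Dia`/`Hora` attributes are assumptions inferred from the `dput()` output later in this thread, and the code assumes every `<hora>` holds exactly one Precip reading.

```r
library(XML)

# Hypothetical miniature of the monthly file; element and attribute names are
# inferred from the dput() output later in the thread, not confirmed
xml_txt <- '<mes>
  <dia Dia="2003-1-06">
    <hora Hora="23:40">
      <Meteoros><Precip.._a_174cm>0.2</Precip.._a_174cm></Meteoros>
    </hora>
    <hora Hora="23:50">
      <Meteoros><Precip.._a_174cm>0.0</Precip.._a_174cm></Meteoros>
    </hora>
  </dia>
</mes>'

# For the real file: doc <- xmlParse(file = "C039_2003/G039_2003_1.xml")
doc <- xmlParse(xml_txt, asText = TRUE)

# One row per <hora>; assumes every <hora> holds exactly one Precip element,
# otherwise the three columns would have different lengths
precip <- data.frame(
  dia    = xpathSApply(doc, "//hora", function(h) xmlGetAttr(xmlParent(h), "Dia")),
  hora   = xpathSApply(doc, "//hora", xmlGetAttr, "Hora"),
  precip = as.numeric(xpathSApply(
    doc, "//hora/Meteoros/*[starts-with(name(), 'Precip')]", xmlValue)),
  stringsAsFactors = FALSE
)
precip
```

Matching on `starts-with(name(), 'Precip')` avoids hard-coding the sensor-height suffix, in case it differs between files.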

technocrat · November 21, 2021, 9:38pm

See the FAQ: How to do a minimal reproducible example (reprex) for beginners. All that can be made out from the screenshot is that your target is four levels deep in the XML tree.

Dom90 · November 22, 2021, 9:13am

Thank you so much for your answer.

I should have uploaded the whole file, because, as you said, this is just a screenshot and it might not be clear. What I am attempting to do is organize the data into a data frame, but I do not know how to adjust my code to get there.

technocrat · November 22, 2021, 9:29am

What size is the file? It may not be possible to post all of it, and it shouldn't be necessary, since we only need to figure out how to extract down to the level of the one variable.

Dom90 · November 22, 2021, 9:56am

The whole file has 71489 rows.

The root element is "mes".
Below it is "dia", which appears 31 times, once for each day of the month.
Then come "hora" and "meteoros", one for each 10-minute interval of the day, so they appear 144 times per day.

I hope this makes sense; if not, I'll provide any information you need.

Thank you.
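The nesting described above (mes > dia > hora > Meteoros) can be flattened into one row per 10-minute reading. A minimal sketch with xml2 and XPath follows; the miniature file is hypothetical, and the element and attribute names are inferred from the `dput()` output later in the thread (XML names are case-sensitive, and that output shows `Meteoros` with a capital M).

```r
library(xml2)

# Hypothetical miniature file with the mes > dia > hora > Meteoros nesting;
# names are inferred from the thread's dput() output, not confirmed
xml_txt <- '<mes>
  <dia Dia="2003-1-06">
    <hora Hora="23:40">
      <Meteoros><Precip.._a_174cm>0.2</Precip.._a_174cm></Meteoros>
    </hora>
    <hora Hora="23:50">
      <Meteoros><Precip.._a_174cm>0.0</Precip.._a_174cm></Meteoros>
    </hora>
  </dia>
</mes>'

# For the real file: doc <- read_xml("C039_2003/G039_2003_1.xml")
doc  <- read_xml(xml_txt)
dias <- xml_find_all(doc, "/mes/dia")

rows <- lapply(dias, function(d) {
  horas <- xml_find_all(d, "./hora")
  # xml_find_first() returns one node per <hora> (a missing node if there is
  # no match), so the columns stay aligned and gaps become NA
  precip <- xml_find_first(horas, "./Meteoros/*[starts-with(name(), 'Precip')]")
  data.frame(
    dia    = xml_attr(d, "Dia"),  # one value, recycled for every row
    hora   = xml_attr(horas, "Hora"),
    precip = as.numeric(xml_text(precip)),
    stringsAsFactors = FALSE
  )
})
precip_df <- do.call(rbind, rows)
precip_df
```

Working day by day keeps the date next to each reading; on the full file this should give roughly 31 × 144 rows, if every interval is present.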

nirgrahamuk · November 22, 2021, 11:11am

I recommend switching from the XML library to xml2.
Use xml2's read_xml() and then its as_list(); after that, other toolsets can be used, such as purrr's map family of functions.
In your case, when you convert it to a list, it will probably be a list of Dia entries. So you could head() your list down to something like 10 entries and use dput() to share that on the forum, if you want help extracting the Precip etc.
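A sketch of the as_list() route, assuming the list shape visible in the thread's dput() output: each dia carries a Dia attribute, each hora a Hora attribute, and each reading is a one-element list holding a string. Base R lapply()/vapply() stand in here for purrr's map family, and the miniature file is hypothetical.

```r
library(xml2)

# Hypothetical miniature file; names inferred from the thread, not confirmed
xml_txt <- '<mes>
  <dia Dia="2003-1-06">
    <hora Hora="23:40">
      <Meteoros><Precip.._a_174cm>0.2</Precip.._a_174cm></Meteoros>
    </hora>
    <hora Hora="23:50">
      <Meteoros><Precip.._a_174cm>0.0</Precip.._a_174cm></Meteoros>
    </hora>
  </dia>
</mes>'

# For the real file: lst <- as_list(read_xml("C039_2003/G039_2003_1.xml"))
lst <- as_list(read_xml(xml_txt))

# Keep only the <dia> entries (drops any stray text nodes)
dias <- lst$mes[names(lst$mes) == "dia"]

rows <- lapply(dias, function(dia) {
  horas <- dia[names(dia) == "hora"]
  data.frame(
    dia  = attr(dia, "Dia"),  # XML attributes become R attributes in as_list()
    hora = vapply(horas, function(h) attr(h, "Hora"), character(1),
                  USE.NAMES = FALSE),
    # Each reading is a one-element list holding a string, e.g. list("0.0")
    precip = vapply(horas, function(h) {
      met <- h$Meteoros
      as.numeric(unlist(met[startsWith(names(met), "Precip")]))
    }, numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
})
precip_df <- do.call(rbind, unname(rows))
precip_df
```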

Dom90 · November 23, 2021, 12:58pm

Hi @nirgrahamuk .

First of all, thank you. I tried what you suggested and managed to get "as_list" and "read_xml" working, as shown in the screenshot.

The specific code I used is this one:

    library(xml2)
    Enero2003 <- as_list(read_xml("C039_2003/G039_2003_1.xml"))

Any idea how to apply head() and dput()?

Thank you!

nirgrahamuk · November 23, 2021, 1:09pm

Try:

    dput(head(Enero2003$mes))

Dom90 · November 23, 2021, 1:39pm

Yeah, it looks much better.

Now I only need to extract the highlighted data, and that would be a huge improvement.

Thank you so much.

nirgrahamuk · November 23, 2021, 1:40pm

A screenshot isn't very helpful on this forum.
We can't copy the text from it.
Can you paste the text?

Dom90 · November 23, 2021, 1:54pm

Sorry, here it goes.

    hora = structure(list(Meteoros = list(Cub.Vto._a_3050cm = list(
        "0.0"), Dir.Med._a_3050cm = list("197.0"), Humedad._a_3050cm = list(
        "80.0"), Irradia.._a_800cm = list("3.0"), Precip.._a_174cm = list(
        "0.0"), Presión._a_60cm = list("800.1"), Sig.Dir._a_3050cm = list(
        "17.0"), Sig.Vel._a_3050cm = list("3.0"), Tem.Sue._a_0cm = list(
        "5.9"), Tem.Aire._a_164cm = list("6.4"), Vel.Max._a_3050cm = list(
        "1.3"), Vel.Med._a_3050cm = list("0.8"))), Hora = "23:50")), Dia = "2003-1-06"))

This is just the last piece of the output, so as not to paste too much.

nirgrahamuk · November 23, 2021, 1:58pm

I think you may have manually edited this dput output, and in doing so made it non-functional...

Dom90 · November 23, 2021, 2:20pm

I just copied and pasted it from the console; I didn't edit it at all.

I'll try another way of pasting it here so that it works.

system · December 14, 2021, 2:20pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.