Apologies for what is likely a very trivial question with a trivial solution.
I have an XML file that I'd like to read into a data frame using xml2, but despite a few hours of Google searching, I haven't managed it. This is probably because I'm not at all familiar with XML.
After all that, I found a two-liner using the XML package that mostly does what I'd like (reprex below). What it doesn't do is attempt to infer the column types automatically.
My actual XML has thousands of records and hundreds of variables, so manually specifying the column types would be inconvenient.
My questions:
- is there a simple way to do the same thing with xml2?
- would it be possible for it to semi-intelligently figure out column types?
I think it'd be horrifically ugly to use XML as below, write the result to a CSV file with write.csv, and then read it back in with read_csv, just to take advantage of its column type inference.
I'm guessing the solution will be trivial, yet somehow I've been unable to find it!
Thanks!
library(XML)
xml_doc <- "
<DATA>
<RECORD>
<VAR1>string1</VAR1>
<VAR2>1</VAR2>
<VAR3>2.3</VAR3>
<VAR4>TRUE</VAR4>
</RECORD>
<RECORD>
<VAR1>string2</VAR1>
<VAR2>2</VAR2>
<VAR3>3.4</VAR3>
<VAR4>FALSE</VAR4>
</RECORD>
<RECORD>
<VAR1>string3</VAR1>
<VAR2>3</VAR2>
<VAR3>4.5</VAR3>
<VAR4>TRUE</VAR4>
</RECORD>
</DATA>
"
doc <- xmlParse(xml_doc)
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
df
#>      VAR1 VAR2 VAR3  VAR4
#> 1 string1    1  2.3  TRUE
#> 2 string2    2  3.4 FALSE
#> 3 string3    3  4.5  TRUE
str(df)
#> 'data.frame': 3 obs. of 4 variables:
#> $ VAR1: chr "string1" "string2" "string3"
#> $ VAR2: chr "1" "2" "3"
#> $ VAR3: chr "2.3" "3.4" "4.5"
#> $ VAR4: chr "TRUE" "FALSE" "TRUE"
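
For reference, here is a minimal sketch of how the same thing might look with xml2, with the column types inferred in memory rather than via a CSV round trip. It assumes every field is a direct child of a RECORD element and that the first record lists all the field names; the names records, vars and cols are only illustrative, and the XML is a shortened copy of the reprex so the snippet runs on its own. The type guessing is done by base R's type.convert(), not by xml2 itself.

```r
library(xml2)

# Shortened copy of the reprex data (two records) so this runs standalone
xml_doc <- "
<DATA>
<RECORD><VAR1>string1</VAR1><VAR2>1</VAR2><VAR3>2.3</VAR3><VAR4>TRUE</VAR4></RECORD>
<RECORD><VAR1>string2</VAR1><VAR2>2</VAR2><VAR3>3.4</VAR3><VAR4>FALSE</VAR4></RECORD>
</DATA>
"

# Parse the literal XML string (trimmed so the document starts at the root tag)
doc <- read_xml(trimws(xml_doc))

# One node per record
records <- xml_find_all(doc, ".//RECORD")

# Assumption: the first record contains every field name
vars <- xml_name(xml_children(records[[1]]))

# Build one character column per field; xml_find_first() returns one result
# per record, so a field missing from a record becomes NA
cols <- lapply(vars, function(v) {
  xml_text(xml_find_first(records, paste0("./", v)))
})
names(cols) <- vars
df <- as.data.frame(cols, stringsAsFactors = FALSE)

# Let base R guess the type of each column from its character values
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```

The last step works equally well on the data frame returned by xmlToDataFrame, so the type inference does not depend on which XML package did the parsing. If the guessing rules of read_csv are preferred, readr::type_convert(df) should be a drop-in alternative for that line.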