Apologies for what is likely a very trivial question with a trivial solution.
I have an XML file that I'd like to read into a data frame using
xml2, but despite a few hours of Google searching, I was unsuccessful. This is probably because I'm not at all familiar with XML.
After all that, I found a two-liner with
XML that mostly does what I'd like (reprex below). What it doesn't do is attempt to infer the column types automatically.
My actual XML has thousands of records and hundreds of variables, so manually specifying the column types would be inconvenient.
- is there a simple way to do the same thing with xml2?
- would it be possible for it to semi-intelligently figure out column types?
I think it'd be horrifically ugly to use
XML as below, save it as a CSV text file with
write.csv, and then import it back with
read_csv, to take advantage of inferring the column types.
I'm guessing the solution will be trivial, yet somehow I've been unable to find it!
library(XML)

xml_doc <- "
<DATA>
  <RECORD>
    <VAR1>string1</VAR1>
    <VAR2>1</VAR2>
    <VAR3>2.3</VAR3>
    <VAR4>TRUE</VAR4>
  </RECORD>
  <RECORD>
    <VAR1>string2</VAR1>
    <VAR2>2</VAR2>
    <VAR3>3.4</VAR3>
    <VAR4>FALSE</VAR4>
  </RECORD>
  <RECORD>
    <VAR1>string3</VAR1>
    <VAR2>3</VAR2>
    <VAR3>4.5</VAR3>
    <VAR4>TRUE</VAR4>
  </RECORD>
</DATA>
"

doc <- xmlParse(xml_doc)
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
df
#>      VAR1 VAR2 VAR3  VAR4
#> 1 string1    1  2.3  TRUE
#> 2 string2    2  3.4 FALSE
#> 3 string3    3  4.5  TRUE

str(df)
#> 'data.frame': 3 obs. of 4 variables:
#>  $ VAR1: chr "string1" "string2" "string3"
#>  $ VAR2: chr "1" "2" "3"
#>  $ VAR3: chr "2.3" "3.4" "4.5"
#>  $ VAR4: chr "TRUE" "FALSE" "TRUE"
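
For comparison, here is a rough sketch of what the xml2 route might look like, with the column types guessed by base R's type.convert() rather than by a CSV round trip. It assumes every RECORD contains the same child elements in the same order, which is true of the toy data but may not hold for a real file. The names xml_txt, rows, df_chr and df_typed are made up for illustration.

```r
library(xml2)

# Same toy data as in the reprex above, kept compact.
xml_txt <- "<DATA>
  <RECORD><VAR1>string1</VAR1><VAR2>1</VAR2><VAR3>2.3</VAR3><VAR4>TRUE</VAR4></RECORD>
  <RECORD><VAR1>string2</VAR1><VAR2>2</VAR2><VAR3>3.4</VAR3><VAR4>FALSE</VAR4></RECORD>
  <RECORD><VAR1>string3</VAR1><VAR2>3</VAR2><VAR3>4.5</VAR3><VAR4>TRUE</VAR4></RECORD>
</DATA>"

doc <- read_xml(xml_txt)
records <- xml_find_all(doc, "//RECORD")

# One named character vector per record: element names become column names.
# Assumption: all records share the same fields, in the same order.
rows <- lapply(records, function(rec) {
  fields <- xml_children(rec)
  setNames(xml_text(fields), xml_name(fields))
})

# Stack the records into an all-character data frame.
df_chr <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Let base R guess each column's type; as.is = TRUE keeps strings as character.
df_typed <- utils::type.convert(df_chr, as.is = TRUE)
str(df_typed)
```

The type.convert() step is independent of how the data frame was built, so it should apply equally to the output of xmlToDataFrame() above. readr::type_convert() plays a similar role for anyone who prefers readr's guessing rules.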