Not sure if this is an R or RStudio issue, but printing a malformed column runs out of memory


#1

The following exhausts memory on OSX with both R 3.5 and RStudio 1.1.453. I think it is a malformed read of a CSV followed by something very bad in the string-presentation layer. Only try this if you are willing to run out of memory and force-quit R/RStudio.

# from https://public.tableau.com/s/resources?qt-overview_resources=1
mframe = read.table("https://public.tableau.com/s/sites/default/files/media/IHME_GBD_2010_MORTALITY_AGE_SPECIFIC_BY_COUNTRY_1970_2010.csv",
                    # quote = '"',  # MISTAKE of omission: the file uses single quotes as apostrophes, so quote must be restricted to just the double quote
                    header=TRUE, sep=",", stringsAsFactors=FALSE)
dim(mframe)
unique(mframe$Country.Name)  # Boom! out of memory

(If the download fails partway, retry it; I reproduced the same effect on a local copy of the file.)
Uncommenting the redefinition of quote seems to fix it. BTW the data is crap, with "death rates per 100,000" in the 300,000 range.
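To see why restricting quote matters without re-downloading the Tableau file, here is a minimal sketch using an inline CSV of my own (hypothetical country names, not from the real data). With R's default quote = "\"'", an apostrophe opens a single-quoted field that swallows the following lines:

```r
# Inline CSV standing in for the original download (assumed structure).
csv <- "Country.Name,Deaths\nCote d'Ivoire,10\nPeople's Republic,20\nAlbania,30\n"

# Default quote includes the single quote: the apostrophe in "Cote d'Ivoire"
# starts a quoted run that swallows text up to the next apostrophe.
bad <- read.table(text = csv, header = TRUE, sep = ",", stringsAsFactors = FALSE)
nrow(bad)  # fewer rows than the file has: lines got merged into one field

# Restricting quote to the double-quote character treats apostrophes literally.
good <- read.table(text = csv, header = TRUE, sep = ",",
                   quote = '"', stringsAsFactors = FALSE)
nrow(good)  # all 3 data rows parse cleanly
```

The same quote = '"' fix is what the commented-out line in post #1 would apply to the real file.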


#2

Does it work fine directly in R?


#3

No, it seems to have the same problem in plain R. It feels like something in the printing/presentation layer, but I wasn't sure how much of that is shared by R and RStudio. Maybe it is Apple/OSX causing the problems.


#4

There's definitely something off in that data set:

> max(nchar(mframe$Country.Name))
[1] 947304

I suspect it's the attempt to print this very long country name that's causing the issue?


#5

The data set somewhat works if you uncomment the quote-redefining line. But the original is really tiny. I am not actually working with it, as the data quality was also low.