Not sure if this is an R or RStudio issue, but printing a malformed column runs out of memory

The following runs me out of memory on OSX with both R 3.5 and RStudio 1.1.453. I think it is a malformed read of a CSV followed by something very bad in the string presentation layer. Only try this if you are willing to run out of memory and force-quit R/RStudio.

# from https://public.tableau.com/s/resources?qt-overview_resources=1
mframe = read.table("https://public.tableau.com/s/sites/default/files/media/IHME_GBD_2010_MORTALITY_AGE_SPECIFIC_BY_COUNTRY_1970_2010.csv",
                    # quote = '"',  # MISTAKE of omission: the file uses single quotes as apostrophes, so quote must be restricted to double quotes
                    header=TRUE, sep=",", stringsAsFactors=FALSE)
dim(mframe)
unique(mframe$Country.Name)  # Boom! out of memory

(If you get a partial file download failure, go ahead and retry it. I got the same effect on a local copy of the file.)
Uncommenting the redefinition of quote seems to fix it. BTW the data quality is poor, with "death rates per 100,000" values in the 300,000 range.
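A minimal reproduction with inline data (hypothetical country names, not the Tableau file) shows what the default quote setting does. read.table defaults to quote = "\"'", so the first apostrophe opens a quoted field that swallows separators and newlines until the next single quote:

csv <- "country,rate\nCote d'Ivoire,10\nGhana,20\nO'Henryland,30\n"

# Default quote = "\"'": the apostrophe in "Cote d'Ivoire" starts a quoted
# region that runs to the apostrophe in "O'Henryland", merging three data
# lines into one giant field.
bad  <- read.table(text = csv, header = TRUE, sep = ",",
                   stringsAsFactors = FALSE)

# Restricting quote to double quotes parses each line as its own row.
good <- read.table(text = csv, header = TRUE, sep = ",", quote = '"',
                   stringsAsFactors = FALSE)

nrow(bad)   # 1: rows merged into a single oversized field
nrow(good)  # 3, as expected

On the real file the same mechanism would explain the 947,304-character "country name" reported below.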

Does it work fine directly in R?

No, it seems to have the same problem in R. It feels like something in the printing/presentation layer, but I wasn't sure how much of that is shared by R and RStudio. Maybe it is Apple/OSX causing the problems.

There's definitely something off in that data set:

> max(nchar(mframe$Country.Name))
[1] 947304

I suspect it's the attempt to print this very long country name that's causing the issue?
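A hedged sketch of a defensive check before printing a suspect character column: look at the length distribution first, and truncate before display. The vals vector here stands in for mframe$Country.Name; the 947304 length is taken from the output above.

# One normal value and one pathologically long one, mimicking the bad column.
vals <- c("Ghana", strrep("x", 947304))

summary(nchar(vals))       # reveals the outlier without printing the string
safe <- strtrim(vals, 60)  # truncate to at most 60 chars before display
unique(safe)               # now safe to print

This doesn't fix the parse, but it avoids handing a near-megabyte string to the console formatter while investigating.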


The data set somewhat works if you uncomment the quote-redefining line. But my interest in the original is really minimal; I am not actually working with it, as the data quality was also low.