The killing usually happens when I call `readxl::read_xlsx()` on a large file. Once the file goes beyond roughly 4000 rows the process gets killed, though the exact threshold varies somewhat from run to run.
So far I have found the following ways of reducing the likelihood of being killed:

- Capping R's memory with `devtools::install_github("krlmlr/ulimit"); library(ulimit); memory_limit(size = 900)`
- Converting the files to CSV so I don't have to use the readxl library
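For reference, the `ulimit` approach looks like this (the package installs from GitHub, not CRAN; as far as I can tell `size` is in MiB, and hitting the cap raises an ordinary R allocation error rather than the process being killed):

```r
# One-time install of the GitHub-only ulimit package
devtools::install_github("krlmlr/ulimit")
library(ulimit)

# Cap this R process at ~900 MiB before doing the heavy read
memory_limit(size = 900)

# If read_xlsx() exceeds the cap it should now fail with a catchable
# R error instead of the OOM killer taking the whole process down
df <- tryCatch(
  readxl::read_xlsx("big_file.xlsx"),  # hypothetical file name
  error = function(e) {
    message("allocation refused: ", conditionMessage(e))
    NULL
  }
)
```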
The problem is that these measures are probabilistic: they make a kill less likely but don't rule it out. There ought to be a way to do one or more of the following (in order of importance):
- Programmatically check how close I am to the memory cap, so I can home in on the exact spot in the code that is at fault.
- A reliable way to make my scripts act as if they were running on a machine with, say, 0.9 GB of RAM, spilling over to swap beyond that.
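For the first point, a rough self-check is possible from inside R: `gc()` reports the memory held by R objects (its `(Mb)` columns), and the `ps` package can report the whole process's resident set size, which is closer to what the OOM killer actually sees. A sketch, assuming the `ps` package is installed:

```r
library(ps)  # for process-level memory info

# MB currently held by R objects (sums the Ncells and Vcells "(Mb)" column)
r_objects_mb <- function() sum(gc()[, 2])

# Resident set size of the whole R process, converted from bytes to MB
process_rss_mb <- function() ps_memory_info()[["rss"]] / 1024^2

# Sprinkle calls like this around the suspect code to find the spike
cat("R objects:", r_objects_mb(), "MB; process RSS:",
    round(process_rss_mb()), "MB\n")
```

The gap between the two numbers is itself informative: `read_xlsx()` does its parsing in C++ via libxls/RapidXML, so memory that never shows up in `gc()` can still count against the process.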
I tried manipulating the `skip` and `n_max` arguments to force `read_xlsx()` to read the file in chunks, but it gets killed even on the first chunk, so presumably it parses the whole file each time regardless of which rows it returns.
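This is one reason the CSV route helps: unlike `read_xlsx()`, `readr` can genuinely stream a CSV, holding only one chunk in memory at a time. A sketch with a hypothetical file name, processing 1000 rows per chunk:

```r
library(readr)

# Process each chunk for its side effects (e.g. append a summary to a
# results file) and discard it, so memory use stays roughly constant.
per_chunk <- function(chunk, pos) {
  cat("rows", pos, "to", pos + nrow(chunk) - 1, "\n")
}

read_csv_chunked("big_file.csv",  # hypothetical file name
                 SideEffectChunkCallback$new(per_chunk),
                 chunk_size = 1000)
```

`DataFrameCallback` is the alternative if each chunk reduces to a small result you want combined at the end.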
The practical workaround for now is to encourage people to use CSV files where they can, or to limit the size of the xlsx files they try to use.