Can anyone explain why the default guess_max = min(1000, n_max) is a good idea for read_xlsx? Doesn’t this save a few milliseconds of guessing time at the cost of a guess that is likely to be wrong on larger files? Isn’t this an example of what Knuth called “premature optimization”?
I received hundreds of errors when trying to read an Excel file with ~88,000 lines because the default guess_max is so small. I then needed to either find out how many lines are in the file or just pick a very large number, perhaps 2^20 (1,048,576), since that’s Excel’s row limit. Why can’t I specify guess_max = "n_max" without specifying a number? Why isn’t guess_max = "n_max" the default, given the speed of modern computers?
With guess_max = 2^20, I received no errors in reading the 88,000-line Excel file, and it only took a few seconds.
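For anyone hitting the same problem, here is roughly what my workaround looks like as a small wrapper. This is only a sketch: read_xlsx_guess_all and EXCEL_MAX_ROWS are names I made up for illustration (they are not part of readxl), and the file name in the usage comment is a placeholder.

```r
# Excel worksheets hold at most 2^20 = 1,048,576 rows, so using this value
# for guess_max covers every row of any sheet.
EXCEL_MAX_ROWS <- 2^20

# Hypothetical helper (not part of readxl): read a workbook while letting
# the column-type guesser consider every row instead of only the first 1000.
# Other arguments (sheet, range, ...) are passed through to read_xlsx().
read_xlsx_guess_all <- function(path, ...) {
  readxl::read_xlsx(path, guess_max = EXCEL_MAX_ROWS, ...)
}

# Usage, with a placeholder file name:
# dat <- read_xlsx_guess_all("my_file.xlsx")
```

The wrapper just hard-codes the large guess_max so it does not have to be remembered on every call. It will fail if guess_max is also passed through the dots, since the argument would then be supplied twice.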