I am happily using read_csv_chunked to filter on the fly a subset of data from a 9GiB file.
The relevant callback is a filter on matching value of a couple of columns (one of which from a mutate).
I have "randomly" chosen the value for chunk_size to 20000.
Is there some heuristics/rationale for setting chunk_size to a value that minimizes the time needed to read the data?
My guess is that it partially depends on the "size" of each row and the applied filter complexity (memory requirements and CPU computation) but I am curious to hear from other users...