As an alternative to disk.frame, with which I have no experience, I offer some general observations and an alternative toolchain.
Let's start with the data. The pfl_data object is a list in which the data of real interest are tibbles deeply embedded several indices down.
> head(pfl_data[[1]][2][1][[1]],1)
# A tibble: 1 x 6
V1 V2 V3 V4 V5 V6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 250 1000 0 49.7 154. 173.
> head(pfl_data[[1]][3][1][[1]],1)
# A tibble: 1 x 9
V1 V2 V3 V4 V5 V6 V7 V8 V9
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 250 1000 0 -3797. 190. 294. 313. 323. 323.
> head(pfl_data[[1]][4][1][[1]],1)
# A tibble: 1 x 11
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 250 1000 0 -3673. 314 417 436. 446. 446. 453. 454.
Each embedded tibble appears to have 1,001 rows and a variable number of columns. Per the issue description, only two columns are needed from each data frame. The list fits comfortably in memory:
> object.size(pfl_data)
1553736 bytes
The final object, which is a subset of it, should therefore fit as well. It does not, presumably because of a proliferation of intermediate objects.
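To make the "subset should fit" point concrete, here is a minimal sketch on a toy list. The shape of `toy`, the placeholder first element, and the choice of `V1` and `V4` as the two columns of interest are all assumptions for illustration, not taken from pfl_data. It also shows that the long index chain used above reaches the same element as a plain double-bracket index.

```r
# Toy stand-in for pfl_data (hypothetical shape and values): a list whose
# first element holds a placeholder plus data frames of differing widths.
set.seed(1)
make_df <- function(n_col, n_row = 5) {
  as.data.frame(matrix(runif(n_row * n_col), nrow = n_row,
                       dimnames = list(NULL, paste0("V", seq_len(n_col)))))
}
toy <- list(list("placeholder", make_df(6), make_df(9), make_df(11)))

# x[2][1][[1]] and x[[2]] reach the same element.
same <- identical(toy[[1]][2][1][[1]], toy[[1]][[2]])

# Keep only the two columns of interest (assumed here to be V1 and V4).
keep   <- c("V1", "V4")
frames <- Filter(is.data.frame, toy[[1]])
slim   <- lapply(frames, function(d) d[, keep, drop = FALSE])

# Compare the footprint of the full list with that of the subset.
print(object.size(toy))
print(object.size(slim))
```

If the real tibbles do not share the same two column names, `keep` would have to be chosen per tibble.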
We get used to R's philosophy of lazy evaluation, but bringing data into memory lazily is rarely front of mind until we hit a limit: the RAM available, the operating system's cap on per-process access to it, the OS failing to release it, or some combination.
Accordingly, I would leave pfl_data out of RAM and extract only the pieces needed to boil it down, since most of it is pure surplusage. I would also do the join out of memory. The obvious tool is an SQL database.
For this data set, SQLite is probably adequate, and MySQL/MariaDB, PostgreSQL or another relational database manager is adequate by a very large margin.
The {dbplyr} package lets you work with data stored out of RAM as if it were in memory, using the same verbs as {dplyr} for the basic operations needed here: select and join.
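A minimal sketch of that workflow, assuming the {DBI}, {RSQLite}, {dplyr} and {dbplyr} packages are installed. The data frames `a` and `b`, the key column `id`, and the table names are hypothetical stand-ins; the real tibbles and join key will differ.

```r
library(DBI)
library(dplyr)  # dbplyr must be installed; dplyr uses it for database tables

# Hypothetical stand-ins for two embedded tibbles, with an assumed key `id`.
a <- data.frame(id = 1:5, V4 = c(49.7, 50.1, 51.3, 52.0, 53.4))
b <- data.frame(id = 1:5, V4 = c(-3797, -3790, -3781, -3775, -3770),
                V5 = c(190, 191, 192, 193, 194))

# An on-disk SQLite database, so the tables live outside R's memory.
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "a", a)
dbWriteTable(con, "b", b)

# Lazy query: nothing is pulled into R until collect() is called.
joined <- tbl(con, "a") %>%
  select(id, a_V4 = V4) %>%
  inner_join(tbl(con, "b") %>% select(id, b_V5 = V5), by = "id")

show_query(joined)  # the SQL that dbplyr generates

# Save the join as a permanent table inside the database ...
joined_db <- compute(joined, name = "joined", temporary = FALSE)

# ... or bring only the small result into memory.
result <- collect(joined)
dbDisconnect(con)
```

Only `result`, which holds the selected columns, ever occupies R's memory; the source tables and the join stay in the database file.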
This approach exemplifies a helpful way of thinking about R: the interaction of three objects, an existing object, x, a desired object, y, and a function, f, that returns a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.
For this case, the objects are readily identifiable. pfl_data is x, a list of 20-70 tibbles, and y is the reduced object that will take over the role of x for further analysis. f will be a composite function, as follows:
g: g(x, y, z) queries tibbles x and y and joins them by key z
h: h(g(x, y, z)) performs g and saves the result back to some object in or out of memory
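The composition can be sketched in base R as follows. This is an illustration only: `merge()` and `saveRDS()` stand in for the query and the save step, the extra `path` argument is an assumption added so h has somewhere to write, and the data frames and key `id` are hypothetical.

```r
# g: join x and y on key z (base R merge stands in for a database join).
g <- function(x, y, z) {
  merge(x, y, by = z)
}

# h: save the result of g out of memory and return where it went.
h <- function(joined, path) {
  saveRDS(joined, path)
  invisible(path)
}

# f is the composite: f(x) = y, built from h(g(...)).
f <- function(x, y, z, path) h(g(x, y, z), path)

# Hypothetical inputs with an assumed key column `id`.
x <- data.frame(id = 1:3, V4 = c(49.7, 50.2, 51.0))
y <- data.frame(id = 1:3, V5 = c(154, 155, 156))

out <- f(x, y, "id", tempfile(fileext = ".rds"))
res <- readRDS(out)
print(res)
```

Swapping the body of g for a {dbplyr} query and the body of h for a write to a database table would keep the same shape while moving the work out of RAM.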