Hopefully, some users encountered this before... Or @Max has an advice...

I have a fairly simple goal. I am looking to predict a numeric value based on 3 numeric variables: coordinates (lat, lon) and day of the year (1:365). Simple enough, and `caret`'s `knnreg()` is a perfect solution for my needs. It performs great (on the tiny chunks I feed it) and logically makes more sense for the task (I'd do just that manually if I had a tiny dataset: find the closest neighbors and average their values).
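For reference, this is the shape of what I'm fitting, with a toy stand-in for the real data (column names and the generated values are just illustrative):

```r
library(caret)

# Toy stand-in for the real 1.9M-row table (names are illustrative)
set.seed(1)
n  <- 5000
df <- data.frame(
  lat = runif(n, 40, 45),
  lon = runif(n, -80, -75),
  doy = sample(1:365, n, replace = TRUE)
)
df$value <- sin(df$doy / 58) + 0.5 * df$lat + rnorm(n, sd = 0.1)

# Fitting works fine at this small scale
fit   <- knnreg(value ~ lat + lon + doy, data = df, k = 5)
preds <- predict(fit, newdata = df[1:100, ])
```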

One problem is that **I was never able to run it in full**. `knnreg` executes, but `predict()` can't handle the amount of data.

- my full dataset is 1.9M rows
- 52K data points are missing and require prediction (final goal)
- a 25% test set would be about 480K rows
- the `knnreg` object is `(5 elements, 198.7 Mb)`

I can only run it on tiny sets of up to 5,000 rows.
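One workaround I'm considering is chunking `newdata` so that `predict()` only ever computes distances for a slice of rows at a time, instead of the whole 480K test set at once. A sketch (untested at full scale, and it doesn't help with the training-set side of the distance computation):

```r
library(caret)

# Chunked prediction: call predict() on slices of newdata so the
# distance computation is done piece by piece instead of all at once.
predict_in_chunks <- function(fit, newdata, chunk_size = 5000) {
  idx <- split(seq_len(nrow(newdata)),
               ceiling(seq_len(nrow(newdata)) / chunk_size))
  unlist(lapply(idx, function(i)
           predict(fit, newdata = newdata[i, , drop = FALSE])),
         use.names = FALSE)
}

# Small demo on toy data
set.seed(1)
df <- data.frame(lat = runif(2000, 40, 45),
                 lon = runif(2000, -80, -75),
                 doy = sample(1:365, 2000, replace = TRUE))
df$value <- sin(df$doy / 58) + 0.5 * df$lat + rnorm(2000, sd = 0.1)

fit   <- knnreg(value ~ lat + lon + doy, data = df, k = 5)
preds <- predict_in_chunks(fit, df, chunk_size = 500)
```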

So I'm stalled on this very first step. And that's before I even get to cross-validation, finding a proper `k`, and the fact that I need to predict at least 4 more variables from the same 3 predictors.
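For the cross-validation and `k`-tuning part, my plan was something like `caret::train()` on a subsample that does fit in memory (again toy data below; my real columns differ):

```r
library(caret)

# Subsample-sized toy data (illustrative names and values)
set.seed(1)
n <- 2000
sample_df <- data.frame(
  lat = runif(n, 40, 45),
  lon = runif(n, -80, -75),
  doy = sample(1:365, n, replace = TRUE)
)
sample_df$value <- sin(sample_df$doy / 58) + 0.5 * sample_df$lat +
  rnorm(n, sd = 0.1)

# 5-fold CV over a small grid of candidate k values
ctrl  <- trainControl(method = "cv", number = 5)
tuned <- train(value ~ lat + lon + doy, data = sample_df,
               method = "knn", trControl = ctrl,
               tuneGrid = data.frame(k = c(3, 5, 9, 15, 25)))
best_k <- tuned$bestTune$k  # k chosen by cross-validated RMSE
```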

Is data size my problem? Should I pick a different algorithm for the job?