For a machine learning project, I am trying to predict the rental price and costs of real estate.
For this, I use a large number of dummy variables to capture differences based on geography, which makes the data rather sparse. In addition, many of the property characteristics (number of bedrooms, bathrooms, whether there is a garden, ...) have missing values.
Are there any good machine learning algorithms/packages in R that can handle this type of data?
One thing you should know is that R has a native factor data type, so you shouldn't have to make dummy variables yourself. To see how the dummies are created automatically, look at the output of model.matrix(Sepal.Length ~ ., data = iris).
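To make that concrete for rental data, here is a minimal sketch. The data frame and its column names (rent, city, bedrooms) are made up for illustration and are not from your data. It also shows Matrix::sparse.model.matrix, which builds the same design matrix in a sparse format and may help when there are many geographic levels.

```r
library(Matrix)  # ships with R as a recommended package

# Hypothetical toy data: 'city' stands in for a geographic factor
homes <- data.frame(
  rent     = c(900, 1200, 1500, 1100),
  city     = factor(c("A", "B", "C", "A")),
  bedrooms = c(1, 2, 3, 2)
)

# Dense design matrix: the factor is expanded into dummy columns,
# with the first level ("A") absorbed into the intercept
X <- model.matrix(rent ~ ., data = homes)
colnames(X)
# "(Intercept)" "cityB" "cityC" "bedrooms"

# Same design matrix, stored sparsely (only non-zero entries are kept)
X_sparse <- sparse.model.matrix(rent ~ ., data = homes)
class(X_sparse)
```

Note that with R's default na.action, model.matrix drops rows containing missing values, so the missing data has to be dealt with before this step.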
Lasso and elastic net are great methods for sparse data, especially for a first modeling pass, because they are fast to fit and easy to understand. With caret::train, set the argument method = "glmnet".
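A rough sketch of what that call could look like, using the built-in iris data as a stand-in for your data. The 5-fold cross-validation and tuneLength = 5 are arbitrary choices of mine, not recommendations, and the caret and glmnet packages are assumed to be installed.

```r
library(caret)  # also requires the glmnet package to be installed

set.seed(1)  # make the cross-validation folds reproducible

# Elastic net via caret: alpha (lasso/ridge mix) and lambda (penalty
# strength) are tuned by 5-fold cross-validation
fit <- train(
  Sepal.Length ~ .,
  data       = iris,
  method     = "glmnet",
  trControl  = trainControl(method = "cv", number = 5),
  tuneLength = 5
)

# Selected tuning parameters
fit$bestTune

# Coefficients at the selected lambda; factor levels appear as dummies,
# and the penalty may shrink some coefficients exactly to zero
coef(fit$finalModel, s = fit$bestTune$lambda)
```

The formula interface takes care of the factor-to-dummy expansion, so Species is passed in as a plain factor here.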
Finally, if you are imputing missing values with a multivariate imputation method, make sure that you don't use the response variable as an imputation predictor.
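One possible way to do this, assuming you use the mice package (my choice for illustration; nothing above depends on it). The data is simulated and the column names are hypothetical. Setting the response's column in the predictor matrix to zero keeps it out of every imputation model.

```r
library(mice)

set.seed(42)
n <- 100

# Simulated stand-in data; 'rent' plays the role of the response
homes <- data.frame(
  rent      = rnorm(n, mean = 1200, sd = 300),
  area      = rnorm(n, mean = 80, sd = 20),
  bedrooms  = sample(1:4, n, replace = TRUE),
  bathrooms = sample(1:2, n, replace = TRUE)
)
homes$bedrooms[sample(n, 15)]  <- NA  # inject missing values
homes$bathrooms[sample(n, 10)] <- NA

# Rows = variables being imputed, columns = predictors used for them.
# Zeroing the 'rent' column excludes the response as a predictor.
pred <- make.predictorMatrix(homes)
pred[, "rent"] <- 0

imp <- mice(homes, predictorMatrix = pred, m = 5, printFlag = FALSE)

# Extract the first of the five completed data sets
completed <- complete(imp, 1)
anyNA(completed)
```

A simpler alternative with the same effect is to drop the response column before imputing and bind it back on afterwards.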