Sounds like a fun project.
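One thing you should know is that R has a native factor data type, so you shouldn't have to create dummy variables yourself: formula-based modeling functions expand factors into dummy columns automatically. To see this, run `model.matrix(Sepal.Length ~ ., data = iris)` and look at how the `Species` factor becomes the indicator columns `Speciesversicolor` and `Speciesvirginica`.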
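Here is a minimal base-R sketch of that check on the built-in `iris` data. With the default treatment contrasts, the first factor level (`setosa`) is the baseline and gets no column of its own; the intercept absorbs it.

```r
# iris$Species is a factor with three levels
str(iris$Species)

# model.matrix() expands the factor into dummy (0/1) columns;
# "setosa" is the reference level, so only two dummy columns appear
X <- model.matrix(Sepal.Length ~ ., data = iris)
colnames(X)
head(X, 3)
```

The same expansion happens inside `lm()`, `glm()`, and other functions that take a formula, so you can keep the column as a factor.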
The lasso and the elastic net are good methods for sparse data, especially for a first-pass model, because they are fast to fit and easy to interpret. With `caret::train`, use the `method = "glmnet"` argument.
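A rough sketch of what that call might look like, assuming the `caret` and `glmnet` packages are installed. `iris` is only a stand-in for your data, and the `alpha`/`lambda` grid is an arbitrary illustrative choice: `alpha = 1` is the lasso and `alpha = 0.5` is an elastic net mix.

```r
library(caret)  # glmnet must also be installed; caret loads it for method = "glmnet"

set.seed(42)
# Formula interface: caret dummy-codes the Species factor for you
fit <- train(
  Sepal.Length ~ ., data = iris,
  method = "glmnet",
  trControl = trainControl(method = "cv", number = 5),
  # illustrative tuning grid, not a recommendation
  tuneGrid = expand.grid(alpha = c(0.5, 1),
                         lambda = 10^seq(-4, -1, length.out = 20))
)

fit$bestTune                                   # selected alpha and lambda
coef(fit$finalModel, s = fit$bestTune$lambda)  # coefficients at that lambda
```

Coefficients shrunk exactly to zero in the last line are the variables the lasso penalty has dropped, which is part of why these models are easy to interpret.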
Finally, if you are imputing missing values with a multivariate imputation method, make sure that you don't use the response variable as an imputation predictor.
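One way to do this, sketched with the `mice` package (an assumption; the same idea applies to other multivariate imputers): zero out the response's column in the predictor matrix so it never informs the imputations. Treating `Temp` in the built-in `airquality` data as the response is purely for illustration; `Ozone` and `Solar.R` contain the missing values.

```r
library(mice)

df <- airquality
response <- "Temp"  # hypothetical response variable for this example

# Rows = variables being imputed, columns = variables used as predictors
pred <- make.predictorMatrix(df)
# Exclude the response from every imputation model
pred[, response] <- 0

imp <- mice(df, predictorMatrix = pred, m = 5, method = "pmm",
            seed = 1, printFlag = FALSE)
completed <- mice::complete(imp, 1)  # first of the 5 imputed data sets
```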