R TensorFlow tfestimators - SVM and random forest how to?

masonjames · September 24, 2018, 11:37am

The R package tfestimators (https://tensorflow.rstudio.com/tfestimators/) lists several canned estimators currently available:

linear_regressor() Linear regressor model.

linear_classifier() Linear classifier model.

dnn_regressor() DNN Regression.

dnn_classifier() DNN Classification.

dnn_linear_combined_regressor() DNN Linear Combined Regression.

dnn_linear_combined_classifier() DNN Linear Combined Classification.

There is mention of SVMs and random forests "coming soon". Does anyone know of a way to implement SVMs and random forests in TensorFlow from R at this time?

Thanks very much!

Andrea · September 28, 2018, 5:59pm

Hi! Welcome to the RStudio Community and happy coding with TensorFlow!

SVMs

Things should be relatively easy here (fingers crossed). The usual place to find all the "non-mainstream" TensorFlow stuff is tf.contrib. Unfortunately, the SVM classes in tf.contrib.learn are being deprecated, so it's probably not a good idea to use them. However, an SVM is basically an (L^2-regularized) linear model, exactly the same model used in linear regression, except that instead of the mean_squared_error loss you use the hinge_loss. You can take one of the numerous TensorFlow linear regression examples, e.g.
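Written out, the two objectives share the same linear model f(x) = w^T x + b and differ only in the loss. The notation below (labels y_i in {-1, +1} for the SVM, regularization weight lambda) is introduced here for illustration:

$$
\begin{aligned}
\text{linear regression:}&\quad \min_{w,b}\ \frac{1}{n}\sum_{i=1}^{n}\left(w^\top x_i + b - y_i\right)^2\\
\text{linear SVM:}&\quad \min_{w,b}\ \frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{n}\sum_{i=1}^{n}\max\left(0,\ 1 - y_i\,(w^\top x_i + b)\right)
\end{aligned}
$$

A fitted SVM then classifies a new point by the sign of w^T x + b.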

https://tensorflow.rstudio.com/tensorflow/articles/examples/linear_regression_multiple.html

and then basically all you have to do is replace the squared-error loss with a call to the hinge loss:

https://www.tensorflow.org/api_docs/python/tf/losses/hinge_loss

Note that the above script doesn't call TensorFlow's mean squared error loss

https://www.tensorflow.org/api_docs/python/tf/losses/mean_squared_error

(which is the way to go for production code), because it's an example meant for learners; instead, the loss function is computed explicitly as

```
cost <- tf$reduce_mean(tf$square(Y_hat - Y))
```
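For comparison, here is that explicit cost in plain R next to an explicit hinge loss, evaluated on a few made-up scores and labels. This is only a sketch: `y_hat` and `y` are illustrative vectors, and the hinge version assumes labels coded as -1/+1 (the TensorFlow `tf$losses$hinge_loss` function instead expects 0/1 labels and converts them internally). Inside the TensorFlow script, the analogous explicit line would presumably be something like `tf$reduce_mean(tf$maximum(0, 1 - Y * Y_hat))`, though that line is untested here.

```r
# Illustrative raw linear scores and class labels coded as -1 / +1
y_hat <- c(0.8, -1.5, 0.3, -0.2)
y     <- c(1, -1, -1, 1)

# Explicit mean squared error, the same formula as the script's `cost`
mse_cost <- mean((y_hat - y)^2)

# Explicit hinge loss: zero once a point is on the correct side with
# margin >= 1, a linear penalty otherwise
hinge_cost <- mean(pmax(0, 1 - y * y_hat))

mse_cost    # 0.855
hinge_cost  # 0.675
```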

Random forests
Things get more dire with random forests! I've never implemented Breiman's algorithm from scratch myself... Again, the usual place to find these "non-mainstream" TensorFlow models is tf.contrib, but in this case I could only find some Python code.

You could use the R package reticulate to run that Python code from R (the R tensorflow package is itself built on reticulate, so it can drive TensorFlow), or you could try to convert the code to R, but it's fairly complicated and I wouldn't suggest doing that if this is the beginning of your TensorFlow journey. Chances are the code calls some other class/module which you would then have to dig out as well... Let's try a different angle of attack:

Why do you need SVMs and random forests in TensorFlow for R? Do you have access to a powerful GPU cluster, or to cloud instances with GPUs? If you're going to run locally, R already has extremely efficient implementations of these models, for example the CRAN package ranger for random forests.
If you really prefer TensorFlow, what about a high-level API such as Keras (https://keras.rstudio.com/)? Keras wraps TensorFlow in a very user-friendly API. Linear regression is so simple in Keras that there isn't even an example for it: the simplest regression example already uses something more advanced than linear regression! But you can easily turn it into a linear regression; just change
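A minimal sketch of the ranger route, using the built-in `iris` data purely as a stand-in dataset. The post only names the package; the settings here (500 trees, a fixed `seed`) are illustrative choices, not recommendations.

```r
# install.packages("ranger")  # if not already installed
library(ranger)

# Random forest classifier: predict Species from the four measurements
rf <- ranger(Species ~ ., data = iris, num.trees = 500, seed = 42)

# Out-of-bag misclassification rate, ranger's built-in estimate of test error
rf$prediction.error

# Predictions for new data (here simply the training data again)
pred <- predict(rf, data = iris)$predictions
table(predicted = pred, actual = iris$Species)
```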
```
 model <- keras_model_sequential() %>%
   layer_dense(units = 64, activation = "relu",
               input_shape = dim(train_data)[2]) %>%
   layer_dense(units = 64, activation = "relu") %>%
   layer_dense(units = 1)
```
to
```
 model <- keras_model_sequential() %>%
   layer_dense(units = 1, activation = "linear",
               input_shape = dim(train_data)[2]) 
```
and again, once you can do linear regression, building a (linear) SVM is only a matter of replacing the MSE loss with the hinge loss (plus an L2 penalty on the weights, as mentioned above). I can't help you with random forests in Keras, though.
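To make the "only swap the loss" point concrete outside any framework, here is a base R sketch: one gradient-descent fitter for a linear model, where switching from linear regression to a linear SVM means swapping the MSE gradient for the subgradient of the hinge loss and adding an L2 penalty. The helper names (`fit_linear`, `mse_grad`, `hinge_grad`), the learning rate, epoch count, `lambda` and the simulated data are all illustrative assumptions, and fixed-step subgradient descent is a deliberately simple choice, not what Keras optimizers or dedicated SVM solvers use.

```r
# Plain gradient descent for a linear model with scores s = X %*% w + b.
# Only the loss (supplied as its gradient w.r.t. the scores) differs
# between linear regression and a linear SVM.
fit_linear <- function(X, y, score_grad, lambda = 0, lr = 0.1, epochs = 500) {
  w <- rep(0, ncol(X))
  b <- 0
  for (i in seq_len(epochs)) {
    s <- as.vector(X %*% w + b)
    g <- score_grad(s, y)                      # d(loss)/d(s), one value per row
    # L2 penalty (lambda / 2) * sum(w^2) adds lambda * w; the bias is not penalized
    w <- w - lr * (as.vector(crossprod(X, g)) + lambda * w)
    b <- b - lr * sum(g)
  }
  list(w = w, b = b)
}

# Gradient of mean((s - y)^2)
mse_grad <- function(s, y) 2 * (s - y) / length(y)

# Subgradient of mean(pmax(0, 1 - y * s)), labels in {-1, +1}
hinge_grad <- function(s, y) ifelse(y * s < 1, -y, 0) / length(y)

set.seed(1)
n <- 100
X <- matrix(rnorm(2 * n), ncol = 2)

# Linear SVM: hinge loss plus L2 penalty, classify by the sign of the score
y_cls <- ifelse(X[, 1] + X[, 2] > 0, 1, -1)
svm_fit <- fit_linear(X, y_cls, hinge_grad, lambda = 0.01)
pred <- sign(as.vector(X %*% svm_fit$w + svm_fit$b))
svm_acc <- mean(pred == y_cls)
svm_acc      # training accuracy

# Same fitter with the MSE loss: ordinary linear regression
y_reg <- as.vector(X %*% c(2, -1) + 0.5 + rnorm(n, sd = 0.1))
reg_fit <- fit_linear(X, y_reg, mse_grad)
reg_fit      # compare with the true values c(2, -1) and 0.5
```

In Keras terms, the analogous change would presumably be compiling the one-unit linear model above with `loss = "hinge"` and adding `kernel_regularizer = regularizer_l2(l = 0.01)` to its `layer_dense()` call, with labels coded as -1/+1; that variant is untested here.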

Hope this helps!

Andrea · October 9, 2018, 4:40pm

I modified my post above, removing all the references to logistic regression... I must have been under the influence when I wrote that stuff! An SVM is not a logistic regression model. The model (for a linear SVM) is actually the same as in linear regression, not logistic regression, but it's fit using a different loss function. Also, the resulting model is used not for regression but for classification, which is what misled me.