mlflow R API for AWS storage: ModuleNotFoundError: No module named 'boto3'

I'm using mlflow as a client and trying to save artifacts.

# set up
pacman::p_load(mlflow)
# Sys.setenv(MLFLOW_PYTHON_BIN = "/usr/bin/python")
install_mlflow()
mlflow_set_tracking_uri("http://localhost:5000/")
mlflow_client()

Now, when I try to run the example code block using the R API for mlflow:

pacman::p_load(carrier, e1071, MASS, caret, randomForest, SparkR)

with(mlflow_start_run(experiment_id = 2), {
  
  # Set the model parameters
  ntree <- 110
  mtry <- 7
  
  # Create and train model
  rf <- randomForest(type ~ ., data=Pima.tr, ntree=ntree, mtry=mtry)
  
  # Use the model to make predictions on the test dataset
  pred <- predict(rf, newdata=Pima.te[,1:7])
  
  # Log the model parameters used for this run
  mlflow_log_param("ntree", ntree)
  mlflow_log_param("mtry", mtry)
  
  # Define metrics to evaluate the model
  cm <- confusionMatrix(pred, reference = Pima.te[,8])
  sensitivity <- cm[["byClass"]]["Sensitivity"]
  specificity <- cm[["byClass"]]["Specificity"]
  
  # Log the value of the metrics 
  mlflow_log_metric("sensitivity", sensitivity)
  mlflow_log_metric("specificity", specificity)
  
  # Log the model
  # The crate() function from the R package "carrier" stores the model as a function
  predictor <- crate(function(x) stats::predict(rf, x), rf = rf)
  mlflow_log_model(predictor, "model")
  
  # Create and plot confusion matrix
  png(filename="confusion_matrix_plot.png")
  barplot(as.matrix(cm), main="Results",
         xlab="Observed", ylim=c(0,200), col=c("green","blue"),
         legend=rownames(cm), beside=TRUE)
  dev.off()
  
  # Save the plot and log it as an artifact
  mlflow_log_artifact("confusion_matrix_plot.png") 
    
})

I get:

Traceback (most recent call last):
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/mlflow/store/artifact/cli.py", line 67, in log_artifacts
    artifact_repo.log_artifacts(local_dir, artifact_path)
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 84, in log_artifacts
    s3_client = self._get_s3_client()
  File "/home/work/anaconda3/envs/r-mlflow-1.17.0/lib/python3.6/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 39, in _get_s3_client
    import boto3
ModuleNotFoundError: No module named 'boto3'
System command 'mlflow' failed, exit status: 1, stdout & stderr were printed

I've tried installing boto3 on my local system (Ubuntu 18.04) with each of the following:

pip3 install boto3
pip install boto3
python -m pip install --user boto3

I'm sure boto3 is on my system; I just cannot get R to find it.

How can I let R know that boto3 is indeed installed on my system, and get R to use it?

You have to install boto3 in the same Python environment that mlflow is using.

It is much easier to manage Python environments with Anaconda than with pip.
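
To see which environment is actually missing boto3, one option is to ask a specific interpreter where it lives and whether it can import the module. The following is a minimal sketch, not part of mlflow; the helper name module_status is made up for this example. It only tells you something useful if you run it with the interpreter from the environment named in the traceback (probably /home/work/anaconda3/envs/r-mlflow-1.17.0/bin/python, which is an inference from the traceback paths and may differ on your machine).

```python
import importlib.util
import sys


def module_status(name):
    """Report whether a top-level module is importable by THIS interpreter.

    Returns a (found, location) tuple. `module_status` is a hypothetical
    helper for illustration only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    # spec.origin is usually the path of the file the module would load from
    return True, spec.origin


if __name__ == "__main__":
    # sys.executable shows which Python binary is running this script
    print("interpreter:", sys.executable)
    found, location = module_status("boto3")
    print("boto3 importable:", found, location or "")
```

If the system Python reports boto3 as importable but the interpreter inside the mlflow environment does not, the package was installed into the wrong environment.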

Hi @andresrcs, could you expand on this? I'm not in Python very often, and I'm doing all this from within the RStudio interface. How can I install boto3 for this environment?

To ask my question another way, my entire Rmd script is the above two code blocks. Is it possible for me to install boto3 'from here' or must I do this in the terminal?

I don't have experience working with mlflow, so I can't give you specific advice, but you can install Python libraries into virtual environments from R using reticulate::virtualenv_install().

After a quick look at the package documentation, I can't find any mention of virtual environments, but since you are using Anaconda instead of a standalone Python installation, that might be the cause of the problem.

With help from the mlflow Slack channel, I found that I had to run this in the terminal: conda activate r-mlflow-1.17.0; pip3 install boto3. This resolved my issue.
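
The likely reason this works is that pip installs into whichever environment its interpreter belongs to, so activating r-mlflow-1.17.0 first points pip3 at the environment mlflow runs in. A related pattern that does not depend on which environment is active in the shell is to invoke pip through a specific interpreter with "-m pip". The sketch below illustrates that idea under stated assumptions; pip_install_command and install are hypothetical helper names, not part of mlflow or pip.

```python
import subprocess
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command bound to one specific interpreter.

    Running pip as `<python> -m pip` installs into that interpreter's
    environment. Hypothetical helper for illustration only.
    """
    return [python, "-m", "pip", "install", package]


def install(package, python=sys.executable):
    """Run the install; raises CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_install_command(package, python))


if __name__ == "__main__":
    # Dry run: only print the command that would be executed.
    print(" ".join(pip_install_command("boto3")))
```

Calling install("boto3") from a script run by the environment's own Python should have the same effect as the two terminal commands above, assuming pip is available in that environment.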
