ERROR - Recommenderlab predict() - number of items in newdata does not match model

I am using the recommenderlab package for movie recommendation, and I get an error when I call the predict() method.

I will greatly appreciate any help or guidance on this.

Error:
Error in object@predict(object@model, newdata, n = n, data = data, type = type, :
number of items in newdata does not match model.

Here is what I am doing:
I am using the movielens dataset from the dslabs package.

1. Set aside a test (validation) set.
I treat this as my unseen data. Because this data is unseen, I set it apart before preparing the training data.

2. Next, I follow the usual process to turn train_set and test_set into matrices, add row names, name the columns, etc., as required by recommenderlab.
At this point I notice that the dimensions of train_set and test_set are not the same. There are more items (movies) in the training set than in the test set, which is expected.

3. Use the trained model to predict. Here I get an error:

ubcf.predicted.test <- predict(object = ubcf.model.recommender, newdata = test_set, type = "ratings")

Error:
Error in object@predict(object@model, newdata, n = n, data = data, type = type, :
number of items in newdata does not match model.

Code is shown below:


if(!require(dslabs)) install.packages("dslabs", repos = "http://cran.us.r-project.org")
if(!require(tidyverse)) install.packages("tidyverse", repos = "http://cran.us.r-project.org")
if(!require(recommenderlab)) install.packages("recommenderlab", repos = "http://cran.us.r-project.org")
if(!require(caret)) install.packages("caret", repos = "http://cran.us.r-project.org")

library(dslabs)
library(tidyverse)
library(recommenderlab)
library(caret)
library(dplyr)

set.seed(1)

dataset <- movielens

# set aside validation
test_index <- createDataPartition(y = dataset$rating, times = 1, p = 0.2, list = FALSE)
train_set <- dataset[-test_index, ]
temp <- dataset[test_index, ]

# I want to make sure all userIds and movieIds in the 
# test_set are also in train_set
test_set <- temp %>%
  semi_join(train_set, by = "movieId") %>%
  semi_join(train_set, by = "userId")

# I don't want to throw away the rows  that were excluded from the test_set
# so I add them back to training set
removed <- anti_join(temp, test_set)
train_set <- rbind(train_set, removed)

train_set <- train_set %>% select(userId, movieId, rating)
test_set  <- test_set  %>% select(userId, movieId, rating)

## the userids are added as a column, remove it, and add proper row names
train_set <- train_set %>%  spread(movieId, rating) %>% as("matrix")
row.names(train_set) <- train_set[, 1]
train_set <- train_set[, -1] %>% as("realRatingMatrix")

#prepare test_set in a similar way as train
test_set  <- test_set  %>%  spread(movieId, rating) %>% as("matrix")
row.names(test_set) <- test_set[,1]  
test_set <- test_set[, -1] %>% as("realRatingMatrix")

dim(train_set)
dim(test_set)

# set up cross validation to be used for the training
cv_scheme <- evaluationScheme(train_set, method="cross-validation", k=5, given=10)

# train UBCF model
ubcf.model.recommender <- Recommender(data = getData(cv_scheme, "train"), method = "UBCF")

# predict on new data (test)
ubcf.predicted.test <- predict(object = ubcf.model.recommender, newdata = test_set, type = "ratings")
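
Judging from the error text and the differing dim() output, one plausible cause is that the test matrix has fewer movie columns than the matrix the model was trained on. A minimal sketch of one way to line the columns up is shown here. It rests on assumptions: align_items is a hypothetical helper (it is not part of recommenderlab), and the toy matrices only stand in for the wide user-by-movie matrices produced by spread() above.

```r
# Hypothetical helper: expand a user x item rating matrix so that its
# columns are exactly `items` (same set, same order). Items the matrix
# does not contain are filled with NA, i.e. "not rated".
align_items <- function(m, items) {
  out <- matrix(NA_real_, nrow = nrow(m), ncol = length(items),
                dimnames = list(rownames(m), items))
  shared <- intersect(colnames(m), items)
  out[, shared] <- m[, shared, drop = FALSE]
  out
}

# Toy stand-in for the training matrix: 3 users, 4 movies
train_m <- matrix(c(5, NA, 3, 4, 2, NA, NA, 1, 4, 3, 5, NA), nrow = 3,
                  dimnames = list(c("u1", "u2", "u3"),
                                  c("m1", "m2", "m3", "m4")))

# Toy stand-in for the test matrix: 2 users, only 2 of the 4 movies
test_m <- matrix(c(4, NA, NA, 2), nrow = 2,
                 dimnames = list(c("u4", "u5"), c("m2", "m4")))

# Pad the test matrix to the training item set
test_aligned <- align_items(test_m, colnames(train_m))

dim(train_m)       # 3 users x 4 movies
dim(test_aligned)  # 2 users x 4 movies
```

In the code above, this would presumably be applied to the plain test matrix (after dropping the userId column and before the as("realRatingMatrix") step), using the column names of the training matrix as `items`. I have not verified that this removes the error on the full movielens data.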


