I am using the recommenderlab package for movie recommendation, and I get an error when I call the predict() method.
I would greatly appreciate any help or guidance on this.
Error:
Error in object@predict(object@model, newdata, n = n, data = data, type = type, :
number of items in newdata does not match model.
Here is what I am doing:
I am using the movielens dataset from the dslabs package.
1. Set aside a test (validation) set.
I treat this as my unseen data. Because it is unseen, I split it off before preparing the training data.
2. Next, I follow the usual process to turn train_set and test_set into matrices, add row names, name the columns, etc., as required by recommenderlab.
At this point I notice that the dimensions of train_set and test_set are not the same. There are more items (movies) in the training set than in the test set, which is expected.
3. Use the trained model to predict on the test set. This is the call that raises the error:
ubcf.predicted.test <- predict(object = ubcf.model.recommender, newdata = test_set, type = "ratings")
Code is shown below:
if(!require(dslabs)) install.packages("dslabs", repos = "http://cran.us.r-project.org")
if(!require(tidyverse)) install.packages("tidyverse", repos = "http://cran.us.r-project.org")
if(!require(recommenderlab)) install.packages("recommenderlab", repos = "http://cran.us.r-project.org")
if(!require(caret)) install.packages("caret", repos = "http://cran.us.r-project.org")
library(dslabs)
library(tidyverse)
library(recommenderlab)
library(caret)
library(dplyr)
set.seed(1)
dataset <- movielens
# set aside validation
test_index <- createDataPartition(y = dataset$rating, times = 1, p = 0.2, list = FALSE)
train_set <- dataset[-test_index, ]
temp <- dataset[test_index, ]
# I want to make sure all userIds and movieIds in the
# test_set are also in train_set
test_set <- temp %>%
  semi_join(train_set, by = "movieId") %>%
  semi_join(train_set, by = "userId")
# I don't want to throw away the rows that were excluded from the test_set
# so I add them back to training set
removed <- anti_join(temp, test_set)
train_set <- rbind(train_set, removed)
train_set <- train_set %>% select(userId, movieId, rating)
test_set <- test_set %>% select(userId, movieId, rating)
## the userids are added as a column, remove it, and add proper row names
train_set <- train_set %>% spread(movieId, rating) %>% as("matrix")
row.names(train_set) <- train_set[, 1]
train_set <- train_set[, -1] %>% as("realRatingMatrix")
#prepare test_set in a similar way as train
test_set <- test_set %>% spread(movieId, rating) %>% as("matrix")
row.names(test_set) <- test_set[,1]
test_set <- test_set[, -1] %>% as("realRatingMatrix")
dim(train_set)
dim(test_set)
# set up cross validation to be used for the training
cv_scheme <- evaluationScheme(train_set, method="cross-validation", k=5, given=10)
# train UBCF model
ubcf.model.recommender <- Recommender(data = getData(cv_scheme, "train"), method = "UBCF")
# predict on new data (test)
ubcf.predicted.test <- predict(object = ubcf.model.recommender, newdata = test_set, type = "ratings")
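
To make the mismatch from step 2 concrete, here is a minimal base-R sketch on toy data. It assumes the error comes from the test matrix having fewer movie columns than the training matrix, and it pads the test matrix with NA columns so both have the same items in the same order. The helper `align_items` and the toy matrices are hypothetical (not part of recommenderlab), and I have not confirmed that this is the approach the package expects.

```r
# Hypothetical helper (not part of recommenderlab): expand a user x item
# matrix so that its columns match a reference vector of item IDs.
# Items absent from `mat` become all-NA columns; extra items are dropped.
align_items <- function(mat, item_ids) {
  out <- matrix(NA_real_, nrow = nrow(mat), ncol = length(item_ids),
                dimnames = list(rownames(mat), item_ids))
  shared <- intersect(colnames(mat), item_ids)
  out[, shared] <- mat[, shared, drop = FALSE]
  out
}

# Toy data standing in for the wide train/test matrices built above
train_mat <- matrix(c(5, NA, 3, 4, 2, NA), nrow = 2,
                    dimnames = list(c("u1", "u2"), c("10", "20", "30")))
test_mat <- matrix(c(4, 1), nrow = 2,
                   dimnames = list(c("u1", "u2"), "20"))

# Items the model was trained on that the test matrix lacks
missing_items <- setdiff(colnames(train_mat), colnames(test_mat))

# Pad the test matrix so its columns match the training matrix
test_aligned <- align_items(test_mat, colnames(train_mat))
dim(test_aligned)
```

If the assumption holds, the aligned matrix could be converted with as(test_aligned, "realRatingMatrix") before being passed to predict() as newdata.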