I'm very new to R and am working on a project to predict movie ratings. I have completed the EDA but am struggling to understand how to build a model that predicts the ratings for each user. Can somebody please explain? (The data set has users in rows and movies in columns, with the ratings as values.)
    m1  m2  m3  m4
u1   2   4  NA  NA
u2   3   5  NA   1
u3  NA   1   2   2
u4  NA  NA   3  NA
Hi @Vicky_Das,
Welcome to the RStudio Community Forum.
Here is a reproducible example showing how to get your sample data into a data frame, start calculating simple statistics, and begin thinking about modelling options:
a <- "
user m1 m2 m3 m4
u1 2 4 NA NA
u2 3 5 NA 1
u3 NA 1 2 2
u4 NA NA 3 NA
"
dat <- read.table(text=a, header=TRUE)
dat
#> user m1 m2 m3 m4
#> 1 u1 2 4 NA NA
#> 2 u2 3 5 NA 1
#> 3 u3 NA 1 2 2
#> 4 u4 NA NA 3 NA
str(dat)
#> 'data.frame': 4 obs. of 5 variables:
#> $ user: chr "u1" "u2" "u3" "u4"
#> $ m1 : int 2 3 NA NA
#> $ m2 : int 4 5 1 NA
#> $ m3 : int NA NA 2 3
#> $ m4 : int NA 1 2 NA
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
dat %>%
  pivot_longer(cols = c(m1:m4), names_to = "movie", values_to = "rating") %>%
  group_by(user) %>%
  summarise(m_rating = mean(rating, na.rm = TRUE))
#> # A tibble: 4 x 2
#> user m_rating
#> <chr> <dbl>
#> 1 u1 3
#> 2 u2 3
#> 3 u3 1.67
#> 4 u4 3
dat %>%
  pivot_longer(cols = c(m1:m4), names_to = "movie", values_to = "rating") %>%
  group_by(movie) %>%
  summarise(m_rating = mean(rating, na.rm = TRUE))
#> # A tibble: 4 x 2
#> movie m_rating
#> <chr> <dbl>
#> 1 m1 2.5
#> 2 m2 3.33
#> 3 m3 2.5
#> 4 m4 1.5
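As one possible first modelling step, the user and movie means above can be combined into a simple baseline: predict each missing rating as the overall mean plus a user offset plus a movie offset. This is only a minimal sketch in base R on the four-user sample; the names `ratings`, `user_bias`, `movie_bias`, `baseline` and `filled` are made up for illustration, and a real project would more likely use a dedicated recommender package.

```r
# Sample ratings as a numeric matrix (users in rows, movies in columns)
ratings <- matrix(
  c( 2,  4, NA, NA,
     3,  5, NA,  1,
    NA,  1,  2,  2,
    NA, NA,  3, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("m", 1:4))
)

# Overall mean of all observed ratings
mu <- mean(ratings, na.rm = TRUE)

# How far each user and each movie sits from the overall mean
user_bias  <- rowMeans(ratings, na.rm = TRUE) - mu
movie_bias <- colMeans(ratings, na.rm = TRUE) - mu

# Baseline prediction for every user/movie pair
baseline <- mu + outer(user_bias, movie_bias, "+")

# Keep observed ratings, fill only the missing ones with the baseline
filled <- ifelse(is.na(ratings), baseline, ratings)
round(filled, 2)
```

The predictions are not clamped to the rating scale here, since the scale of the real data is not stated; that would be a sensible extra step.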
Thank you for your message. In the meantime, I was able to convert the data frame into a matrix and then predict the movie ratings with the "UBCF" method. What do you think?
library(recommenderlab)
# train1 holds a random 75% of the actual dataset
train1 <- as.matrix(train)
# Note: this drops the first row of train only; train1 was already created above
train <- train[-1,]
train1 <- as(train1, "realRatingMatrix")
dim(train1)
# Creation of the model - U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model<-Recommender(train1[1:3636], method = "UBCF")
# Then I applied the recommendation model to the "test" dataset.
# test1 holds a random 25% of the actual dataset
class(test)
test1 <- as.matrix(test)
test1<- as(test1,"realRatingMatrix")
dim(test1)
predicted.user <- predict(Rec.model, test1, type="ratings")
View(as(predicted.user, "data.frame"))
# Next, I checked the predictions for the ratings that were missing for one user ID,
# e.g. user ID "40", whose ratings for these movies were NA before
View(as(predicted.user["40"], "data.frame"))
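Looking at one user's predictions shows that the model runs, but not how accurate it is. One way to measure that, sketched below, is to hide some known ratings and compare the predictions against them using recommenderlab's `evaluationScheme()`. The sketch assumes the `Jester5k` data set that ships with recommenderlab as a stand-in for the real data, and the values for `train`, `given` and `goodRating` are illustrative choices, not recommendations.

```r
library(recommenderlab)

# Stand-in data: Jester5k ships with recommenderlab (assumption: your own
# realRatingMatrix would be used here instead)
data(Jester5k)
set.seed(123)

# 75/25 split of users; for each test user, 15 ratings are given to the
# model and the remaining ratings are held back for scoring
scheme <- evaluationScheme(Jester5k[1:1000], method = "split",
                           train = 0.75, given = 15, goodRating = 5)

# Fit UBCF on the training users
model <- Recommender(getData(scheme, "train"), method = "UBCF")

# Predict ratings for the test users from their "known" ratings
pred <- predict(model, getData(scheme, "known"), type = "ratings")

# Compare predictions with the held-back ratings (RMSE, MSE, MAE)
accuracy <- calcPredictionAccuracy(pred, getData(scheme, "unknown"))
accuracy
```

Running the same scheme with another method, for example `"POPULAR"`, gives a reference point for judging whether the UBCF error is reasonable.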