Multiple values for one column

technocrat · December 29, 2019, 4:57am

Hi, and welcome!

The examples from @andresrcs and @gueyenono of a reproducible example, called a reprex, will get you started on the second part of your question: producing a boxplot from data similar to yours, disaggregated into genres, with some measure of ratings (he used the average).

The first part, how to disaggregate the first column, is more interesting. A year ago I worked on the movielens dataset, and I've uploaded a small subset of fifty records, grouped by genres, with the same genres field and with ratings expressed as the number of ratings received for each combination of genres. The data can be loaded with:

library(readr)  # read_csv() comes from the readr package
movies <- read_csv("https://gist.githubusercontent.com/technocrat/b5a78af6c174eb983b95a19659b83a33/raw/ab0664b1ac314902b2e130aae43e9f71214de43f/movielens.csv")

Now, to clarify the question: are you interested in grouping by combinations of genres, such as "Action | Drama | War", or by each individual genre (which gives greater weight to the ratings of movies classified under several genres)? When I did my analysis, I settled on the first genre listed, but that was only one of many choices I could have made.
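To make the three choices concrete, here is a rough sketch on a tiny made-up table. The column names `genres` and `n_ratings`, the toy values, and the object names are all assumptions for illustration and may not match the uploaded file; the separator pattern tolerates optional spaces around the `|`.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for the real file (hypothetical column names and values)
movies_toy <- tibble(
  genres    = c("Action|Drama|War", "Comedy", "Action|Comedy", "Drama"),
  n_ratings = c(10, 5, 8, 3)
)

# Choice 1: treat each combination of genres as its own group
by_combo <- movies_toy %>%
  group_by(genres) %>%
  summarise(total = sum(n_ratings), .groups = "drop")

# Choice 2: one row per individual genre, so a movie with three
# genres contributes its ratings three times
by_genre <- movies_toy %>%
  separate_rows(genres, sep = "\\s*\\|\\s*") %>%
  group_by(genres) %>%
  summarise(total = sum(n_ratings), .groups = "drop")

# Choice 3: keep only the first genre listed
by_first <- movies_toy %>%
  mutate(genres = sub("\\s*\\|.*$", "", genres)) %>%
  group_by(genres) %>%
  summarise(total = sum(n_ratings), .groups = "drop")
```

Comparing `sum(by_genre$total)` with `sum(movies_toy$n_ratings)` shows the extra weight that the second choice gives to multi-genre movies, while the first and third choices leave the overall total unchanged.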

Finally, how many records will you be working with, is your data derived from movielens, and is this homework (in which case, see the homework policy)?