Hi, and welcome!
Two preliminaries:
- Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? Posting a reprex, complete with representative data, will attract quicker and more answers.
- Check the community homework policy, which requires some disclosure of the assignment and explains that members are here to help you get unstuck, but not to "give you the answer".
Let's start by looking at the structure of the `movies` data set:
library(ggplot2movies)
str(movies)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 58788 obs. of 24 variables:
#> $ title : chr "$" "$1000 a Touchdown" "$21 a Day Once a Month" "$40,000" ...
#> $ year : int 1971 1939 1941 1996 1975 2000 2002 2002 1987 1917 ...
#> $ length : int 121 71 7 70 71 91 93 25 97 61 ...
#> $ budget : int NA NA NA NA NA NA NA NA NA NA ...
#> $ rating : num 6.4 6 8.2 8.2 3.4 4.3 5.3 6.7 6.6 6 ...
#> $ votes : int 348 20 5 6 17 45 200 24 18 51 ...
#> $ r1 : num 4.5 0 0 14.5 24.5 4.5 4.5 4.5 4.5 4.5 ...
#> $ r2 : num 4.5 14.5 0 0 4.5 4.5 0 4.5 4.5 0 ...
#> $ r3 : num 4.5 4.5 0 0 0 4.5 4.5 4.5 4.5 4.5 ...
#> $ r4 : num 4.5 24.5 0 0 14.5 14.5 4.5 4.5 0 4.5 ...
#> $ r5 : num 14.5 14.5 0 0 14.5 14.5 24.5 4.5 0 4.5 ...
#> $ r6 : num 24.5 14.5 24.5 0 4.5 14.5 24.5 14.5 0 44.5 ...
#> $ r7 : num 24.5 14.5 0 0 0 4.5 14.5 14.5 34.5 14.5 ...
#> $ r8 : num 14.5 4.5 44.5 0 0 4.5 4.5 14.5 14.5 4.5 ...
#> $ r9 : num 4.5 4.5 24.5 34.5 0 14.5 4.5 4.5 4.5 4.5 ...
#> $ r10 : num 4.5 14.5 24.5 45.5 24.5 14.5 14.5 14.5 24.5 4.5 ...
#> $ mpaa : chr "" "" "" "" ...
#> $ Action : int 0 0 0 0 0 0 1 0 0 0 ...
#> $ Animation : int 0 0 1 0 0 0 0 0 0 0 ...
#> $ Comedy : int 1 1 0 1 0 0 0 0 0 0 ...
#> $ Drama : int 1 0 0 0 0 1 1 0 1 0 ...
#> $ Documentary: int 0 0 0 0 0 0 0 1 0 0 ...
#> $ Romance : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Short : int 0 0 1 0 0 0 0 1 0 0 ...
Created on 2020-03-18 by the reprex package (v0.3.0)
OK, it's a data frame (a tibble) with 24 variables describing various aspects of 58,788 movies.
What's needed? Average rating by genre. Which variable holds the rating for a movie? I'm going to call that `SCORE` so as not to spoil the fun.
Which variables indicate the genre? No spoilers here: Action, Animation, Comedy, Drama, Documentary, Romance, and Short.
Using the `dplyr` package's `select()` function, you can create a skinnier data frame to work with for this problem:
library(dplyr)
movies %>% select(SCORE, Action, Animation, Comedy, Drama, Documentary, Romance, Short) -> genres
Not strictly needed, but easier on the eyes.
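The seven genre columns sit next to each other in `movies` (`Action` through `Short`, per the `str()` output), so `select()` can also take them as a range with `:`. Here is a minimal sketch on a hypothetical three-row stand-in called `toy`, built from the first three films shown above, with `SCORE` again standing in for the real rating column:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in for `movies`: first three films, a few columns only.
# SCORE is a placeholder name for the rating column.
toy <- tibble(
  title       = c("$", "$1000 a Touchdown", "$21 a Day Once a Month"),
  year        = c(1971L, 1939L, 1941L),
  SCORE       = c(6.4, 6, 8.2),
  Action      = c(0L, 0L, 0L),
  Animation   = c(0L, 0L, 1L),
  Comedy      = c(1L, 1L, 0L),
  Drama       = c(1L, 0L, 0L),
  Documentary = c(0L, 0L, 0L),
  Romance     = c(0L, 0L, 0L),
  Short       = c(0L, 0L, 1L)
)

# Action:Short selects every column from Action through Short,
# because those columns are adjacent; title and year are dropped
genres_toy <- toy %>% select(SCORE, Action:Short)
names(genres_toy)
```

On the real data the equivalent call would be `movies %>% select(SCORE, Action:Short)`, once `SCORE` is replaced by the actual column name.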
genres <- structure(list(SCORE = c(6.4, 6, 8.2, 8.2, 3.4, 4.3), Action = c(0L, 0L, 0L, 0L, 0L, 0L), Animation = c(0L, 0L, 1L, 0L, 0L, 0L), Comedy = c(1L, 1L, 0L, 1L, 0L, 0L), Drama = c(1L, 0L, 0L, 0L, 0L, 1L), Documentary = c(0L, 0L, 0L, 0L, 0L, 0L), Romance = c(0L, 0L, 0L, 0L, 0L, 0L), Short = c(0L, 0L, 1L, 0L, 0L, 0L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
genres
#> SCORE Action Animation Comedy Drama Documentary Romance Short
#> 1 6.4 0 0 1 1 0 0 0
#> 2 6.0 0 0 1 0 0 0 0
#> 3 8.2 0 1 0 0 0 0 1
#> 4 8.2 0 0 1 0 0 0 0
#> 5 3.4 0 0 0 0 0 0 0
#> 6 4.3 0 0 0 1 0 0 0
(These are just the first few rows, of course.)
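Each genre is a 0/1 indicator column, so one film can count toward several genres, or none: in the six sample rows the first film is flagged as both Comedy and Drama, and the fifth has no genre flag at all. A base R sketch (no packages assumed) that counts flags per film and films per genre on those same six rows, rebuilt as `genres_sample`, a name chosen only for illustration:

```r
# The six sample rows from the `genres` reprex data, as a plain data frame
genres_sample <- data.frame(
  SCORE       = c(6.4, 6, 8.2, 8.2, 3.4, 4.3),
  Action      = c(0L, 0L, 0L, 0L, 0L, 0L),
  Animation   = c(0L, 0L, 1L, 0L, 0L, 0L),
  Comedy      = c(1L, 1L, 0L, 1L, 0L, 0L),
  Drama       = c(1L, 0L, 0L, 0L, 0L, 1L),
  Documentary = c(0L, 0L, 0L, 0L, 0L, 0L),
  Romance     = c(0L, 0L, 0L, 0L, 0L, 0L),
  Short       = c(0L, 0L, 1L, 0L, 0L, 0L)
)

genre_cols <- c("Action", "Animation", "Comedy", "Drama",
                "Documentary", "Romance", "Short")

# Genre flags per film: some films have two flags, one has none
flags_per_film <- rowSums(genres_sample[genre_cols])
flags_per_film

# Films flagged in each genre
films_per_genre <- colSums(genres_sample[genre_cols])
films_per_genre
```

Because the flags overlap, one straightforward approach is to handle one genre at a time, as the Comedy example that follows does.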
Assuming you were just interested in `Comedy`, how would you further reduce `genres` to just those films?
suppressPackageStartupMessages(library(dplyr))
# OMITTED genres <- structure(list ...
# keep only the rows flagged as Comedy, then only the columns needed
comedies <- genres %>% filter(Comedy == 1) %>% select(SCORE, Comedy)
comedies
#> # A tibble: 3 x 2
#> SCORE Comedy
#> <dbl> <int>
#> 1 6.4 1
#> 2 6 1
#> 3 8.2 1
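For comparison, the same subset can be taken in base R without `dplyr`. A minimal sketch, assuming a plain data frame `genres_sample` (a hypothetical name) holding the `SCORE`, `Comedy`, and `Drama` values from the six sample rows:

```r
# SCORE, Comedy and Drama values from the six sample rows of `genres`
genres_sample <- data.frame(
  SCORE  = c(6.4, 6, 8.2, 8.2, 3.4, 4.3),
  Comedy = c(1L, 1L, 0L, 1L, 0L, 0L),
  Drama  = c(1L, 0L, 0L, 0L, 0L, 1L)
)

# Base R counterpart of filter(Comedy == 1) %>% select(SCORE, Comedy):
# rows where Comedy is 1, columns SCORE and Comedy
comedies_base <- genres_sample[genres_sample$Comedy == 1, c("SCORE", "Comedy")]
comedies_base
```

The result keeps the original row numbers (1, 2, 4) as row names; otherwise it holds the same values as the `comedies` tibble.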
The function `mean()` will find your average `SCORE`, so back to you to fill in the blank:
mean(_____)
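Two things worth knowing while filling it in: `mean()` expects a numeric vector rather than a whole data frame (given a data frame it returns `NA` with a warning), and it returns `NA` if any value is missing unless you pass `na.rm = TRUE`. A small illustration on made-up numbers, not on `movies`:

```r
# Made-up values, purely to show how mean() treats missing data
x <- c(2, 4, NA)

mean(x)                # NA: a single missing value makes the result NA
mean(x, na.rm = TRUE)  # 3: the NA is dropped before averaging
```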