Using Subset Equation

Could someone help me figure out how to solve this problem? I am new to RStudio and am not really sure how to approach it.

Select out all the people who have purchased both books (books) and movies (movies). What is the mean discount (discounts) received by these people?

Hint: use $ sign to grab variables, use == for comparison, mean()

I know that I need to use subset() but am not really sure how to use it. I tried uploading the data set related to the problem, but I can't since I am a new user, so I will try to describe it instead: books, movies and discounts are columns in the data set. Those who purchased are coded as TRUE and those who didn't are coded as FALSE. Let me know if this helps!


Just as a side note for future requests: you may want to provide some dummy data that others can work with in a reproducible example (reprex). Here are some base R solutions that give the desired result:

# create dummy data (no seed is set, so the sampled values differ between runs)
Data <- data.frame(
  user_id = LETTERS[1:10],
  books = sample(c(FALSE,TRUE),10,replace = TRUE,prob = c(0.2,0.8)),
  movies = sample(c(FALSE,TRUE),10,replace = TRUE, prob = c(0.2,0.8)),
  discount = sample(x = seq.default(0,50,10), size = 10, replace = TRUE)
)
Data
#>    user_id books movies discount
#> 1        A  TRUE   TRUE       40
#> 2        B  TRUE  FALSE       20
#> 3        C FALSE   TRUE        0
#> 4        D  TRUE   TRUE       40
#> 5        E FALSE   TRUE       50
#> 6        F  TRUE   TRUE       40
#> 7        G FALSE   TRUE       50
#> 8        H  TRUE   TRUE        0
#> 9        I  TRUE  FALSE        0
#> 10       J  TRUE   TRUE       40

# in a single line
mean(subset(Data, subset = books & movies)$discount)
#> [1] 32

# using pipes to create more human readable code
Data |>
  subset(books & movies) |>
  with(discount) |>
  mean()
#> [1] 32

# operating with subsets inside a data.frame
mean(Data[which(Data$books & Data$movies),]$discount)
#> [1] 32

Created on 2022-09-21 with reprex v2.0.2

Kind regards

Edit: Just saw that the encoding is TRUE/FALSE instead of 1/0, so I adjusted the code for you.
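
For completeness, the hint from the question ($ to grab variables, == for comparison, mean()) could be followed quite literally, roughly like this. It is only a sketch: the data frame is typed in by hand from the printed table above so that the result is reproducible, and the column is called discount as in my dummy data, whereas your real data apparently uses discounts.

```r
# Dummy data copied by hand from the printed table above
# (assumption: column names as in the reprex, not as in the real data set)
Data <- data.frame(
  user_id  = LETTERS[1:10],
  books    = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  movies   = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE),
  discount = c(40, 20, 0, 40, 50, 40, 50, 0, 0, 40)
)

# $ grabs a column, == compares it with TRUE, & combines both conditions
both <- Data$books == TRUE & Data$movies == TRUE

# mean discount of the rows where both conditions hold
result_hint <- mean(Data$discount[both])
result_hint
#> [1] 32
```

Since books and movies are already logical, the == TRUE part is redundant, which is why the versions above simply use books & movies.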


Thank you so much for your help! For some reason when I type mean(Data[which(Data$books & Data$movies),]$discount)
I get a warning message that says:
Warning message:
In mean.default(Amazon[which(Amazon$books & Amazon$movies), ]$discounts) :
argument is not numeric or logical: returning NA

Any idea what this means?

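Not without seeing your data. If you only had NAs inside your discounts column, mean() would return NA without a warning, and that would be an easy fix, since you can just set na.rm = TRUE inside the mean() function. This particular warning, though, usually means that what you pass to mean() is not numeric or logical, for example because discounts was read in as character or factor, or because the column name does not match the one in your data.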

But I can have a closer look if you type dput(head(Data,50)) and paste the result into the forum.
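
In the meantime, a check along these lines might narrow it down. This is a guess at the cause, not a diagnosis: the small Amazon data frame below is a made-up stand-in in which discounts is stored as text, which is one common way to get this warning.

```r
# Hypothetical stand-in for the real data: discounts stored as character
Amazon <- data.frame(
  books     = c(TRUE, TRUE, FALSE),
  movies    = c(TRUE, TRUE, TRUE),
  discounts = c("10", "30", "20"),
  stringsAsFactors = FALSE
)

# Check the exact column names and how the column is stored
names(Amazon)
class_before <- class(Amazon$discounts)
class_before
#> [1] "character"

# Convert to numeric; for a factor column use as.numeric(as.character(x)) instead
Amazon$discounts <- as.numeric(Amazon$discounts)

# na.rm = TRUE drops any values that could not be converted
fixed_mean <- mean(Amazon$discounts[Amazon$books & Amazon$movies], na.rm = TRUE)
fixed_mean
#> [1] 20
```

If class() already reports numeric or integer for your column, the cause is something else, and the dput() output would help to find it.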
