I have a set of 37 products that I want people to rate 20 times, and I want each person to rate three different products. I think I need somewhere in the ballpark of 250 people to rate these products because (37 * 20) / 3 = 246.67.

Is there a way to sample three products from the set of products without replacement and to do so multiple times in a way that each product gets rated 20 times?

We can use the mpg dataset from the ggplot2 package for demonstration purposes. The mpg dataset has 38 distinct car models in it. I can sample three car models from the dataset. Would you happen to know how I can do this approximately 246 more times, ensuring that each car model is sampled 20 times?

library(tidyverse)
# get a character vector of 38 distinct car models from the mpg dataset
mpg %>%
distinct(model) %>%
pull(model) ->
car_model
# sample three car models without replacement
sample(
x = car_model,
size = 3,
replace = FALSE
)

I don't have any intuitive solution and I also think that there is no obvious solution here. Below is one approach you can take:

library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
# get a character vector of 38 distinct car models from the mpg dataset
mpg %>%
distinct(model) %>%
pull(model) ->
car_model
take_one <- function(bags){
car_tbl <- bags %>%
dplyr::sample_n(size = 3, weight = times) %>%
dplyr::mutate(taken = 1) %>%
dplyr::select(car, taken)
bags <- bags %>%
dplyr::left_join(car_tbl, by = "car") %>%
tidyr::replace_na(replace = list(taken = 0)) %>%
dplyr::mutate(times = times - taken) %>%
dplyr::select(car, times)
list(cars = car_tbl %>% dplyr::pull(car), bags = bags)
}
bags <- tibble::tibble(car = car_model, times = 20)
choices <- rep(NA, 255)
for(i in seq_along(choices)) {
res <- take_one(bags)
choices[i] <- list(res[["cars"]])
bags <- res[["bags"]]
}
#> Error in sample.int(n(), check_size(~3, n(), replace = replace), replace = replace, : too few positive probabilities
choices %>% purrr::flatten_chr() %>% table()
#> .
#> 4runner 4wd a4 a4 quattro
#> 20 20 20
#> a6 quattro altima c1500 suburban 2wd
#> 20 20 19
#> camry camry solara caravan 2wd
#> 20 20 20
#> civic corolla corvette
#> 20 20 20
#> dakota pickup 4wd durango 4wd expedition 2wd
#> 20 20 20
#> explorer 4wd f150 pickup 4wd forester awd
#> 20 20 20
#> grand cherokee 4wd grand prix gti
#> 20 20 20
#> impreza awd jetta k1500 tahoe 4wd
#> 20 20 20
#> land cruiser wagon 4wd malibu maxima
#> 20 20 20
#> mountaineer 4wd mustang navigator 2wd
#> 20 20 20
#> new beetle passat pathfinder 4wd
#> 20 20 20
#> ram 1500 pickup 4wd range rover sonata
#> 20 20 20
#> tiburon toyota tacoma 4wd
#> 20 20

^{Created on 2019-05-22 by the reprex package (v0.3.0)}

Basically, take_one is a function that takes in a tibble with bags (i.e., each model starts with a bag of 20 which is the maximum number this model can be chosen). The function takes three cars and modifies bags to decrement times in corresponding rows.

You then iteratively go over 250 people and create 3 choices for each person to rate. The error is there because at some point you have a situation where you want to pick 3 things, but you only have 2 rows left (since 247 is not divisible by 3). But good thing is that you'll still get your choices object that you can then use.