Sampling without replacement multiple times and ensuring equal samples

I have a set of 37 products that I want people to rate 20 times, and I want each person to rate three different products. I think I need somewhere in the ballpark of 250 people to rate these products because (37 * 20) / 3 = 246.67.

Is there a way to sample three products from the set of products without replacement and to do so multiple times in a way that each product gets rated 20 times?

We can use the mpg dataset from the ggplot2 package for demonstration purposes. The mpg dataset has 38 distinct car models in it. I can sample three car models from the dataset. Would you happen to know how I can do this approximately 246 more times, ensuring that each car model is sampled 20 times?

library(tidyverse)

# get a character vector of 38 distinct car models from the mpg dataset
mpg %>% 
  distinct(model) %>% 
  pull(model) ->
  car_model

# sample three car models without replacement
sample(
  x       = car_model, 
  size    = 3, 
  replace = FALSE
)

I don't have any intuitive solution and I also think that there is no obvious solution here. Below is one approach you can take:

library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang

# get a character vector of 38 distinct car models from the mpg dataset
mpg %>% 
  distinct(model) %>% 
  pull(model) ->
  car_model


take_one <- function(bags){
  car_tbl <- bags %>%
    dplyr::sample_n(size = 3, weight = times) %>%
    dplyr::mutate(taken = 1) %>%
    dplyr::select(car, taken)
  
  bags <- bags %>%
    dplyr::left_join(car_tbl, by = "car") %>%
    tidyr::replace_na(replace = list(taken = 0)) %>%
    dplyr::mutate(times = times - taken) %>%
    dplyr::select(car, times)
  
  list(cars = car_tbl %>% dplyr::pull(car), bags = bags)
}

bags <- tibble::tibble(car = car_model, times = 20)
choices <- rep(NA, 255)

for(i in seq_along(choices)) {
  res <- take_one(bags)
  choices[i] <- list(res[["cars"]])
  bags <- res[["bags"]]
}
#> Error in sample.int(n(), check_size(~3, n(), replace = replace), replace = replace, : too few positive probabilities

choices %>% purrr::flatten_chr() %>% table()
#> .
#>            4runner 4wd                     a4             a4 quattro 
#>                     20                     20                     20 
#>             a6 quattro                 altima     c1500 suburban 2wd 
#>                     20                     20                     19 
#>                  camry           camry solara            caravan 2wd 
#>                     20                     20                     20 
#>                  civic                corolla               corvette 
#>                     20                     20                     20 
#>      dakota pickup 4wd            durango 4wd         expedition 2wd 
#>                     20                     20                     20 
#>           explorer 4wd        f150 pickup 4wd           forester awd 
#>                     20                     20                     20 
#>     grand cherokee 4wd             grand prix                    gti 
#>                     20                     20                     20 
#>            impreza awd                  jetta        k1500 tahoe 4wd 
#>                     20                     20                     20 
#> land cruiser wagon 4wd                 malibu                 maxima 
#>                     20                     20                     20 
#>        mountaineer 4wd                mustang          navigator 2wd 
#>                     20                     20                     20 
#>             new beetle                 passat         pathfinder 4wd 
#>                     20                     20                     20 
#>    ram 1500 pickup 4wd            range rover                 sonata 
#>                     20                     20                     20 
#>                tiburon      toyota tacoma 4wd 
#>                     20                     20

Created on 2019-05-22 by the reprex package (v0.3.0)

Basically, take_one is a function that takes in a tibble with bags (i.e., each model starts with a bag of 20 which is the maximum number this model can be chosen). The function takes three cars and modifies bags to decrement times in corresponding rows.

You then iteratively go over 250 people and create 3 choices for each person to rate. The error is there because at some point you have a situation where you want to pick 3 things, but you only have 2 rows left (since 247 is not divisible by 3). But good thing is that you'll still get your choices object that you can then use.

1 Like

@mishabalyasin your solution works perfectly. Thank you so much!!!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.