I would love help with the dataframe I am trying to build. I need to create a dataframe that has 500 rows and 3 columns. Each column is a "Day" (defined below) and each row is an individual entry. The values must be random and CANNOT repeat within a row (across its 3 columns), but may repeat within a column. I have been able to build this piecemeal below. Is there a way to do this in a clean, fast way? THANK YOU FOR ANY HELP.
I'd be inclined to approach this as a combinatorics problem, in which case I'd need to know:
Considering each row as a set, do the sets need to be unique?
If so, are these two rows considered distinct (i.e., does order matter)? {Monday Morning, Monday Evening, Tuesday Morning} vs. {Monday Morning, Tuesday Morning, Monday Evening}
I'm also not sure what you have in mind when you say "the values must be random". For instance, if you generated a large population of sets (rows) and randomly sampled 500 sets from that population without replacement, would that be the right structure of randomness for your purposes? If not, what are your constraints?
Edited to add: if you want to start playing with code along these lines, my favorite package for combinations and permutations is arrangements.
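As a minimal sketch of the "population of sets" idea, base R's combn() (swapped in here for arrangements, to keep the example dependency-free) can enumerate every unordered set of 3 distinct labels. The day/time labels below are hypothetical, since the actual "Day" definition is not shown in the thread.

```r
# Hypothetical day/time labels; the thread's real list is not shown here.
days <- c("Monday Morning", "Monday Evening",
          "Tuesday Morning", "Tuesday Evening",
          "Wednesday Morning", "Wednesday Evening")

# combn() returns a 3 x choose(n, 3) matrix with one set per column,
# so transpose it to get one set per row.
population <- t(combn(days, 3))
colnames(population) <- c("Day1", "Day2", "Day3")

nrow(population)  # 20, i.e. choose(6, 3)
head(population)
```

Each row is a distinct set of 3 different labels, so order never matters and no label repeats within a row; sampling rows from `population` then gives the structure of randomness described above.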
Each row as a set does need to be unique.
{Monday Morning, Monday Evening, Tuesday Morning} is OKAY.
{Monday Morning, Monday Morning, Monday Evening} is NOT OKAY. Monday Morning repeats.
Order does not matter. The 3 terms as a set can appear in any order, so long as they do not repeat.
How I am defining random:
You are almost correct in your assessment. If a large population of sets were generated and 500 of those were sampled, that would work! Sampling with replacement is okay: the same set can be drawn more than once.
The idea is to mimic 500 people choosing their top 3 preferred day/time options. A person would not logically choose the same day/time more than once, but we may see more than one person choosing the same top 3 days/times.
Hope this clarifies. I am happy to add more if needed.
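Under the model described above (each person picks 3 distinct day/times, order ignored, and the same set may be drawn more than once), one minimal sketch is to enumerate every 3-element set with base R's combn() and sample 500 rows from it with replacement. The labels are hypothetical placeholders for the real "Day" values.

```r
# Hypothetical day/time labels; replace with the real ones.
days <- c("Monday Morning", "Monday Evening",
          "Tuesday Morning", "Tuesday Evening",
          "Wednesday Morning", "Wednesday Evening")

# Every unordered set of 3 distinct labels, one set per row.
population <- t(combn(days, 3))

set.seed(42)  # only so the example is reproducible

# Draw 500 row indices WITH replacement: the same set may appear more than once.
idx <- sample(nrow(population), 500, replace = TRUE)
responses <- as.data.frame(population[idx, ], stringsAsFactors = FALSE)
names(responses) <- c("Day1", "Day2", "Day3")

dim(responses)  # 500 3
# No label repeats within any row:
all(apply(responses, 1, function(r) anyDuplicated(r) == 0))  # TRUE
```

With these 6 hypothetical labels there are only choose(6, 3) = 20 possible sets, so drawing 500 rows without replacement would be impossible; sampling with replacement is what makes 500 rows feasible. Because combn() keeps the input order, Day1 always comes earlier in `days` than Day2; that is harmless when order does not matter, but you would want to shuffle each row if the columns are meant as a ranking.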
Here's one approach to doing this that uses functions from purrr to create the data frame. It takes your original construction of a random sample of 3 days, repeats it 500 times, and then combines the results by row into a single data frame. Probably not the most efficient, but if you only need 500 samples, it should be fine.
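A minimal sketch of that repeat-and-bind idea, assuming hypothetical day/time labels and a hypothetical helper `one_response()`: purrr::map() runs the sampling 500 times, each run returns a one-row, three-column data frame, and do.call(rbind, ...) stacks them. (purrr's map_dfr(), superseded in purrr 1.0.0 in favour of list_rbind(), would also do the stacking.)

```r
library(purrr)

# Hypothetical day/time labels (the thread's actual "Day" definition is not shown).
days <- c("Monday Morning", "Monday Evening",
          "Tuesday Morning", "Tuesday Evening",
          "Wednesday Morning", "Wednesday Evening")

set.seed(42)  # only so the example is reproducible

# Hypothetical helper: one respondent picks 3 distinct labels
# (sample() defaults to replace = FALSE) and returns them as ONE row.
one_response <- function(i) {
  picks <- sample(days, 3)
  data.frame(Day1 = picks[1], Day2 = picks[2], Day3 = picks[3],
             stringsAsFactors = FALSE)
}

# Repeat 500 times, then stack the one-row data frames by row.
responses <- do.call(rbind, map(seq_len(500), one_response))
dim(responses)  # 500 3
```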
This looks great @mfherman!
Exactly what I want. However, when I try to run your script, it makes my tibble 1500 x 1.
Any suggestions on how to transform the tibble into a 500 x 3 like yours?
I have tidyverse and purrr installed and loaded in my library.
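A 1500 x 1 result usually means each 3-value sample was stacked as a column rather than spread across one row (an assumption, since the code that produced it is not shown). If the single column holds each person's three picks in consecutive rows, a minimal sketch of the reshape is to refill the values row-wise into a 3-column matrix. The labels are hypothetical.

```r
library(tibble)

# Hypothetical day/time labels.
days <- c("Monday Morning", "Monday Evening",
          "Tuesday Morning", "Tuesday Evening",
          "Wednesday Morning", "Wednesday Evening")

set.seed(42)
# Reproduce the problem shape: 500 draws of 3 stacked into one column.
long <- tibble(day = unlist(lapply(seq_len(500), function(i) sample(days, 3))))

# byrow = TRUE fills row by row: values 1-3 become row 1, values 4-6 row 2, etc.
wide <- as_tibble(matrix(long$day, ncol = 3, byrow = TRUE,
                         dimnames = list(NULL, c("Day1", "Day2", "Day3"))))

dim(long)  # 1500 1
dim(wide)  # 500 3
```

Fixing it at the source is usually cleaner: have each iteration return a one-row, three-column data frame instead of a length-3 column, so binding by row gives 500 x 3 directly.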
@mfherman's solution is a really nice example of how to use purrr to go from a one-piece-at-a-time solution to a full solution! Not all problems happen to coincide so neatly with an area of mathematics that people like to write packages for, so ultimately, I think, learning to use tools like purrr is more generally useful.
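For comparison with the purrr route, base R's replicate() covers this particular repeat-the-sample case in a single call. A minimal sketch, again with hypothetical labels:

```r
# Hypothetical day/time labels.
days <- c("Monday Morning", "Monday Evening",
          "Tuesday Morning", "Tuesday Evening",
          "Wednesday Morning", "Wednesday Evening")

set.seed(42)
# replicate() evaluates sample(days, 3) 500 times and simplifies the
# length-3 results into a 3 x 500 matrix, so transpose to get 500 x 3.
picks <- t(replicate(500, sample(days, 3)))

responses <- as.data.frame(picks, stringsAsFactors = FALSE)
names(responses) <- c("Day1", "Day2", "Day3")
dim(responses)  # 500 3
```

replicate() works neatly here only because every iteration returns a vector of the same length; purrr's map() family keeps working when each iteration returns something more complex, which is the more general skill.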
Thank you both @mfherman and jcblum. You have both been very helpful. Your advice is very appreciated. I got the dataframe to generate the way I wanted. Yay!
This is a great community. Thank you for helping people like me!
(Warning: following links from that SO page can send you down a deep rabbit hole of combinatoric fun… I can lose an afternoon to this stuff if I'm not careful.)
If your question's been answered, would you mind choosing a solution? There are no imaginary internet points involved, so just choose the one that you used, liked best, or whatever; nobody will get mad. Choosing a solution helps other people see which questions still need help, or find solutions if they have similar problems.
Just to keep it all neat, I'd probably create a data frame with the days and their corresponding probabilities and then, as @jcblum wrote, pass those probabilities as weights to the sample() function (via its prob argument).
As an example, I generated random probabilities for each day; you could replace these with the real probabilities.
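A minimal sketch of that idea, assuming hypothetical day labels and made-up weights (swap in the real probabilities): keep the days and weights together in one data frame and pass the weights through sample()'s prob argument.

```r
# Hypothetical labels and made-up weights; replace with the real probabilities.
day_probs <- data.frame(
  day  = c("Monday Morning", "Monday Evening",
           "Tuesday Morning", "Tuesday Evening",
           "Wednesday Morning", "Wednesday Evening"),
  prob = c(0.30, 0.10, 0.25, 0.15, 0.12, 0.08),
  stringsAsFactors = FALSE
)

set.seed(42)
# Each respondent draws 3 distinct days (replace = FALSE is the default),
# with more heavily weighted days more likely to be picked.
picks <- t(replicate(500, sample(day_probs$day, 3, prob = day_probs$prob)))

responses <- as.data.frame(picks, stringsAsFactors = FALSE)
names(responses) <- c("Day1", "Day2", "Day3")
head(responses)
```

Because sampling is without replacement, sample() applies the weights sequentially: each later pick is drawn in proportion to the weights of the days not yet chosen. So the weights act as relative preferences for each successive pick, not as the exact share of people who will include a given day in their top 3.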