Calculating the probability of a single die roll with replacement

paulgureghian · April 15, 2018, 12:07am

Calculate the probability of not seeing a 6 on a single roll.

p_no6 <- sample(6,1,replace=TRUE)

Am I on the right path, or do I need actual math?

jdlong · April 15, 2018, 2:43am

I'm a huge fan of doing simulations instead of doing actual math. I've managed to build a career on it!

Yes, you are on the right path. You simulated a single roll of the die. The replace parameter doesn't really matter since you're only drawing once, but once you start drawing more than one value, it's important.
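A quick way to see why replace only starts to matter beyond one draw, sketched here as an illustration rather than code from the thread: with a single draw both settings return one face from 1 to 6, but with several draws replace = FALSE can never repeat a face. The seed and the variable names are arbitrary choices.

```r
set.seed(1)
# One draw: replace makes no difference, the result is a single face from 1 to 6
one_roll <- sample(6, 1, replace = TRUE)

# Six draws without replacement: always a permutation of 1:6, so no face repeats
no_repeat <- sample(6, 6, replace = FALSE)

# Six draws with replacement: faces can repeat, like six real rolls of a die
with_repeat <- sample(6, 6, replace = TRUE)

print(one_roll)
print(sort(no_repeat))   # always 1 2 3 4 5 6
print(with_repeat)
```

Only the with-replacement version behaves like repeated, independent die rolls.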

paulgureghian · April 15, 2018, 9:15pm

Do I need the sample function?

wilsonfreitas · April 15, 2018, 9:42pm

As @jdlong wrote, you are on the right path. Yes, you do need the sample function.
Think of the replace=TRUE argument as a setting that matters when you sample more than one die roll.
In this case you are assuming the rolls are independent of each other.

jdlong · April 15, 2018, 10:25pm

Whether you need the sample function depends on what you want to do. If you want to simulate a single dice roll (or a vector of them), then sample() is the most straightforward way I can think of. You can do a whole bunch of dice rolls (100 million in this example) as follows:

rolls <- sample(6, 100000000, replace = TRUE)
not6 <- rolls[rolls < 6]   # keep only the rolls of 1 to 5
## proportion of rolls that are not 6 (should be close to 5/6)
length(not6) / length(rolls)
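The same estimate can be written more compactly by taking mean() of a logical vector (TRUE counts as 1), and far fewer than 100 million rolls are usually enough to land close to 5/6. A sketch; the seed, the one million rolls, and the name p_hat are arbitrary choices, not from the thread:

```r
set.seed(42)
n_rolls <- 1e6
rolls <- sample(6, n_rolls, replace = TRUE)

# rolls != 6 is TRUE for faces 1-5 and FALSE for 6,
# so its mean is the proportion of rolls that were not 6
p_hat <- mean(rolls != 6)

p_hat
abs(p_hat - 5/6)   # simulation error, shrinks roughly like 1/sqrt(n_rolls)
```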

paulgureghian · April 15, 2018, 10:26pm

So should I just use the default, replace = FALSE? p_no6 <- sample(6, size = 1) is still not what I need.

paulgureghian · April 15, 2018, 10:33pm

Right, 5/6 is the probability that it won't be a 6 on a single die roll. How do I use that in the sample function?
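One way to put the 5/6 into sample() directly, sketched here rather than taken from the replies, is its prob argument, which gives a probability weight to each element of x. The two labels, the seed, and the 10,000 draws are illustrative assumptions:

```r
set.seed(7)
# Two outcomes weighted like a fair die: "not 6" with 5/6, "6" with 1/6
outcomes <- sample(c("not 6", "6"), size = 10000, replace = TRUE,
                   prob = c(5/6, 1/6))

# Proportion of "not 6" outcomes, which should be close to 5/6
p_hat <- mean(outcomes == "not 6")
p_hat
```

This still only estimates 5/6; the probability itself comes from the weights you supply.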

edgararuiz · April 15, 2018, 11:17pm

Definitely set replace to TRUE if you want to run a simulation, say of 100 rolls. The idea is that around 83 rolls should not be 6 (5/6 * 100 ≈ 83.3):

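set.seed(100)
rolls <- sample(6, 100, replace = TRUE)   # 100 independent rolls of a fair die
roll_counts <- table(rolls)               # how many times each face came up
no_6 <- roll_counts[names(roll_counts) != "6"]
sum(no_6)      # simulated number of rolls that were not 6
(5/6) * 100    # expected number, about 83.3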
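To watch the simulated proportion settle toward 5/6, one can repeat the experiment for increasing numbers of rolls. A sketch; the particular roll counts and the seed are arbitrary choices:

```r
set.seed(100)
n_values <- c(100, 1000, 10000, 100000)

# For each n, simulate n rolls and record the proportion that were not 6
props <- sapply(n_values, function(n) mean(sample(6, n, replace = TRUE) != 6))

data.frame(n = n_values, prop_not_6 = props, expected = 5/6)
```

Small runs can wander noticeably from 0.833; the larger runs should sit close to it.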

paulgureghian · April 15, 2018, 11:22pm

This worked: p_no6 <- 1 - (1/6). The premise I was working under was that the probability of not seeing a six on a single roll is 5/6, and 1/6 seems like the probability of seeing a six.

Why is 1/6 the probability of not seeing a six instead of 5/6?

paulgureghian · April 15, 2018, 11:28pm

I did not need any simulation, just the probability. Is there a reason for this?

edgararuiz · April 15, 2018, 11:30pm

I guess the answer is that if you just want the probability, you don't need the sample() function at all. It does become useful if, like @jdlong and me, you'd like to see the p = 1 - (1/6) formula in action via a quick R script.

paulgureghian · April 15, 2018, 11:34pm

Why is it 1/6 instead of 5/6 for the probability of not seeing a six?

edgararuiz · April 15, 2018, 11:34pm

It isn't: the probability of not seeing a six on a die roll is indeed 5/6.
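Since a fair die has six equally likely faces, the value can be checked exactly in R by counting faces, with no simulation. A minimal sketch; the variable names are illustrative:

```r
faces <- 1:6

# Counting: 5 of the 6 equally likely faces are not 6
p_no6_count <- sum(faces != 6) / length(faces)

# Complement rule: P(not 6) = 1 - P(6)
p_no6_complement <- 1 - (1/6)

p_no6_count
all.equal(p_no6_count, p_no6_complement)   # TRUE: both are 5/6
```

So 1/6 is the probability of seeing a six, and 1 - (1/6) = 5/6 is the probability of not seeing one.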

paulgureghian · April 15, 2018, 11:47pm

So it's possible to calculate the probability without a simulation?

edgararuiz · April 15, 2018, 11:57pm

In this case, yes, because each roll is independent of any previous rolls, and you have a small set of discrete, equally likely outcomes.
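Independence is also what lets exact answers extend beyond one roll. As an illustration (not from the thread), the probability of no 6 in two rolls can be found by listing all 36 equally likely outcomes, and it matches multiplying the single-roll probabilities:

```r
# All 36 equally likely outcomes of two independent rolls
two_rolls <- expand.grid(first = 1:6, second = 1:6)

# Exact probability by counting outcomes with no 6 on either roll
p_exact <- mean(two_rolls$first != 6 & two_rolls$second != 6)

# Independence lets us multiply the single-roll probabilities instead
p_product <- (5/6)^2

c(p_exact, p_product)   # both 25/36
```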

paulgureghian · April 15, 2018, 11:58pm

So you would want simulations for non-discrete or dependent events?

edgararuiz · April 16, 2018, 12:07am

In R, I typically see simulated numbers used to build toy examples or to test code. For those cases, other functions, such as runif() or rnorm(), are usually chosen to create the simulated data.

The rnorm() function generates data from a normal distribution, which makes it really good for learning about probabilities of continuous variables:

set.seed(100)
hist(rnorm(100))
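For a continuous variable there are no individual faces to count, so a probability such as P(X < 1) for a standard normal comes from the distribution function pnorm(), and simulated rnorm() values only approximate it. A sketch; the cutoff of 1, the seed, and the sample size are arbitrary choices:

```r
set.seed(100)
x <- rnorm(100000)   # draws from a standard normal (mean 0, sd 1)

p_sim <- mean(x < 1)   # proportion of simulated values below 1
p_exact <- pnorm(1)    # exact P(X < 1), about 0.8413

c(simulated = p_sim, exact = p_exact)
```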

Ideally, though, one starts practicing by calculating probabilities on non-simulated data sets, such as iris or mtcars.
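Both data sets ship with base R (package datasets). As a small sketch of what such practice might look like, assuming "probability" here means picking one row uniformly at random, the share of matching rows gives the answer directly:

```r
# Probability that a randomly chosen flower in iris is a setosa
p_setosa <- mean(iris$Species == "setosa")

# Probability that a randomly chosen car in mtcars has 6 cylinders
p_six_cyl <- mean(mtcars$cyl == 6)

p_setosa    # 50 of 150 flowers, so 1/3
p_six_cyl   # 7 of 32 cars
```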

paulgureghian · April 16, 2018, 12:14am

I see you were using simulation with replace=TRUE. I would think the default (replace = FALSE) would be better suited for a simulation to get the probability.

edgararuiz · April 16, 2018, 12:17am

If you want more simulated values than there are possible values, you need to set replace to TRUE. In the example, I requested 100 random values from only 6 possible faces, so the sample() function would have failed if I had left replace at its default of FALSE.
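The failure is easy to reproduce. A sketch that catches the error instead of stopping the script (the tryCatch wrapper and variable names are illustrative, and the exact wording of the message may vary by R version):

```r
# Asking for 100 values from 6 faces without replacement is an error
result <- tryCatch(
  sample(6, 100),                    # replace = FALSE is the default
  error = function(e) conditionMessage(e)
)
print(result)   # the error message, as a character string

# With replace = TRUE the same request works and returns 100 rolls
rolls <- sample(6, 100, replace = TRUE)
length(rolls)
```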