Simulating individual responses (binary) based on known population probability

helanthiusfan · December 15, 2020, 4:40pm

Hello! I have a very large dataset of individual responses and I have an estimation of the likelihood of a specific response (yes/no). It was suggested that I could use the rbinom package to simulate the responses for the individual records using the known population probability (i.e.: the likelihood that the individuals say yes is 32%). How would I go about doing this? I'm honestly at a loss and have been trying to figure this out for longer than I care to admit. I've tried researching sensitivity analysis (which is how this was presented), binomial distributions, probabilistic sensitivity analysis, but I'm still stuck on the basic starting point. The workflow presented was: define population probability as p1=32% > this approximates a binomial distribution B(n, p1) where n is the number of observations > use these parameters to simulate random outcomes for each individual/household > aggregate to county/specific geography. How do I set this up? Do I use the formula to create a new variable in the dataset such as:

mydata$new_var <- rbinom(mydata, sum(weight), .32)

AlexisW · December 17, 2020, 4:28am

I don't know anything about sensitivity analysis, and how a simulation can help you. Here are some thoughts on the simulation part.

One thing is not clear to me: what does one row of your dataset represent? Is it one individual? Then, each individual response is a Bernoulli event, and only the sum of all these follows a binomial distribution.

To take the classic example of a coin flipping, let's say each individual flips a coin, with a probability 0.32 of heads (it's an unfair coin) and 1-0.32 = 0.68 of tails.

In R we can simulate that with rbinom(1,1, 0.32).

rbinom(1,1,.32)
#> [1] 0

But what if you have 10 individuals who flip a coin?

replicate(n = 10, rbinom(1,1,.32))
#>  [1] 0 0 0 1 0 0 1 0 1 1

Since it's slightly painful to write replicate every time, R has a shortcut: the first argument to rbinom() is the number of individuals:

rbinom(10, 1, .32)
#>  [1] 1 1 0 1 1 0 0 0 0 1

But now, what if there is a single individual who flips the coin several time? The first time, they get either 0 (heads) or 1 (tails). Flipping twice, they can get 00 or 01 or 10 or 11. So the total number of tails they got is 0, 1, or 2. That can be described with the binomial distribution for n=2 coin flips, with probability p=0.32. In R:

rbinom(1, 2, .32)
#> [1] 0

Conclusion

So, if you want to simulate one yes/no choice per individual (per row of the data frame), the correct way is:

N_ind <- nrow(mydata)  # nb individuals
n <- 1                 # nb choices
p1 <- 0.32             # proba of each choice

mydata$choice <- rbinom(N_ind, n, p1)

Of course, if the outcome for one individual is the result of several yes/no decisions, you will have to change n.

Aggregation

A pointer for the next step, with dplyr you can do something like that:

library(dplyr)

mydata %>%
  group_by(county) %>%
  summarize(nb_positive_choices = sum(choice),
            prop_positive_choices = mean(choice))

system · January 7, 2021, 4:28am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.