What is happening internally when I generate() bivariate data with "infer"?

Rsky · December 18, 2020, 4:20am

Suppose I have this code:

library(tidymodels)
library(epiDisplay)
data(Marryage)

null_dist <- Marryage %>%
  specify(response = birthyr) %>%
  hypothesize(null = "point", mu = 1960) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

null_dist %>%
  visualize()

There are things I don't understand.

In bootstrapping, we create "fake" data sets by resampling the original sample with replacement.
If I compute the mean of each resampled data set, those means should be centered on the mean of the original sample. For example:

Marryage %>%
  specify(birthyr ~ sex) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  group_by(replicate) %>%
  summarise(nn = n(), su = sum(birthyr), mm = mean(birthyr), ss = sd(birthyr)) %>%
  ggplot() +
  aes(x = mm) +
  geom_histogram()

However, the histogram produced by the first code is centered on 1960, the value specified in the null hypothesis.

Is this histogram being created as if type = "simulate" had been used?
Or is it obtained because every bootstrap sample was discarded except those with a mean of 1960?

Thank you for reading.
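A third possibility can be checked by hand. The base-R sketch below assumes (this is my understanding of infer, not something confirmed in this thread) that for a point null on the mean with type = "bootstrap", the observed values are shifted so that their mean equals mu before resampling, and no resample is thrown away. The birthyr values here are simulated stand-ins, not the real Marryage data, and boot_means is a hypothetical helper written only for this illustration.

```r
# Simulated stand-in for Marryage$birthyr (hypothetical values, not the real data)
set.seed(1)
birthyr <- round(rnorm(27, mean = 1962, sd = 7))

mu0 <- 1960  # value given to hypothesize(null = "point", mu = 1960)

# Assumed recentering step: shift every observation so the sample mean equals mu0
shifted <- birthyr - mean(birthyr) + mu0

# Hypothetical helper: means of `reps` bootstrap resamples drawn with replacement
boot_means <- function(x, reps = 1000) {
  replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
}

plain_means   <- boot_means(birthyr)  # resampling the raw data
shifted_means <- boot_means(shifted)  # resampling the recentered data

mean(plain_means)    # close to mean(birthyr)
mean(shifted_means)  # close to mu0
```

If that assumption holds, the first pipeline corresponds to shifted_means (centered on 1960), while the second pipeline, which summarises the resampled birthyr values directly, corresponds to plain_means (centered on the sample mean).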
