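library(rsample)
library(tibble)
library(purrr)
# Two groups of 20 observations whose true means differ by 10
group_1 <- rnorm(20, 250, 1)
group_2 <- rnorm(20, 260, 1)
dd <- tibble(
  gp = c(rep(1, 20), rep(2, 20)),
  x = c(group_1, group_2)
)
# 100 bootstrap resamples of the rows of dd
boots <- bootstraps(dd, times = 100)
# Run a Welch t-test on the analysis part of each resample
results <- map(boots$splits, ~ t.test(x ~ gp, data = analysis(.x)))
results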
#> [[1]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -39.723, df = 37.979, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.387307 -9.379903
#> sample estimates:
#> mean in group 1 mean in group 2
#> 249.9325 259.8161
#>
#>
#> [[2]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -39.95, df = 32.403, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.636565 -9.605016
#> sample estimates:
#> mean in group 1 mean in group 2
#> 249.8622 259.9830
#>
#>
#> [[3]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -38.852, df = 30.97, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.496965 -9.449833
#> sample estimates:
#> mean in group 1 mean in group 2
#> 249.9768 259.9502
#>
#>
#> [[4]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -36.184, df = 37.634, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.140577 -9.065691
#> sample estimates:
#> mean in group 1 mean in group 2
#> 250.1526 259.7558
#>
#>
#> [[5]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -36.379, df = 37.399, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -9.570911 -8.561376
#> sample estimates:
#> mean in group 1 mean in group 2
#> 250.5136 259.5797
#>
#>
#> [[6]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -34.786, df = 37.945, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.246793 -9.119708
#> sample estimates:
#> mean in group 1 mean in group 2
#> 250.2552 259.9385
#>
#>
#> [[7]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -45.541, df = 33.333, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.154724 -9.286519
#> sample estimates:
#> mean in group 1 mean in group 2
#> 250.0092 259.7298
#>
#>
#> [[8]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -48.884, df = 33.506, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.502344 -9.663542
#> sample estimates:
#> mean in group 1 mean in group 2
#> 249.7921 259.8750
#>
#>
#> [[9]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -35.212, df = 30.62, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.171712 -9.057381
#> sample estimates:
#> mean in group 1 mean in group 2
#> 250.2398 259.8543
#>
#>
#> [[10]]
#>
#> Welch Two Sample t-test
#>
#> data: x by gp
#> t = -36.841, df = 33.874, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -10.197028 -9.130719
#> sample estimates:
#> mean in group 1 mean in group 2
#> 250.0272 259.6911
#>
#>
#> [ reached getOption("max.print") -- omitted 90 entries ]
results is a list of t.test() results (objects of class htest), with one entry per bootstrap replicate, so 100 entries here. You can inspect or summarize it however you like.
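One way to summarize it is to collect the t statistic, p-value and 95% confidence interval from every replicate into a tibble with purrr's map_dbl(). The sketch below rebuilds the same kind of data so that it runs on its own. The set.seed() call and the column names (statistic, p_value, conf_low, conf_high) are my own choices for illustration, not something rsample produces.

```r
library(rsample)
library(tibble)
library(purrr)

set.seed(123)  # added only so this sketch is reproducible
dd <- tibble(
  gp = c(rep(1, 20), rep(2, 20)),
  x = c(rnorm(20, 250, 1), rnorm(20, 260, 1))
)
boots <- bootstraps(dd, times = 100)
results <- map(boots$splits, ~ t.test(x ~ gp, data = analysis(.x)))

# Each element of results is an "htest" list; pull out the pieces we want
summary_tbl <- tibble(
  id        = boots$id,
  statistic = map_dbl(results, ~ unname(.x$statistic)),
  p_value   = map_dbl(results, "p.value"),
  conf_low  = map_dbl(results, ~ .x$conf.int[1]),
  conf_high = map_dbl(results, ~ .x$conf.int[2])
)
summary_tbl

# For example, the spread of the t statistic across the bootstrap replicates
quantile(summary_tbl$statistic, c(0.025, 0.975))
```

If you already use broom, calling broom::tidy() on each htest object gives a similar one-row summary per replicate.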
rsample performs bootstrapping the way you'd expect, i.e. by sampling rows with replacement. And yes, each bootstrap resample has as many rows as the original data.
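To make "sampling with replacement" concrete, here is a base-R sketch of what a single bootstrap resample amounts to conceptually. It illustrates the idea only and is not a copy of how rsample implements it internally; the seed and the names idx and one_boot are my own.

```r
library(tibble)

set.seed(42)  # added only so this sketch is reproducible
dd <- tibble(
  gp = c(rep(1, 20), rep(2, 20)),
  x = c(rnorm(20, 250, 1), rnorm(20, 260, 1))
)

n <- nrow(dd)
# Draw n row indices from 1..n, with replacement
idx <- sample(n, size = n, replace = TRUE)
one_boot <- dd[idx, ]

nrow(one_boot)            # same number of rows as dd
sum(duplicated(idx))      # how many draws repeat a row already drawn
setdiff(seq_len(n), idx)  # rows never drawn: the "out-of-bag" rows
```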
For a given resample, some rows of the original data will be included multiple times while others will not be included at all. So each split in boots$splits has two parts: the analysis part, which is the standard bootstrap sample, and the assessment part, which consists of the rows that were not included (often called the out-of-bag sample).
So pulling out the analysis part of a bootstrap split gives a data frame with as many rows as the original data, created by sampling with replacement:
analysis(boots$splits[[1]])
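To see how the two parts of one split relate, you can tag each row of the original data with its row number and compare them. The row column here is a hypothetical helper added only for this check; rsample does not need it.

```r
library(rsample)
library(tibble)

set.seed(1)  # added only so this sketch is reproducible
dd <- tibble(
  gp = c(rep(1, 20), rep(2, 20)),
  x = c(rnorm(20, 250, 1), rnorm(20, 260, 1)),
  row = 1:40  # hypothetical helper column: original row number
)
boots <- bootstraps(dd, times = 100)
split1 <- boots$splits[[1]]

in_rows  <- analysis(split1)$row    # may contain repeats
out_rows <- assessment(split1)$row  # rows never drawn

length(in_rows)                              # 40, same as nrow(dd)
table(in_rows)[table(in_rows) > 1]           # rows drawn more than once
intersect(in_rows, out_rows)                 # empty: no row is in both parts
setequal(union(in_rows, out_rows), dd$row)   # TRUE: together they cover every row
```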
The analysis part will always have the same number of rows as the original data (nrow(dd) here), while the size of the assessment part will vary from one bootstrap replicate to the next.
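A quick way to check this is to count the rows of both parts for every replicate. Each row has probability (1 - 1/n)^n of never being drawn, which is about e^-1 (roughly 37%) for large n, so the assessment part should average a bit over a third of the data. The seed and variable names below are my own.

```r
library(rsample)
library(tibble)
library(purrr)

set.seed(1)  # added only so this sketch is reproducible
dd <- tibble(
  gp = c(rep(1, 20), rep(2, 20)),
  x = c(rnorm(20, 250, 1), rnorm(20, 260, 1))
)
boots <- bootstraps(dd, times = 100)

# Row counts of each part, for every bootstrap replicate
n_analysis   <- map_int(boots$splits, ~ nrow(analysis(.x)))
n_assessment <- map_int(boots$splits, ~ nrow(assessment(.x)))

unique(n_analysis)     # a single value: 40
summary(n_assessment)  # varies between replicates
# Expected out-of-bag size n * (1 - 1/n)^n, roughly 14.5 rows for n = 40
nrow(dd) * (1 - 1 / nrow(dd))^nrow(dd)
```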