Greetings all!
Overall, I am trying to model the accuracy of several methods for sampling 20 observations against a larger dataset of 100 observations, which provides the sample mean used as the best estimate of the population parameter. To start, I would like to run 10,000 iterations, each randomly selecting 10 clusters of 2 consecutive observations (a 20-observation sample in total) with replacement from the dataset of 100 observations. It is important that the sampling wraps as well: if the 100th observation is randomly selected, it should wrap around and pair with the 1st observation. Sampling consecutive observations matters because the ordering of the data has a geospatial component (i.e., two consecutive rows are physically closest to one another).
I have hit a wall trying to achieve the random cluster sampling described above. I have looked at the infer, sample, rsample, boot, and resample packages, as well as others, and searched thoroughly through posts on Stack Overflow, but I can't seem to find a solution that applies, or pieces of a solution that I can assemble into the answer.
The closest I can get for just the sampling portion using simple random sampling:
For rows:
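library(dplyr)  # provides glimpse() and %>%
dat <- data.frame(hh_id = 1:100, var = sample(1:200, 100, replace = TRUE))
# each iteration stores one 20-row simple random sample drawn with replacement
rs <- vector("list", 10000)
for (i in 1:10000) rs[[i]] <- dat[sample(nrow(dat), 20, replace = TRUE), ]
glimpse(rs)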
For observations within a single variable:
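dat <- data.frame(hh_id = 1:100, cont = sample(1:200, 100, replace = TRUE))
# each iteration stores the mean of one 20-observation sample drawn with replacement
rs <- numeric(10000)
for (i in 1:10000) rs[i] <- mean(sample(dat$cont, 20, replace = TRUE), na.rm = TRUE)
glimpse(rs)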
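Or with infer:
library(infer)  # specify(), generate(), calculate(), and the %>% pipe
# note: type = "bootstrap" resamples all 100 rows each time, not 20
dat %>%
  specify(response = cont) %>%
  generate(reps = 10000, type = "bootstrap") %>%
  calculate(stat = "mean")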
I also came across this article, but I can't seem to adapt it to random sampling, and I suspect it is a bit dated since it uses lapply rather than tidyverse options.
I would be sincerely grateful for any thoughts or guidance on how to achieve bootstrapping of 10,000 iterations randomly selecting 10 clusters of 2 consecutive observations (totaling a 20 observation sample) with replacement from the dataset of 100 observations. Even the simple step of extracting that resample dataset would be terribly helpful. I'm really stumped. Thanks in advance!
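For anyone picking this up, here is a minimal base R sketch of the scheme I am describing, not a vetted solution. It assumes the rows are already in their geospatial order, and the helper name sample_clusters and its arguments n_clusters and cluster_size are hypothetical names made up for illustration. The idea is to draw 10 starting rows with replacement, extend each to the next consecutive row, and use modular arithmetic so that a start at row 100 pairs with row 1.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# Toy data shaped like the example above
dat <- data.frame(hh_id = 1:100, cont = sample(1:200, 100, replace = TRUE))

# Hypothetical helper: draw n_clusters random starting rows with replacement,
# extend each to cluster_size consecutive rows, and wrap past the last row
# back to row 1 using modular arithmetic.
sample_clusters <- function(data, n_clusters = 10, cluster_size = 2) {
  n <- nrow(data)
  starts <- sample(n, n_clusters, replace = TRUE)
  offsets <- seq_len(cluster_size) - 1
  # outer() gives a cluster_size x n_clusters matrix of row positions;
  # as.vector() reads it column by column, so each cluster stays together
  idx <- as.vector(outer(offsets, starts, "+"))
  idx <- (idx - 1) %% n + 1  # row n + 1 becomes row 1
  data[idx, , drop = FALSE]
}

# A single 20-row resample
one_sample <- sample_clusters(dat)

# 10,000 resampled data sets, and the mean of cont in each
resamples <- replicate(10000, sample_clusters(dat), simplify = FALSE)
boot_means <- vapply(resamples, function(d) mean(d$cont), numeric(1))
```

Keeping the full list of resampled data frames in resamples is a deliberate choice so that the resample datasets themselves can be inspected; if only the statistic is needed, the mean could be computed inside replicate() instead.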