I'm trying to take a systematic sample from a dataframe that's generated by several loops I'm running, meaning the number of rows/observations will vary with each iteration. The dataframe consists of four variables/columns:
"ID" "Width" "Length" "Rank"

The "Rank" ranges from 1 to N. This variable represents the order of my observations and is what I want to sample by. The goal is to select every n'th observation according to rank. The sample size is going to vary with the "for" loop I'm running (samplesize=4,6,8,10).

Example: If the current dataframe consists of 20 observations and I'm in the samplesize=4 part of the "for" loop, "Rank" will then be 1,2,3,4,(...),20. The sampling interval will be 20/4=5, i.e. every 5th observation.
I'd then like to make a new dataframe with every 5th observation (according to "Rank").

Any ideas on how to set something like that up? This might be really easy but I'm a bit stuck.
Thanks for any help you might provide!

OK, if you want to draw a systematic sample of n objects from a population of size N, you'll need to 1) calculate the sampling interval and then 2) choose a random starting point. This is done in the function defined below, which is then applied to example data.

set.seed(12345) # we will all get the same random samples by setting a seed
get_sys_indicator <- function(N, n) {
  k <- floor(N / n) # sampling interval; rounding down keeps every index <= N
  r <- sample(1:k, 1) # random starting point between 1 and k
  seq(r, r + k * (n - 1), k) # vector of the n indices to select
}
mydat <- data.frame(
  ID = letters[1:20],
  Width = rlnorm(20),
  Length = rlnorm(20),
  Rank = 1:20
)
head(mydat)
#> ID Width Length Rank
#> 1 a 1.7959405 2.1806477 1
#> 2 b 2.0329054 4.2878485 2
#> 3 c 0.8964585 0.5250150 3
#> 4 d 0.6354021 0.2115831 4
#> 5 e 1.8328781 0.2023595 5
#> 6 f 0.1623573 6.0805644 6
(sampleindex <- get_sys_indicator(20, 4))
#> [1] 3 8 13 18
mysample <- mydat[mydat$Rank %in% sampleindex, ]
mysample
#> ID Width Length Rank
#> 3 c 0.8964585 0.5250150 3
#> 8 h 0.7586732 1.8596342 8
#> 13 m 1.4486439 7.7616143 13
#> 18 r 0.7177905 0.1897495 18

Created on 2022-03-06 by the reprex package (v2.0.1)
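
Since the question mentions a "for" loop over samplesize=4,6,8,10 and dataframes whose row count changes, here is a rough sketch of how the same idea could be wrapped up for that case. The helper name `systematic_sample` and the 22-row test data are made up for illustration, and rounding the interval down with `%/%` is my assumption for handling row counts that are not a multiple of the sample size (it guarantees exactly n rows, at the cost that the last few ranks can never be picked).

```r
# Hypothetical helper: systematic sample of n rows from df, ordered by Rank.
systematic_sample <- function(df, n) {
  N <- nrow(df)
  stopifnot(n >= 1, n <= N)
  k <- N %/% n                            # sampling interval, rounded down (assumption)
  r <- sample.int(k, 1)                   # random starting point in 1..k
  idx <- seq(r, by = k, length.out = n)   # n positions, all <= N
  df[order(df$Rank), ][idx, ]             # sort by Rank first, then pick positions
}

set.seed(1)
# Made-up data with 22 rows, so N is not a multiple of every sample size
mydat <- data.frame(
  ID = letters[1:22],
  Width = rlnorm(22),
  Length = rlnorm(22),
  Rank = 22:1
)

# One sample per sample size, stored in a named list
samples <- list()
for (samplesize in c(4, 6, 8, 10)) {
  samples[[as.character(samplesize)]] <- systematic_sample(mydat, samplesize)
}
sapply(samples, nrow)
```

Each element of `samples` is a new dataframe holding every k'th observation by "Rank" for that sample size, so inside your own loops you would just call the helper on whatever dataframe the current iteration produced.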