R loop and sampling

Fin · March 28, 2022, 11:58am

Hi, I am pretty new to R and am having a bit of trouble getting my head round this. I am trying to create a sample set of data based on an existing set I have. Currently I am getting the probabilities of the existing values and then using the sample() function with those probabilities and the unique values in the dataset. I want to create a for loop that loops through the different columns, runs my code to sample the data, and then produces a new data frame with the generated data. I can't work out how to make the loop run for each column and then put the sampled data into a new data frame. Also, apologies if my code is a bit messy, I'm new!

for (i in colnames(ED_test3)) {  
 sample( 
    c(sort(unique(ED_test2[[i]]))), 
    n,replace = TRUE, 
    prob = (prop.table(table(ED_test2[[i]]))))  
}

pieterjanvc · March 28, 2022, 12:48pm

Hi,

Welcome to the RStudio community!

Here is a way of doing that using the lapply function. It will iterate over every column in your data and output a list, which can then be combined back into a data frame.

set.seed(3) #Only needed for reproducibility 

#Test data
myData = data.frame(x = sample(1:3, 5, replace = TRUE),
           y = sample(LETTERS[1:3], 5, replace = TRUE))
myData
#>   x y
#> 1 1 C
#> 2 2 B
#> 3 3 C
#> 4 2 A
#> 5 3 B

#Create new data
nNew = 10
newData = lapply(myData, function(x){
  
  #Get the freq of each value
  freq = table(x)
  
  #Sample new values
  sample(sort(unique(x)), nNew, prob = freq / sum(freq), replace = TRUE)
  
})

#Create data frame from each sampling
newData = as.data.frame(newData)
newData
#>    x y
#> 1  2 C
#> 2  2 B
#> 3  2 B
#> 4  2 B
#> 5  2 C
#> 6  2 B
#> 7  3 B
#> 8  3 A
#> 9  1 B
#> 10 3 C

Created on 2022-03-28 by the reprex package (v2.0.1)
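The same idea can also be written as an explicit for loop, which is closer to the attempt in the question. This is only a sketch: `oldData`, `newCols` and `n` are made-up names, and the small data frame stands in for the real data. It samples positions with sample.int() rather than the values themselves, because sample() treats a single number such as 5 as the range 1:5, which would go wrong for a column with only one distinct numeric value.

```r
set.seed(1)  # Only needed for reproducibility

# Hypothetical stand-in for the original data
oldData <- data.frame(
  x = c(1, 2, 3, 2, 3),
  y = c("C", "B", "C", "A", "B")
)

n <- 10  # Number of rows to generate

# Pre-allocate a named list with one slot per column
newCols <- vector("list", ncol(oldData))
names(newCols) <- colnames(oldData)

for (i in colnames(oldData)) {
  freq <- table(oldData[[i]])         # Counts per distinct value
  vals <- sort(unique(oldData[[i]]))  # Distinct values, same order as the table

  # Sample positions in vals, weighted by the observed frequencies
  idx <- sample.int(length(vals), n, replace = TRUE,
                    prob = as.vector(freq) / sum(freq))

  # Store the result, otherwise it is discarded at the end of each iteration
  newCols[[i]] <- vals[idx]
}

# Combine the sampled columns into a data frame
newData <- as.data.frame(newCols)
```

The key difference from the loop in the question is that each result is assigned to a slot in a list, so it is still available once the loop has finished.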

Note that sample() is useful when you have discrete values, but for continuous values you should use other functions (for example rnorm() or runif(), depending on the distribution of your data).
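As a rough illustration of what that could look like for a continuous column, here are two possible approaches. Both are assumptions rather than a recommendation for any particular dataset: the `height` vector is invented, option 1 assumes the data are roughly normal, and option 2 is a smoothed bootstrap that resamples the observed values and adds Gaussian noise using the default bandwidth from density().

```r
set.seed(42)  # Only needed for reproducibility

# Hypothetical continuous column
height <- c(162.1, 170.4, 168.9, 175.3, 181.0, 166.7)
nNew <- 10

# Option 1: assume the data are roughly normal and draw from a fitted normal
newNormal <- rnorm(nNew, mean = mean(height), sd = sd(height))

# Option 2: smoothed bootstrap, i.e. draw from a Gaussian kernel density estimate
# by resampling observed values and adding noise with sd equal to the bandwidth
bw <- density(height)$bw
newSmooth <- sample(height, nNew, replace = TRUE) + rnorm(nNew, mean = 0, sd = bw)
```

Which option makes sense depends on the shape of the real data; a skewed or bounded variable would need a different distribution or a transformation first.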

Hope this helps,
PJ
