R loop and sampling

Hi, I am pretty new to R and am having a bit of trouble getting my head round this. I am trying to create a sample data set based on an existing one. Currently I get the probabilities of the existing values and then pass those probabilities and the unique values in the dataset to the sample() function. I want to create a for loop that goes through the different columns, runs my code to sample the data, and then produces a new data frame with the generated data. I can't work out how to make the loop run for each column and then collect the sampled data into a new data frame. Also, apologies if my code is a bit messy, I'm new!

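for (i in colnames(ED_test3)) {
  # Sample n values from the unique values of column i,
  # weighted by how often each value occurs
  sample(unique(ED_test2[[i]]), n, replace = TRUE,
    prob = prop.table(table(ED_test2[[i]])))
}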


Welcome to the RStudio community!

Here is a way of doing that using the lapply function. It iterates over every column in your data and outputs a list, which can then be combined back into a data frame.

set.seed(3) #Only needed for reproducibility 

#Test data
myData = data.frame(x = sample(1:3, 5, replace = T),
           y = sample(LETTERS[1:3], 5, replace = T))
#>   x y
#> 1 1 C
#> 2 2 B
#> 3 3 C
#> 4 2 A
#> 5 3 B

#Create new data
nNew = 10
newData = lapply(myData, function(x){
  #Get the freq of each value
  freq = table(x)
  #Sample new values
  sample(sort(unique(x)), nNew, prob = freq / sum(freq), replace = TRUE)
})

#Create data frame from each sampling
newData = as.data.frame(newData)
#>    x y
#> 1  2 C
#> 2  2 B
#> 3  2 B
#> 4  2 B
#> 5  2 C
#> 6  2 B
#> 7  3 B
#> 8  3 A
#> 9  1 B
#> 10 3 C

Created on 2022-03-28 by the reprex package (v2.0.1)
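
Since the question asked for a for loop specifically, here is a rough sketch of the same idea written that way. It rebuilds the small test data so it runs on its own; newCols, vals and idx are just names I picked for illustration. One deliberate difference from the lapply version above: it samples positions in the vector of unique values rather than the values themselves, because sample() treats a single number n as 1:n, which could surprise you if a numeric column has only one distinct value.

```r
set.seed(3) # only needed for reproducibility

# Test data, same shape as the example above
myData = data.frame(x = sample(1:3, 5, replace = TRUE),
                    y = sample(LETTERS[1:3], 5, replace = TRUE))

nNew = 10

# Empty list that will collect one sampled vector per column
newCols = list()

for (i in colnames(myData)) {
  # Unique values, sorted so they line up with the order used by table()
  vals = sort(unique(myData[[i]]))
  # Number of times each unique value occurs
  freq = as.numeric(table(myData[[i]]))
  # Sample positions in vals, weighted by relative frequency
  idx = sample(length(vals), nNew, replace = TRUE, prob = freq / sum(freq))
  newCols[[i]] = vals[idx]
}

# Combine the sampled columns into a data frame
newData = as.data.frame(newCols)
```

This should behave like the lapply version for ordinary numeric or character columns. I have not tried it on factor columns with unused levels, where table() and unique() would disagree on the number of values.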

Note that sample() is useful when you have discrete values. For continuous values you would normally draw from a distribution instead, for example with rnorm() or runif().
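
To give an idea of what that could look like for a continuous column, here is a minimal sketch with two options. The height values are made up for illustration, and both options rest on assumptions about your data (roughly normal for the first, a reasonably smooth distribution for the second), so treat it as a starting point rather than the right answer for your dataset.

```r
set.seed(3) # only needed for reproducibility

# Hypothetical continuous column (made-up values for illustration)
height = c(160.2, 171.5, 168.0, 182.3, 175.1, 166.4)
nNew = 10

# Option 1: assume the values are roughly normal and draw from a
# normal distribution with the observed mean and standard deviation
newNormal = rnorm(nNew, mean = mean(height), sd = sd(height))

# Option 2: draw from a Gaussian kernel density estimate by resampling
# the observed values and adding noise scaled by the default bandwidth
bw = bw.nrd0(height)
newSmooth = sample(height, nNew, replace = TRUE) + rnorm(nNew, mean = 0, sd = bw)
```

The first option can produce values well outside the observed range, while the second stays closer to the shape of the original data.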

Hope this helps,
