Hi, I am pretty new to R and am having a bit of trouble getting my head round this. I am trying to create a sample data set based on an existing one. Currently I get the probabilities of the existing values and then pass those probabilities, along with the unique values in the dataset, to the sample() function. I want to create a for loop that goes through the different columns, runs my code to sample the data, and then produces a new data frame with the generated data. I can't work out how to make the loop run for each column and then collect the sampled data into a new data frame. Also, apologies if my code is a bit messy, I'm new!
for (i in colnames(ED_test3)) {
  sample(
    c(sort(unique(ED_test2[[i]]))),
    n, replace = TRUE,
    prob = (prop.table(table(ED_test2[[i]]))))
}
Here is a way of doing that using the lapply function. It iterates over every column in your data and returns a list, which can then be combined back into a data frame with as.data.frame.
set.seed(3) #Only needed for reproducibility
#Test data
myData = data.frame(x = sample(1:3, 5, replace = TRUE),
                    y = sample(LETTERS[1:3], 5, replace = TRUE))
myData
#>   x y
#> 1 1 C
#> 2 2 B
#> 3 3 C
#> 4 2 A
#> 5 3 B
#Create new data
nNew = 10
newData = lapply(myData, function(x){
  #Get the frequency of each value in this column
  freq = table(x)
  #Sample new values, weighting each unique value by its relative frequency
  sample(sort(unique(x)), nNew, prob = freq / sum(freq), replace = TRUE)
})
#Combine the sampled columns into a data frame
newData = as.data.frame(newData)
newData
#>    x y
#> 1  2 C
#> 2  2 B
#> 3  2 B
#> 4  2 B
#> 5  2 C
#> 6  2 B
#> 7  3 B
#> 8  3 A
#> 9  1 B
#> 10 3 C
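Since the question asked for a for loop specifically, the same idea could probably be written as a loop too. This is only a rough sketch, not the code from the answer above: the names `sampled`, `vals`, `idx` and `newData2` are made up for illustration, and `myData` is the same kind of toy data as before. It samples positions with `sample.int` rather than the values themselves, because `sample()` treats a single number n as 1:n, which would bite if a column had only one distinct numeric value.

```r
set.seed(3)  # only needed for reproducibility

# Toy data standing in for the real data frame (an assumption)
myData <- data.frame(x = sample(1:3, 5, replace = TRUE),
                     y = sample(LETTERS[1:3], 5, replace = TRUE))
nNew <- 10

# Pre-allocate one named list slot per column; the loop fills these in
sampled <- vector("list", ncol(myData))
names(sampled) <- colnames(myData)

for (i in colnames(myData)) {
  # Frequency of each value and the matching sorted unique values
  freq <- table(myData[[i]])
  vals <- sort(unique(myData[[i]]))
  # Draw positions into vals, weighted by relative frequency
  idx <- sample.int(length(vals), nNew, replace = TRUE,
                    prob = as.numeric(freq) / sum(freq))
  sampled[[i]] <- vals[idx]
}

# Combine the per-column samples into one data frame
newData2 <- as.data.frame(sampled)
```

The key difference from the loop in the question is that each result is stored (here in `sampled[[i]]`) instead of being thrown away. lapply does that bookkeeping for you, which is why the version above is shorter.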