Random sampling (simulation) of a data frame, optimized on the mean of one or more variables (each variable is a column of my data frame)

Hi there,

Could you please help me choose the right method or function to solve this problem?

I have a data.frame of size 24087 x 5. Please see the table below.

I need to find all the possible combinations of "Barrel" containing exactly 14976 barrels each, where the mean of the Z2 values (the mean of Z2 over those 14976 rows) is 1 or very close to 1. In other words, I want to know how to run an optimized simulation that satisfies a pre-defined condition. I used lapply and sample (please see the code below), but it is difficult to choose "prob", and I don't know whether I can trust this method.

For the distribution of the Z2 values, please see the histogram. 20% of my Z2 values are zero.

In my simulation, I would prefer each combination of 14976 barrels to contain as many high Z2 values as possible (provided the mean can still be close to 1).

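# Draw 2000 weighted samples of 14976 Z2 values each, then take the mean of each sample
LO <- lapply(1:2000, function(i) sample(Z, 14976, replace = TRUE, prob = 1 / (Z + 0.25) + 0.036 * Z))
MEANS <- unlist(lapply(LO, mean))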
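One way to judge whether a given "prob" can be trusted is to look at how many of the simulated means actually land near 1. The following is a minimal sketch of that check, not a recommended final method. The vector Z here is simulated stand-in data (an assumption, since the real Z2 column is not shown), the tolerance tol is a hypothetical choice, and only 200 draws are used to keep it quick. Note that replace = TRUE, as in the code above, allows the same barrel to be drawn more than once, and that sampling Z directly returns values rather than barrel indices.

```r
# Hypothetical stand-in for the Z2 column: about 20% zeros, right-skewed otherwise.
set.seed(1)
Z <- ifelse(runif(24087) < 0.2, 0, rlnorm(24087, meanlog = -2.5, sdlog = 2.5))

# Same weighting formula as in the post.
w <- 1 / (Z + 0.25) + 0.036 * Z

# Weighted draws with replacement, then the mean of each draw.
draws <- lapply(1:200, function(i) sample(Z, 14976, replace = TRUE, prob = w))
means <- vapply(draws, mean, numeric(1))

# Flag the draws whose mean is within a (hypothetical) tolerance of the target.
tol <- 0.05
keep <- abs(means - 1) <= tol

summary(means)  # where the simulated means actually fall
mean(keep)      # share of draws that satisfy the condition
```

If the share of accepted draws is close to zero, the weights are steering the mean to the wrong place and need adjusting before the simulation is useful.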


   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   0.010   0.060   1.854   0.470 108.130
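If the summary above describes Z2, the overall mean (1.854) is well above the target of 1, so a plain random subset of 14976 barrels would be expected to overshoot. Enumerating every subset is not practical, because the number of ways to choose 14976 rows out of 24087 is astronomically large. An alternative to tuning "prob" is to search for one qualifying subset at a time. Below is a minimal sketch of a swap-based search, under stated assumptions: Z2 is simulated stand-in data, and find_subset, target, tol and max_iter are hypothetical names and settings introduced only for illustration. It samples barrel indices without replacement, so no barrel appears twice.

```r
# Hypothetical stand-in data: 24087 barrels, about 20% zeros, right-skewed Z2.
set.seed(42)
n <- 24087
k <- 14976
Z2 <- ifelse(runif(n) < 0.2, 0, rlnorm(n, meanlog = -2.5, sdlog = 2.5))

# Start from a random subset of k rows, then repeatedly propose swapping one
# selected row with one unselected row. A swap is kept only if it moves the
# subset mean strictly closer to the target.
find_subset <- function(z, k, target = 1, tol = 1e-3, max_iter = 1e5) {
  n <- length(z)
  sel <- sample(n, k)                 # selected row indices, no duplicates
  out <- setdiff(seq_len(n), sel)     # row indices not selected
  total <- sum(z[sel])
  for (iter in seq_len(max_iter)) {
    if (abs(total / k - target) <= tol) break
    i <- sample.int(k, 1)
    j <- sample.int(length(out), 1)
    new_total <- total - z[sel[i]] + z[out[j]]
    if (abs(new_total / k - target) < abs(total / k - target)) {
      tmp <- sel[i]
      sel[i] <- out[j]
      out[j] <- tmp
      total <- new_total
    }
  }
  sel
}

idx <- find_subset(Z2, k)
mean(Z2[idx])   # subset mean after the search
```

Running find_subset again with a different seed gives a different subset, so repeated calls produce a collection of qualifying combinations. The sketch does not yet favour high Z2 values; that preference would need an extra rule when choosing which swaps to propose.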

