Random sampling (simulation) of a data frame optimized on the mean of one or more variables (each variable is a column of my data frame)

Hi there,

Could you please help me choose the right method or function to solve this problem?

I have a data.frame of size 24087 × 5. Please see the table below.

I need to find all the possible combinations of 14976 barrels (from the "Barrel" column) for which the mean of Z2 over those 14976 rows is 1 or very close to 1. In other words, I want to know how to run an optimized simulation that meets a predefined condition. I used lapply and sample (please see the code below), but it is difficult to define the prob argument, and I don't know whether I can trust this method.
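Enumerating every combination is not practical: there are choose(24087, 14976) ways to pick 14976 of 24087 rows, a number with thousands of digits. One alternative, sketched here under stated assumptions, is a stochastic swap search (a simple hill-climbing heuristic, not something from the original post): start from a random set of 14976 distinct rows and swap one row in for one row out whenever the swap moves the mean closer to 1. This assumes each barrel may be used at most once (sampling without replacement). The vector `z` is a hypothetical synthetic stand-in for Z2, and `find_subset()` with its `tol` and `max_iter` arguments is an illustrative helper, not an existing function.

```r
# Hypothetical stand-in for the real Z2 column (replace with your own data):
# skewed positive values with roughly 20% exact zeros.
set.seed(42)
n_rows <- 24087
z <- rlnorm(n_rows, meanlog = -1.2, sdlog = 2)
z[sample(n_rows, round(0.2 * n_rows))] <- 0

# log10 of choose(24087, 14976), i.e. roughly its number of digits
lchoose(n_rows, 14976) / log(10)

# Hypothetical helper: stochastic swap search for k distinct rows whose mean
# is within `tol` of `target`. Assumes sampling WITHOUT replacement.
find_subset <- function(z, k, target = 1, tol = 1e-3, max_iter = 1e6) {
  n <- length(z)
  chosen <- sample(n, k)                 # row indices in the subset
  others <- setdiff(seq_len(n), chosen)  # row indices outside the subset
  target_sum <- k * target
  cur_sum <- sum(z[chosen])
  for (iter in seq_len(max_iter)) {
    if (abs(cur_sum / k - target) < tol) break
    a <- sample.int(k, 1)                # position of a row to swap out
    b <- sample.int(n - k, 1)            # position of a row to swap in
    new_sum <- cur_sum - z[chosen[a]] + z[others[b]]
    # Keep the swap only if it brings the subset sum closer to k * target
    if (abs(new_sum - target_sum) < abs(cur_sum - target_sum)) {
      tmp <- chosen[a]
      chosen[a] <- others[b]
      others[b] <- tmp
      cur_sum <- new_sum
    }
  }
  chosen
}

idx <- find_subset(z, k = 14976)
mean(z[idx])
```

Calling `find_subset()` again (without resetting the seed) returns a different qualifying subset each time, so repeated calls give many combinations rather than all of them; the returned indices select the corresponding rows of the data frame.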

For the distribution of the Z2 values, please see the histogram. 20% of my Z2 values are zero.
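Given how skewed the values are, it may be worth checking first whether a mean of 1 is attainable at all. For any set of 14976 distinct rows, the mean lies between the mean of the 14976 smallest values and the mean of the 14976 largest values. A minimal sketch, using a hypothetical synthetic vector `z` in place of the real Z2 column and assuming sampling without replacement:

```r
# Hypothetical stand-in for Z2: skewed values with about 20% exact zeros.
set.seed(1)
n_rows <- 24087
z <- rlnorm(n_rows, meanlog = -1.2, sdlog = 2)
z[sample(n_rows, round(0.2 * n_rows))] <- 0

k <- 14976    # rows per subset
target <- 1   # desired mean of Z2

# Bounds on the mean of any subset of k distinct rows
z_sorted <- sort(z)
min_mean <- mean(z_sorted[1:k])
max_mean <- mean(z_sorted[(n_rows - k + 1):n_rows])
feasible <- target >= min_mean && target <= max_mean

cat(sprintf("achievable mean: [%.3f, %.3f], target %g feasible: %s\n",
            min_mean, max_mean, target, feasible))
```

With the real data, replace `z` with the Z2 column; if `feasible` is `FALSE`, no subset of that size can reach a mean of 1.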

In my simulation, I would prefer each combination of 14976 barrels to contain as many high Z2 values as possible, as long as the mean can still be kept close to 1.

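```r
# Z2 is the column of interest from the data frame (must be defined first)
Z <- Z2

# Draw 2000 samples of 14976 values each, with replacement.
# The weights combine 1/(Z + 0.25), which favors small values,
# and 0.036 * Z, which favors large values.
LO <- lapply(1:2000, function(i) {
  sample(Z, 14976, replace = TRUE, prob = 1 / (Z + 0.25) + 0.036 * Z)
})
MEANS <- unlist(lapply(LO, mean))
hist(MEANS)

summary(Z)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   0.000   0.010   0.060   1.854   0.470 108.130
```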

[Histogram of the Z2 values]
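One way to make the choice of `prob` less ad hoc, offered as an assumption rather than the method used in the post, is exponential tilting: give each value a weight proportional to exp(theta * z) and solve for theta with `uniroot()` so that the weighted mean equals 1. Each weighted draw then has expected value 1, so the means of the simulated 14976-row samples center on 1. Because the mean of Z2 (1.854 in the summary above) is above 1, theta comes out negative for the real data, which down-weights the large values. `z` is again a hypothetical synthetic stand-in, and `tilted_mean()` is an illustrative helper name.

```r
# Hypothetical stand-in for Z2 (replace z with your own column).
set.seed(7)
n_rows <- 24087
z <- rlnorm(n_rows, meanlog = -1.2, sdlog = 2)
z[sample(n_rows, round(0.2 * n_rows))] <- 0

target <- 1

# Mean of z when each value is weighted by exp(theta * z).
# Subtracting the maximum exponent avoids overflow in exp().
tilted_mean <- function(theta, z) {
  w <- exp(theta * z - max(theta * z))
  sum(w * z) / sum(w)
}

# The tilted mean increases with theta, so a sign change brackets the root.
theta <- uniroot(function(t) tilted_mean(t, z) - target,
                 interval = c(-5, 5), tol = 1e-10)$root
prob <- exp(theta * z - max(theta * z))

# Same simulation as in the post, but with the tilted weights;
# replicate() keeps only the means instead of storing all samples.
MEANS <- replicate(2000, mean(sample(z, 14976, replace = TRUE, prob = prob)))
hist(MEANS)
mean(MEANS)
```

Keeping only the means with `replicate()` avoids holding 2000 × 14976 sampled values in memory, which the `lapply()` version does.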
