(New to R) How can I automate calculating the relative frequency of values in a vector that are greater than each value in a sequence?

Hi

I apologize in advance for being a complete newbie at R.

I have a data frame from a Monte Carlo simulation with 1 million rows and 8 columns (values range from 0 to over 100 million).

I'm trying to find a way to get the relative frequency of values in each column that are greater than certain values.

With the function below I can specify a column and a value, so that I can, for example, get the relative frequency of observations in "column1" with a value greater than 10,000, which is 10.6%.

Relative.percent <- function(x, n){ 100*length((which(x > n))) / length(x) }
For example:
Relative.percent(Results$Event1,10000)
10.6

However, it would take me weeks to write out all the different values I want, as I'm trying to get the relative frequency of all values greater than 1, 10,000, 20,000, 30,000 ... 100,000,000, so that I can draw a detailed graph, which would probably look similar to an S-curve.

My ideal output would be:

Value     Event1    Event2    Event3 ...
1         0.40      0.36      0.76
10,000    0.38      0.32      0.55
20,000    0.27      0.19      0.48
...

Each Event column shows the relative frequency of values that are over 1, over 10,000, over 20,000, etc.

Thank you in advance for your help!

Maybe this will help? https://stat.ethz.ch/pipermail/r-help/2012-July/319703.html

Does this help?

set.seed(seed = 44552)

no_rows <- 20
no_columns <- 5

mean_sim <- 50
sd_sim <- 10

fake_data <- replicate(n = no_columns,
                       expr = rnorm(n = no_rows,
                                    mean = mean_sim,
                                    sd = sd_sim))
colnames(x = fake_data) <- LETTERS[seq_len(length.out = no_columns)]

values_sim <- seq(from = 30,
                  to = 70,
                  by = 5)

get_relative_frequency <- function(column_name, value)
{
    100 * mean(x = (fake_data[, column_name] > value))
}

outer(X = colnames(x = fake_data),
      Y = values_sim,
      FUN = Vectorize(FUN = get_relative_frequency))
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,]  100   90   70   60   40   35   20    5    5
#> [2,]  100  100  100   85   70   20   15   10    5
#> [3,]   95   95   85   70   45   20   20    0    0
#> [4,]  100   90   90   70   45   15   10    5    5
#> [5,]  100  100  100   75   60   35   20   10    0

Created on 2019-11-12 by the reprex package (v0.3.0)
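
If the layout from the question is wanted (thresholds as rows, one column per event), one possible variation is sketched below. It is only an illustration under assumptions: `results` is a small made-up stand-in for the asker's `Results` data frame, `share_above` and `freq_table` are hypothetical names, and the threshold range is shortened to keep the example small. It relies on base R's `ecdf()`, which gives the share of values less than or equal to a threshold, so one minus that is the share strictly greater. The values are fractions, as in the desired output; multiply by 100 for percentages.

```r
# Made-up stand-in for the asker's `Results` data frame (assumption, not real data)
set.seed(1)
results <- data.frame(Event1 = rexp(1000, rate = 1 / 20000),
                      Event2 = rexp(1000, rate = 1 / 50000))

# Thresholds: 1, then 10,000 to 100,000 in steps of 10,000
# (raise `to` to 1e8 to cover the range described in the question)
thresholds <- c(1, seq(from = 10000, to = 100000, by = 10000))

# ecdf(x)(t) is the share of x <= t, so 1 - ecdf(x)(t) is the share of x > t
share_above <- function(x, thresholds) 1 - ecdf(x)(thresholds)

# sapply() runs over the columns and returns a matrix with one column per event
freq_table <- data.frame(Value = thresholds,
                         sapply(results, share_above, thresholds = thresholds))

head(freq_table, 3)
```

For the graph, something like `matplot(freq_table$Value, as.matrix(freq_table[-1]), type = "l")` should draw one curve per event.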

Thank you, it helped a lot :)

Sorry for the late reply!
