(New to R) How can I automate calculating the relative frequency of values in a vector that are greater than each value in a sequence?

Cova · November 12, 2019, 12:23pm

Hi

I apologize in advance for being a complete newbie at R.

I have a data frame from a Monte Carlo simulation with 1 million rows and 8 columns (values range from 0 to over 100 million).

I'm trying to find a way to get the relative frequency of the values in each column that are greater than certain thresholds.

With the function below I can specify a column and a value. For example, I can get the relative frequency of observations in "column1" that have a value greater than 10,000, which is 10.6%.

Relative.percent <- function(x, n){ 100*length((which(x > n))) / length(x) }
For example:
Relative.percent(Results$Event1,10000)
10.6
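
As a side note, the same percentage can be written more compactly, because the mean of a logical vector is the proportion of `TRUE` values. This is only a minimal sketch: the name `relative_percent` and the small example vector are hypothetical, and it assumes the column has no missing values (with `NA`s present, `mean()` returns `NA` unless `na.rm = TRUE` is set).

```r
# Hypothetical compact variant of Relative.percent.
# x > n yields TRUE/FALSE; mean() of that is the share of TRUE values.
relative_percent <- function(x, n) {
  100 * mean(x > n)
}

# Small made-up vector for illustration (not the simulation data)
example_values <- c(5, 15000, 200, 30000, 8000, 12000, 1, 50, 99999, 10)

# Four of the ten values exceed 10,000
relative_percent(example_values, 10000)
```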

However, it would take me weeks to write out all the different values I want, as I'm trying to get the relative frequency of all values greater than 1, 10,000, 20,000, 30,000 ... 100,000,000, so that I can draw a detailed graph, which would probably look similar to an S-curve.

My ideal output would be:

Value     Event1   Event2   Event3 ...
1         0.40     0.36     0.76
10,000    0.38     0.32     0.55
20,000    0.27     0.19     0.48
...

Each Event column shows the relative frequency of values that are over 1, over 10,000, over 20,000, etc.

In advance, thank you for your help!

stkrog · November 12, 2019, 1:37pm

Maybe this will help? https://stat.ethz.ch/pipermail/r-help/2012-July/319703.html

Yarnabrina · November 12, 2019, 2:22pm

Does this help?

set.seed(seed = 44552)

no_rows <- 20
no_columns <- 5

mean_sim <- 50
sd_sim <- 10

fake_data <- replicate(n = no_columns,
                       expr = rnorm(n = no_rows,
                                    mean = mean_sim,
                                    sd = sd_sim))
colnames(x = fake_data) <- LETTERS[seq_len(length.out = no_columns)]

values_sim <- seq(from = 30,
                  to = 70,
                  by = 5)

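# Percentage of values in one column of fake_data that exceed `value`.
# Note: this reads fake_data from the enclosing (global) environment.
get_relative_frequency <- function(column_name, value)
{
    100 * mean(x = (fake_data[, column_name] > value))
}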

outer(X = colnames(x = fake_data),
      Y = values_sim,
      FUN = Vectorize(FUN = get_relative_frequency))
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,]  100   90   70   60   40   35   20    5    5
#> [2,]  100  100  100   85   70   20   15   10    5
#> [3,]   95   95   85   70   45   20   20    0    0
#> [4,]  100   90   90   70   45   15   10    5    5
#> [5,]  100  100  100   75   60   35   20   10    0

Created on 2019-11-12 by the reprex package (v0.3.0)
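
For the layout asked for in the question (one row per threshold, one column per event), a variation along the following lines might also work. It is a sketch under assumptions, not a tested solution for the real data: `sim_results`, `share_above` and `exceedance_table` are hypothetical names, the exponential fake data merely stands in for the simulation output, and the threshold sequence is cut off at 100,000 to keep the example small. It relies on `ecdf()`, which returns the share of values less than or equal to a threshold, so one minus that is the share strictly greater.

```r
set.seed(1)

# Hypothetical stand-in for the simulation output
# (the real data has 1 million rows and 8 columns)
sim_results <- data.frame(
  Event1 = rexp(n = 1000, rate = 1 / 20000),
  Event2 = rexp(n = 1000, rate = 1 / 15000),
  Event3 = rexp(n = 1000, rate = 1 / 40000)
)

# Thresholds 1, 10,000, 20,000, ..., 100,000
# (shortened here; raise `to` to cover the full range)
thresholds <- c(1, seq(from = 10000, to = 100000, by = 10000))

# ecdf(x)(t) is the share of values <= t,
# so 1 - ecdf(x)(t) is the share of values > t
share_above <- function(x, thresholds) {
  1 - ecdf(x)(thresholds)
}

# sapply() returns a matrix with one column per event;
# data.frame() spreads it into named columns next to Value
exceedance_table <- data.frame(
  Value = thresholds,
  sapply(X = sim_results, FUN = share_above, thresholds = thresholds)
)

head(exceedance_table)
```

Because `ecdf()` sorts each column once and then looks up all thresholds together, this avoids re-scanning a column for every threshold, which may matter with a million rows and thousands of thresholds. Multiply the event columns by 100 if percentages are preferred over proportions.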

Cova · November 15, 2019, 12:49pm

Thank you, it helped a lot

Sorry for the late reply!
