Dear community,
I think I am either misunderstanding something or approaching my problem from the wrong angle. I need your help to point me in the right direction, both in terms of syntax and performance optimisation.
Here is my problem: I need to calculate several values derived from per-group aggregates such as mean and sd.
My first approach, in pseudo-code, was:
data %>% group_by(g) %>% summarise(fl1 = mean(x), fl2 = mean(x) / sd(x) )
The issue is that the mean is recalculated in every field that uses it. This is not very efficient with thousands of groups and a dozen references to mean, sd, min, max, etc.
My second approach was to write a function that takes the vector x, calculates all my fields, and returns a one-row data.frame. The pseudo-code becomes:
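To make the first approach concrete, here is a minimal runnable sketch. The data are made up, and the intermediate column names `m` and `s` are assumptions for illustration only. It relies on the fact that summarise() lets later expressions refer to summaries created earlier in the same call, so each aggregate only needs to be written (and evaluated) once per group:

```r
library(dplyr)

# Made-up example data; `g` and `x` stand in for the real columns.
data <- tibble(
  g = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 10, 20, 30, 40)
)

res <- data %>%
  group_by(g) %>%
  summarise(
    m   = mean(x),  # aggregate computed once per group
    s   = sd(x),    # aggregate computed once per group
    fl1 = m,        # derived field reusing m
    fl2 = m / s     # derived field reusing m and s
  )
```

The helper columns `m` and `s` stay in the result; they can be dropped afterwards with select(-m, -s) if they are not wanted.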
data %>% group_by(g) %>% summarise(fl = list(f(x))) %>% unnest(c(fl))
Here the code appears to be slow. I am also not comfortable with the syntax: it looks a bit off, so I am not sure it is the proper, elegant way to do it.
For the third approach, I tried returning a single-level list from my function, but I was not able to unlist it properly so that its elements become columns / fields.
So what would you recommend as a proper approach using tidyverse syntax?
Would you recommend using more standard functions like apply, or another package like purrr, to handle such a problem?
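For reference, a runnable version of the second approach could look like the sketch below. The helper `f` and the data are hypothetical stand-ins for my real ones. The last statement shows a shorter variant that, as far as I understand, works with dplyr 1.0.0 or later, where an unnamed data frame returned inside summarise() is unpacked into columns:

```r
library(dplyr)
library(tidyr)

# Made-up example data; `g` and `x` stand in for the real columns.
data <- tibble(
  g = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Hypothetical helper: computes each aggregate once and
# returns a one-row data.frame holding the derived fields.
f <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(fl1 = m, fl2 = m / s)
}

# Second approach as described: list-column, then unnest.
res <- data %>%
  group_by(g) %>%
  summarise(fl = list(f(x))) %>%
  unnest(c(fl))

# Variant (assumes dplyr >= 1.0.0): the unnamed data frame
# returned by f() is unpacked into columns directly.
res2 <- data %>%
  group_by(g) %>%
  summarise(f(x))
```

Both results should contain the columns g, fl1 and fl2, with one row per group.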
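A sketch of what the third approach might look like when it works, assuming the function returns a named list with one scalar per field. The helper `f_list` and the data are hypothetical; tidyr::unnest_wider() is one way to spread the list elements into columns:

```r
library(dplyr)
library(tidyr)

# Made-up example data; `g` and `x` stand in for the real columns.
data <- tibble(
  g = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Hypothetical helper returning a single-level named list.
f_list <- function(x) {
  m <- mean(x)
  s <- sd(x)
  list(fl1 = m, fl2 = m / s)
}

res <- data %>%
  group_by(g) %>%
  summarise(fl = list(f_list(x))) %>%  # one list per group
  unnest_wider(fl)                     # list elements become columns
```

The list names (fl1, fl2) become the column names, so the list has to be named for this to work.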
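For comparison, this is roughly what I imagine a base R version would look like, using split() and lapply() instead of the tidyverse. The helper `f` and the data are again hypothetical, and I have not benchmarked it against the dplyr versions:

```r
# Made-up example data; `g` and `x` stand in for the real columns.
data <- data.frame(
  g = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Hypothetical helper returning a one-row data.frame.
f <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(fl1 = m, fl2 = m / s)
}

# Split x by group, apply f to each piece, then bind the rows.
pieces <- lapply(split(data$x, data$g), f)
out <- do.call(rbind, pieces)

# The group labels end up as row names; move them into a column.
out$g <- rownames(out)
rownames(out) <- NULL
```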
Thanks in advance and best regards,
jm