Mean function select rows AND define conditions

heyloism · May 18, 2020, 8:57am

Hi community,

my question is concerning the mean function.

So, if I have a data set, e.g.

data <- data.table(c(1,2,3,4,5,6,7,8,9,10),c(1,6,4,78,2,14,2,95,2,11))

and I want to have the mean of column two, rows five to nine, I write

Average <- mean( data$V2[5:9] )

In a different case, when I want the mean value of all entries in the second column which are greater than two, I go

Average <- mean( data$V2[ data$V2 > 2 ] )

Now I want both. How would I get the mean of rows five to nine of column two, counting only the entries that are greater than two, i.e. combining the two snippets above?

In your answer, please consider that I am a total newbie, so maybe dumb it down a little. Thanks in advance!

nirgrahamuk · May 18, 2020, 9:07am


library(data.table)
data <- data.table(c(1,2,3,4,5,6,7,8,9,10),c(1,6,4,78,2,14,2,95,2,11))
mean( data$V2[5:9] )
mean( data$V2[ data$V2 > 2 ] )
mean( data$V2[5:9][data$V2[5:9]>2] )

library(tidyverse)
mean(slice(data,5:9) %>% pull(V2),na.rm = TRUE)
mean(filter(data,V2 > 2) %>% pull(V2),na.rm = TRUE)
mean(slice(data,5:9) %>% 
       filter(V2 > 2) %>% 
       pull(V2),na.rm = TRUE)

martin.R · May 18, 2020, 9:38am

The proper data.table syntax for these three cases is:

data[5:9, mean(V2)]
data[V2 > 2, mean(V2)]
data[5:9][V2 > 2, mean(V2)]
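For readers following along, here is a minimal sketch that runs the three calls on the example data and shows what each one selects. The variable names (`rows_only`, `cond_only`, `both`, `both_in_j`) are only illustrative, and the last line is an alternative spelling I believe to be equivalent, not something taken from the replies above.

```r
library(data.table)

data <- data.table(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   c(1, 6, 4, 78, 2, 14, 2, 95, 2, 11))

# Rows 5 to 9 of V2 are 2, 14, 2, 95, 2.
rows_only <- data[5:9, mean(V2)]            # 23

# Entries of V2 greater than 2: 6, 4, 78, 14, 95, 11.
cond_only <- data[V2 > 2, mean(V2)]         # 208 / 6

# Select rows 5 to 9 first, then keep entries > 2: 14 and 95.
both <- data[5:9][V2 > 2, mean(V2)]         # 54.5

# Alternative: select the rows in i and apply the condition inside j.
both_in_j <- data[5:9, mean(V2[V2 > 2])]    # 54.5
```

The order of the chained brackets matters: `data[V2 > 2][5:9]` would ask for rows 5 to 9 of the already filtered table, which is a different question.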

heyloism · May 18, 2020, 11:22am

Thank you nirgrahamuk and martin for your fast replies. The solutions worked nicely for me.

I have a follow-up question, though. I want to get the mean as described:

data[5:9][V2 != 2, mean(V2)]

However, I don't want to exclude ALL the entries that are equal to two, only one of them. So, e.g., if I have a vector like

c(1,2,3,2,2,4)

I want the mean of

c(1,3,2,2,4)

Is there an easy way to do this?

nirgrahamuk · May 18, 2020, 11:37am

What you've asked for is ambiguous.
Are you always dropping a 2 when there is at least one, so that an input with only one 2 would lose it?
Or are you dropping every 2 after a first 2, which is allowed, so that an input with only one 2 would keep it?

heyloism · May 18, 2020, 12:23pm

If there is a 2, I want to drop it. But if there is more than one 2, I only want to drop one of them. Since it doesn't affect the mean value, I don't care whether the first, last, or middle 2 is dropped.

The idea behind this is that I intend to do something like an olympic smooth, where for every data point I take the average of, say, the previous 6 data points, leaving out one minimum value. But if the minimum value occurs more than once, I don't want every occurrence to be dropped.

So for the vector

c(5,2,3,2,2,4,2)

I want either

mean(c(5,3,2,2,4,2))

or

mean(c(5,2,3,2,4,2))

or

mean(c(5,2,3,2,2,4))

but never

mean(c(5,3,4))

Or in pseudo code for my previous example

data[5:9][ "Drop exactly one entry that is 2", mean(V2)]
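The smoothing idea described in this post could be sketched in base R roughly as follows. The function name `olympic_smooth` and its `window` argument are hypothetical, and the sketch assumes that "the previous 6 data points" excludes the current point. Because dropping any one copy of the minimum gives the same mean, it subtracts the minimum from the window sum instead of removing an element.

```r
# Hypothetical helper: for each position, average the previous `window`
# values while leaving out exactly one occurrence of their minimum.
olympic_smooth <- function(x, window = 6) {
  stopifnot(window >= 2)
  n <- length(x)
  out <- rep(NA_real_, n)          # positions without a full window stay NA
  if (n <= window) return(out)
  for (i in (window + 1):n) {
    prev <- x[(i - window):(i - 1)]
    # Subtracting min(prev) once removes a single minimum, even if it repeats.
    out[i] <- (sum(prev) - min(prev)) / (window - 1)
  }
  out
}

x <- c(5, 2, 3, 2, 2, 4, 2, 7)
smoothed <- olympic_smooth(x)
# Position 7 uses 5, 2, 3, 2, 2, 4: (18 - 2) / 5 = 3.2
# Position 8 uses 2, 3, 2, 2, 4, 2: (15 - 2) / 5 = 2.6
```

A loop is used here for readability; for long series a rolling-window function from a package would likely be faster.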

martin.R · May 18, 2020, 2:07pm

data[5:9, ifelse(sum(V2 == 2) > 0, (sum(V2) - 2) / (.N - 1), mean(V2))]
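The one-liner above works because removing a single 2 lowers the sum by 2 and the count by 1, so no element has to be deleted. As a cross-check, here is a small base R sketch that actually removes one matching element; `mean_drop_one` is a hypothetical helper name, not part of data.table.

```r
# Hypothetical helper: mean of x after removing at most one element
# equal to `value`. If `value` does not occur, the plain mean is returned.
mean_drop_one <- function(x, value) {
  hit <- match(value, x)           # position of the first match, NA if none
  if (is.na(hit)) return(mean(x))
  mean(x[-hit])
}

v <- c(1, 2, 3, 2, 2, 4)
dropped  <- mean_drop_one(v, 2)              # mean(c(1, 3, 2, 2, 4)) = 2.4
shortcut <- (sum(v) - 2) / (length(v) - 1)   # same value via the sum trick
no_match <- mean_drop_one(c(1, 3, 4), 2)     # no 2 present, so mean(c(1, 3, 4))

# Rows 5 to 9 of the example column V2 are 2, 14, 2, 95, 2.
rows_5_to_9 <- mean_drop_one(c(2, 14, 2, 95, 2), 2)   # (115 - 2) / 4 = 28.25
```

Inside data.table the helper could presumably be called as `data[5:9, mean_drop_one(V2, 2)]`. Since the condition is a single TRUE/FALSE value, a plain `if (...) ... else ...` would also do in place of `ifelse`.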
