Mean function select rows AND define conditions

Hi community,

My question concerns the mean function.

So, if I have a data set, e.g.

data <- data.table(c(1,2,3,4,5,6,7,8,9,10),c(1,6,4,78,2,14,2,95,2,11))

and I want to have the mean of column two, rows five to nine, I write

Average <- mean( data$V2[5:9] )

In a different case, when I want the mean value of all entries in the second column which are greater than two, I go

Average <- mean( data$V2[ data$V2 > 2 ] )

Now I want both. How do I get the mean of the entries in rows five to nine of column two that are greater than two, basically combining the two snippets above?

In your answer, please consider that I am a total newbie, so maybe dumb it down a little. Thanks in advance!


library(data.table)
data <- data.table(c(1,2,3,4,5,6,7,8,9,10),c(1,6,4,78,2,14,2,95,2,11))
mean( data$V2[5:9] )
mean( data$V2[ data$V2 > 2 ] )
mean( data$V2[5:9][data$V2[5:9]>2] )

library(tidyverse)
mean(slice(data,5:9) %>% pull(V2),na.rm = TRUE)
mean(filter(data,V2 > 2) %>% pull(V2),na.rm = TRUE)
mean(slice(data,5:9) %>% 
       filter(V2 > 2) %>% 
       pull(V2),na.rm = TRUE)
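
A note on the combined base R line above: the condition has to be computed from the same five rows that it indexes. The sketch below uses a plain vector `v2` (a stand-in name for `data$V2`, not from the original post) to show what can go wrong otherwise.

```r
v2 <- c(1, 6, 4, 78, 2, 14, 2, 95, 2, 11)  # same values as column V2

# Subset first, then build the condition from that subset.
rows <- v2[5:9]                    # 2 14 2 95 2
combined <- mean(rows[rows > 2])   # averages 14 and 95

# Pitfall: a length-10 condition applied to a length-5 subset.
# Positions 6, 8 and 10 do not exist in the subset, so R returns NA there.
misaligned <- v2[5:9][v2 > 2]
```

Here `combined` is 54.5, while `mean(misaligned)` is `NA`, because the condition and the subset no longer line up.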

The proper data.table syntax for these three cases is:

data[5:9, mean(V2)]
data[V2 > 2, mean(V2)]
data[5:9][V2 > 2, mean(V2)]

Thank you nigrahamuk and martin for your fast replies. The solutions worked nicely for me.

I have a follow-up question, though. I want to get the mean like this:

data[5:9][V2 != 2, mean(V2)]

However, I don't want to exclude ALL the entries that are equal to two, only one of them. So, for example, if I have a vector like

c(1,2,3,2,2,4)

I want the mean of

c(1,3,2,2,4)

Is there an easy way to do this?

What you've asked for is ambiguous.
Are you always dropping at least one 2, so that if the input had only one 2, you would drop it?
Or are you dropping any 2 beyond a first 2, which is allowed, so that if you had only one 2, you would keep it?

If there is a 2, I want to drop it. But if there is more than one 2, I only want to drop one of them. Since it doesn't affect the mean value, I don't care whether the first, last, or middle 2 is dropped.

The idea behind this is that I intend to do something like an Olympic smooth, where for every data point I take the average of, say, the previous 6 data points, excluding one minimum value. But if the minimum value occurs more than once, I don't want every occurrence to be dropped.

So for the vector

c(5,2,3,2,2,4,2)

I want either

mean(c(5,3,2,2,4,2))

or

mean(c(5,2,3,2,4,2))

or

mean(c(5,2,3,2,2,4))

but never

mean(c(5,3,4))

Or, in pseudocode for my previous example:

data[5:9][ "Drop exactly one entry that is 2", mean(V2)]

data[5:9, ifelse(sum(V2 == 2) > 0, (sum(V2) - 2) / (.N - 1), mean(V2))]
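
The line above hard-codes the value 2. For the Olympic-smooth idea described earlier, a possible generalisation is to drop one occurrence of whatever the minimum is. The following is only a sketch under assumptions: the names `mean_drop_one_min` and `olympic_smooth` are made up for illustration, "previous 6 data points" is read as excluding the current point, and the input is assumed to contain no `NA` values.

```r
# Hypothetical helper: mean of x after removing exactly one minimum value.
mean_drop_one_min <- function(x) {
  # which.min() returns the index of the FIRST minimum only,
  # so repeated minima are kept except for one.
  mean(x[-which.min(x)])
}

# Hypothetical rolling version: for each point, average the previous
# `window` points with one minimum removed. Points without a full
# window of predecessors get NA.
olympic_smooth <- function(x, window = 6) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i > window) {
      out[i] <- mean_drop_one_min(x[(i - window):(i - 1)])
    }
  }
  out
}

single <- mean_drop_one_min(c(5, 2, 3, 2, 2, 4, 2))  # mean of 5, 3, 2, 2, 4, 2
smoothed <- olympic_smooth(c(1, 6, 4, 78, 2, 14, 2, 95, 2, 11))
```

With the example vector from the question, `single` is 3. For the `V2` column, the first six entries of `smoothed` are `NA`, followed by 20.8, 20.8, 38.6 and 38.2.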
