# Two ways of calculating the average, two different results

Hello,

I have a methylation table with patients in rows and genes in columns.
I want to calculate the average methylation of each gene for each group:

```r
tapply(dfX$gene1, dfX$status, mean)
    group1     group2
0.06449247 0.06124757
```

In the second method, I use aggregate to calculate the mean for all genes at once:

```r
resume <- aggregate(data = dfX, . ~ status, mean)
resume$gene1
[1] 0.05448438 0.05161707
```

These are the means for group1 and group2, respectively.

As you can see, the two methods give different averages!

I checked, and the first method (tapply) gives the right result.
Could you explain why aggregate doesn't give the correct result, or whether I made a mistake when using this function?

Alex

I'm having a tough time reproducing your issue. Can you maybe post a sample of your data and we can try with that?

```r
df <- data.frame(
  'a' = c(runif(500), runif(500, min = 1, max = 2)),
  'status' = c(rep(0, 500), rep(1, 500))
)

tapply(
  df$a,
  df$status,
  mean
)
#>        0        1
#> 0.508250 1.496064

aggregate(
  data = df,
  . ~ status,
  mean
)
#>   status        a
#> 1      0 0.508250
#> 2      1 1.496064
```

Created on 2022-07-11 by the reprex package (v1.0.0)


I remember something about aggregate() when there are NAs, like dropping an entire row if any of the values in the row are NA. Of course, at 66 my memory is not perfect. What happens when you do

```r
aggregate(data = dfX, gene1 ~ status, mean)
```

instead of including all variables with `. ~ status` and then selecting gene1?
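A minimal sketch of the behaviour described above, using a small hypothetical data frame `toy` (not the poster's data) in which only `gene2` has a missing value. It relies on the formula method of `aggregate()` defaulting to `na.action = na.omit`, which is applied to every variable that appears in the formula, so `. ~ status` can drop rows that are perfectly fine for `gene1`:

```r
# Hypothetical toy data: gene1 is complete, gene2 has one NA (row 2)
toy <- data.frame(
  status = c("group1", "group1", "group2", "group2"),
  gene1  = c(0.1, 0.3, 0.2, 0.6),
  gene2  = c(0.5, NA, 0.4, 0.8)
)

# tapply uses every row of gene1
tapply(toy$gene1, toy$status, mean)
#> group1 group2
#>    0.2    0.4

# `. ~ status` puts every gene in the model frame, so the default
# na.action = na.omit drops row 2 because gene2 is NA there
aggregate(. ~ status, data = toy, FUN = mean)
#>   status gene1 gene2
#> 1 group1   0.1   0.5
#> 2 group2   0.4   0.6

# With only gene1 in the formula, no row is dropped
aggregate(gene1 ~ status, data = toy, FUN = mean)
#>   status gene1
#> 1 group1   0.2
#> 2 group2   0.4
```

Only the group1 mean of `gene1` changes, because the dropped row belongs to group1.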

Hi, EconProf,
this is the result:

```
     status      Gene1
1 KRT19high 0.06449247
2  KRT19low 0.06124757
```

Same as tapply.

Hi dvetsch75,

here is the file.
I reduced the table to 50 variables. Oddly, the aggregate result on the reduced table differs from the one on the complete table, and it still differs from tapply:

```r
tapply(df2$gene1, df2$status, mean)
    group1     group2
0.06449247 0.06124757
df3 <- aggregate(data = df2, . ~ status, mean)
df3$gene1
[1] 0.06200115 0.05979080
```
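If the NA explanation is right, `aggregate(data = df2, . ~ status, mean)` averages `gene1` only over the rows that are complete across all columns. A table with 50 genes has fewer columns that can contain NAs, so fewer rows are dropped, which would explain why it gives yet another result. The sketch below checks this; `complete_case_means()` and `toy` are hypothetical names, and the comparison with `complete.cases()` only holds when every column appears in the formula, as it does with `. ~ status`:

```r
# Hypothetical helper: rows kept by `. ~ group` under na.omit, and the
# means of `value` computed on exactly those rows
complete_case_means <- function(df, value = "gene1", group = "status") {
  keep <- complete.cases(df)  # rows with no NA in any column
  cat("rows kept:", sum(keep), "of", nrow(df), "\n")
  tapply(df[[value]][keep], df[[group]][keep], mean)
}

# Hypothetical toy data, not the poster's file: gene2 has one NA
toy <- data.frame(
  status = c("group1", "group1", "group2", "group2"),
  gene1  = c(0.1, 0.3, 0.2, 0.6),
  gene2  = c(0.5, NA, 0.4, 0.8)
)

cc <- complete_case_means(toy)
#> rows kept: 3 of 4

# The complete-case means reproduce what aggregate reports for gene1
agg <- aggregate(. ~ status, data = toy, FUN = mean)
all.equal(as.vector(cc), agg$gene1)
#> [1] TRUE

# Columns with the most NAs are the ones shrinking the sample
sort(colSums(is.na(toy)), decreasing = TRUE)
```

On the real data, calling `complete_case_means(df2)` and comparing the result with `df3$gene1` would confirm or rule out this explanation.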

Update:

I passed `na.rm = TRUE, na.action = NULL` as arguments to the aggregate function and got the same results as tapply.
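For reference, a sketch of the fix from the update, again on hypothetical `toy` data rather than the poster's file. `na.action = NULL` stops the formula method from dropping incomplete rows, and `na.rm = TRUE` is forwarded through `...` to `mean()`, so each gene ignores only its own missing values:

```r
# Hypothetical toy data: gene2 has one NA (row 2)
toy <- data.frame(
  status = c("group1", "group1", "group2", "group2"),
  gene1  = c(0.1, 0.3, 0.2, 0.6),
  gene2  = c(0.5, NA, 0.4, 0.8)
)

# Default: na.action = na.omit drops row 2 before mean() is called
aggregate(. ~ status, data = toy, FUN = mean)$gene1
#> [1] 0.1 0.4

# Keep every row; mean(..., na.rm = TRUE) skips NAs column by column
fixed <- aggregate(. ~ status, data = toy, FUN = mean,
                   na.rm = TRUE, na.action = NULL)
fixed
#>   status gene1 gene2
#> 1 group1   0.2   0.5
#> 2 group2   0.4   0.6

# gene1 now matches tapply
tapply(toy$gene1, toy$status, mean)
```

One consequence of this choice is that each gene may be averaged over a different number of patients, depending on how many NAs that gene has in each group.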