How to calculate the average from triplicate results

Hi! I'm very new to R and can't find any information. Can anyone please help me figure out how to get averages? I have a sensory evaluation data frame where 12 trained assessors evaluated 16 pea products for 13 sensory attributes in 3 replicates. How do I calculate the average for each assessor for each product (i.e. average of 3 replicates done by assessor), and also for each product (i.e. average of all assessors for each product)? I've tried using the SensoMineR package with the following code, but it doesn't want to run. It says "code in this chunk is scheduled to run later, when other chunks have finished executing".

data(peaimpba) %>%
resaverage<-averagetable(peaimpba, formul = "PRODUCT+ASSESSOR",
    firstvar = 5) %>%
coltable(magicsort(resaverage), level.upper = 6,level.lower = 4,
    main.title = "Average by PRODUCT")

Is there an easier way to get averages not using this package?

Thanks!

If peaimpba is a data frame, a common way to find the averave for each combination of Product and assessor would be to use the mean, group_by, and summarize functions, the latter two being from the dplyr package. If you can post some data and explain the variable names needed for the calculation, someone can post specific code. The output of the dput function is very helpful for providing convenient data.

dput(head,peaimpba, 20))

Thanks for your comment!

Hoping this image shows, it's easier to show all variables and replicates in the excel file than in RStudio. So I would want the averages for PEA_OD, POD_OD ... SKIN_VIS.

Since you posted an image of your data and there are more columns than I care to type in by hand, I invented a small data set that has the essential features of your data.
I calculate the averages of columns A, B and C for each combination of Product and ASSESSOR. I use the group_by() function to set the calculation to be for each combination of Product and ASSESSOR. I use the summarize() and across() functions to calculate a mean for each column starting at column A and ending at column C.

library(dplyr)
#Invent data
DF <- data.frame(Product=rep(c(100,200,300),each=9),
                 A=rnorm(27),
                 B=rnorm(27,mean = 1),
                 C=rnorm(27,mean = 3),
                 ASSESSOR=rep(1:3,9))
head(DF)
#>   Product          A         B        C ASSESSOR
#> 1     100 -0.8343728 0.5384004 2.525180        1
#> 2     100 -1.1089836 0.3477684 1.176674        2
#> 3     100  0.1277367 1.5501076 4.876563        3
#> 4     100 -0.5453276 0.6989939 2.576447        1
#> 5     100 -0.1482853 0.3185497 2.831341        2
#> 6     100  0.2403161 2.3938772 2.579793        3
#Summarize by product and assessor
AvgProdAssess <- DF |> group_by(Product, ASSESSOR) |> 
  summarize(across(.cols = A:C,.fns = mean))
#> `summarise()` has grouped output by 'Product'. You can override using the `.groups` argument.

AvgProdAssess
#> # A tibble: 9 x 5
#> # Groups:   Product [3]
#>   Product ASSESSOR       A     B     C
#>     <dbl>    <int>   <dbl> <dbl> <dbl>
#> 1     100        1 -0.542  1.30   2.20
#> 2     100        2 -0.364  0.839  2.83
#> 3     100        3  0.226  1.48   3.92
#> 4     200        1  0.478  0.596  2.97
#> 5     200        2  0.444  1.15   4.60
#> 6     200        3  0.0114 1.91   4.17
#> 7     300        1  0.403  0.966  3.94
#> 8     300        2  0.274  0.918  3.17
#> 9     300        3  0.0864 1.27   3.27

Created on 2021-11-14 by the reprex package (v2.0.1)

Thank you so much for trying to help, but I still don't understand how to use this for my data. I'm sorry, I also don't know how to share my data frame other than a screenshot of it. Would you be able to explain it to me so that I can understand how to apply it to my data?

Would I put my data frame peaimpba where you have "data.frame"? What is the each=9 for? Where does the 27 come from? Why do B and C have mean = 1 and mean = 3, respectively? And why doesn't A have something similar? What is the 9 for in the line that has ASSESSOR?

The part of the code that you are asking about is not relevant to your problem. That is just me inventing some data to work with. The part of my code that directly addresses your problem is the line after
#Summarize by product and assessor
I think what will work for you is

library(dplyr)
AvgProdAssess <- peaimpba |> group_by(PRODUCT, ASSESSOR) |>
       summarize(across(.cols = PEA_OD:SKIN_VIS, .fns = mean))

That will give you, I hope, a data frame named AvgProdAssess with the results you want. I cannot read the complete column names in your image, so I guessed that PRODUCT and ASSESSOR are correct.

To post data, run

dput(head(peaimpba, 20))

and paste the output of that into a reply on this thread.

It works!! Thank you so very much for your help! The output is as follows:

structure(list(DATE = c(12451, 12451, 12451, 12451, 12451, 12451,
12451, 12451, 12451, 12451, 12451, 12451, 12457, 12457, 12457,
12457, 12457, 12457, 12457, 12457), ORDER = c(3, 1, 2, 3, 1,
3, 2, 1, 2, 4, 4, 4, 2, 1, 2, 3, 1, 2, 4, 1), PRODUCT = c(132,
132, 132, 132, 132, 132, 132, 132, 132, 132, 132, 132, 132, 132,
132, 132, 132, 132, 132, 132), PREVIOUS = c(701, 0, 701, 494,
0, 930, 494, 0, 930, 930, 494, 701, 231, 0, 569, 237, 0, 237,
231, 0), PEA_OD = c(8.5, 4.7, 7.1, 6.7, 8.3, 8.4, 2.1, 8.5, 5.21419360177377,
5.57731725951917, 7.20231725951917, 6.9, 6, 4.2, 7, 8.2, 7.3,
2.2, 5.30100604717383, 6.4), POD_OD = c(3, 7.8, 6.2, 3.4, 3.8,
9.4, 0.8, 8.6, 4.57701898460361, 5.26945040793965, 5.26945040793965,
5.3, 1.1, 10.4, 3, 4.9, 2.5, 4, 2.29303167439163, 8.1), SWEET_OD = c(2.7,
2.5, 4, 9.4, 9.3, 8.6, 0.2, 7.8, 5.47264729215009, 7.30510372614687,
6.30822872614687, 5.2, 3.7, 1.8, 1.8, 7.2, 8, 7.2, 3.84055354006671,
8.9), EARTH_OD = c(1.9, 1.6, 2.6, 1.3, 1.9, 1.5, 0, 2.7, 5.7866127989587,
4.19052270909193, 3.76864770909193, 2.1, 1, 2, 3.4, 1.8, 0, 9.5,
1.58571670252911, 2.8), PEA_TA = c(9, 3.6, 8.5, 7.7, 5.9, 6,
3, 7.7, 7.54816857744912, 7.20796304374728, 6.63921304374728,
7.5, 4.1, 3.5, 7.2, 4.7, 6.1, 10.3, 5.10867287368889, 7.6), POD_TA = c(4.3,
5.3, 6.1, 2.3, 2.6, 7.9, 0.3, 8.8, 6.79513093017187, 8.048787860617,
6.276912860617, 5, 6, 6, 2.5, 2.6, 2.9, 2.6, 2.08184102467404,
9.9), SWEET_TA = c(7, 2.9, 4.4, 9.2, 10.3, 8.8, 3.7, 6, 8.2546213168602,
6.66633455383992, 5.81633455383992, 8.8, 1.8, 1.5, 7.3, 6.9,
4.6, 10.6, 5.83262566182334, 9.9), EARTH_TA = c(0.9, 0.5, 3,
1.2, 0.5, 1.4, 0, 1.73922043934608, 1.59716541970052, 2.53624849458696,
2.32687349458696, 1.8, 3.6, 2.4, 1.9, 1.6, 0, 2, 0.383789890957604,
0.5), CRISP = c(2.3, 2.8, 8.7, 8.4, 7.2, 3.8, 7.8, 7.9, 6.88348124370722,
6.08290281524388, 4.58290281524388, 6.7, 8.8, 3.2, 9, 5.4, 4.7,
8.9, 6.27590214931074, 6.7), JUICY = c(5, 1.2, 6.7, 7.2, 5, 9.3,
3.2, 6.1, 5.6265857269369, 4.68080246765387, 3.92767746765387,
6.3, 3.8, 1.3, 7.7, 3.8, 2.7, 4.1, 3.63176231961318, 7.3), HARD = c(5.4,
7, 5.4, 6.7, 10.7, 7.3, 6.9, 5.3, 8.04795682485949, 7.02824617285958,
6.69699617285958, 3.9, 7.9, 7.3, 6, 9.7, 10, 7.6, 7.61801879523133,
7.3), MEALY = c(8.5, 5, 9.9, 5.2, 9, 9.8, 9.7, 8.9, 10.0226119740377,
8.73032980363849, 8.20220480363849, 4.7, 11.6, 8.5, 11, 8.1,
7.9, 7.9, 8.19702118688969, 9), SKIN_VIS = c(7.9, 10, 10.1, 8.1,
8.6, 9, 10.8, 6.8, 9.75194739618463, 7.46777524965843, 9.97090024965843,
4.7, 8.8, 7.5, 7.5, 10.2, 9.6, 6.4, 9.30722390092357, 9), REPLICAT = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2), SIZE = c(4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), COLOUR = c(2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), SUCROSE = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), VARIETY = c("Novella",
"Novella", "Novella", "Novella", "Novella", "Novella", "Novella",
"Novella", "Novella", "Novella", "Novella", "Novella", "Novella",
"Novella", "Novella", "Novella", "Novella", "Novella", "Novella",
"Novella"), ASSESSOR = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
1, 2, 3, 4, 5, 6, 7, 8), I = c(14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))

If I open the AvgProdAssess data frame it show the table with the averages :slight_smile:

Hi FJCC,
Whilst reading this post I was wondering if it would be possible to just calculate sum or mean for A:C columns
of DF and place it in last row "as totals" like in Excel ?
kind regards,
A

@Andrzej
Yes it is possible to do that but I only do it for display purposes in a report. Having a Total or Average row interferes with calculations. Here are two versions of appending an Average row to the original data frame. Notice that I removed the group_by() function, so the average is computed for the entire data set.

#Summarize 
AvgProdAssess <- DF |> 
   summarize(across(.cols = A:C,.fns = mean))
AvgProdAssess
           A        B        C
1 -0.0516184 1.295057 2.937245
 #Simplest
ForDisplay <- bind_rows(DF,AvgProdAssess)
ForDisplay
   Product           A           B         C ASSESSOR
1      100  1.29886516  2.00228881 3.2498225        1
2      100 -0.87434301  1.98536699 3.6595942        2
3      100 -1.81671427  1.46575041 3.5324999        3
4      100 -1.94906422 -0.43460287 4.6065969        1
5      100 -0.04300027 -0.04036871 0.3176521        2
6      100  0.01788844  1.71753523 2.3796322        3
7      100  0.15624731  0.67391914 3.1304014        1
8      100  0.90411901  0.97715911 2.5656645        2
9      100 -0.22848145  0.60571528 2.4308653        3
10     200  1.15827971  2.50506573 3.1801773        1
11     200 -0.80972382  1.45251898 3.3305416        2
12     200  2.08823942  1.20287002 2.2199625        3
13     200  0.16096930  1.97534239 4.0727839        1
14     200 -0.58680213 -0.27539728 3.3402972        2
15     200  1.52207256  1.98980576 2.2192445        3
16     200  1.86133434  1.78157191 2.9793313        1
17     200  0.66100718  2.18293374 3.9143471        2
18     200  0.36605630  2.01277828 2.2508374        3
19     300 -1.07119780  0.61989786 2.4956932        1
20     300 -1.14862944  1.30815278 3.9569820        2
21     300 -0.34148697  2.51800496 3.4917715        3
22     300  0.29060247  2.76348892 4.5279387        1
23     300 -0.39393838  1.30092633 1.8022293        2
24     300  0.45804901 -0.29899863 1.8102712        3
25     300 -2.90045450  1.11790181 2.4105180        1
26     300 -0.05871497  0.60165401 2.3437096        2
27     300 -0.11487569  1.25525064 3.0862559        3
28      NA -0.05161840  1.29505673 2.9372452       NA
 #More elaborate
DF_display <- DF |> 
   mutate(across(.cols = c("Product", "ASSESSOR"),
                 .fns = as.character))
AvgProdAssess$Product <- "Average"
AvgProdAssess$ASSESSOR <- "-"
ForDisplay2 <- bind_rows(DF_display,AvgProdAssess)
ForDisplay2
   Product           A           B         C ASSESSOR
1      100  1.29886516  2.00228881 3.2498225        1
2      100 -0.87434301  1.98536699 3.6595942        2
3      100 -1.81671427  1.46575041 3.5324999        3
4      100 -1.94906422 -0.43460287 4.6065969        1
5      100 -0.04300027 -0.04036871 0.3176521        2
6      100  0.01788844  1.71753523 2.3796322        3
7      100  0.15624731  0.67391914 3.1304014        1
8      100  0.90411901  0.97715911 2.5656645        2
9      100 -0.22848145  0.60571528 2.4308653        3
10     200  1.15827971  2.50506573 3.1801773        1
11     200 -0.80972382  1.45251898 3.3305416        2
12     200  2.08823942  1.20287002 2.2199625        3
13     200  0.16096930  1.97534239 4.0727839        1
14     200 -0.58680213 -0.27539728 3.3402972        2
15     200  1.52207256  1.98980576 2.2192445        3
16     200  1.86133434  1.78157191 2.9793313        1
17     200  0.66100718  2.18293374 3.9143471        2
18     200  0.36605630  2.01277828 2.2508374        3
19     300 -1.07119780  0.61989786 2.4956932        1
20     300 -1.14862944  1.30815278 3.9569820        2
21     300 -0.34148697  2.51800496 3.4917715        3
22     300  0.29060247  2.76348892 4.5279387        1
23     300 -0.39393838  1.30092633 1.8022293        2
24     300  0.45804901 -0.29899863 1.8102712        3
25     300 -2.90045450  1.11790181 2.4105180        1
26     300 -0.05871497  0.60165401 2.3437096        2
27     300 -0.11487569  1.25525064 3.0862559        3
28 Average -0.05161840  1.29505673 2.9372452        -

Thank you FJCC very much,
I do not want to go too far off topic (my apologies), just want to mention that I have added to your code one thing:

AvgProdAssess <- DF |> 
   summarize(across(.cols = A:C,.fns = c(Mean= mean, Summ = sum, SDs =sd)))

as I wanted to have all three functions at once. Now I must glue it to main dataframe
as last rows. If you could guide me in this I would be very grateful.
This is easy to cut and paste it in excel, but more difficult in R.
best,
A.

I don't know of a simple way to do this in R, though there may be a package that facilitates it. I used the pivot_* functions from tidyr to reshape the summary data but I had to use two steps. There may be a more elegant way to get the same result.

AvgProdAssess <- DF |> 
   summarize(across(.cols = A:C,.fns = c(Mean= mean, Summ = sum, SDs =sd)))
AvgProdAssess
      A_Mean    A_Summ    A_SDs   B_Mean   B_Summ   B_SDs   C_Mean   C_Summ
1 -0.0516184 -1.393697 1.158874 1.295057 34.96653 0.88868 2.937245 79.30562
      C_SDs
1 0.9358609
LongForm <- pivot_longer(AvgProdAssess,cols=everything(),
                          names_to = c("Char","Product"),names_sep = "_")
WideForm <- pivot_wider(LongForm,names_from = "Char")
WideForm
# A tibble: 3 x 4
  Product       A      B      C
  <chr>     <dbl>  <dbl>  <dbl>
1 Mean    -0.0516  1.30   2.94 
2 Summ    -1.39   35.0   79.3  
3 SDs      1.16    0.889  0.936
DF_display <- DF |> 
   mutate(across(.cols = c("Product", "ASSESSOR"),
                 .fns = as.character))
ForDisplay <- bind_rows(DF_display,WideForm)
ForDisplay
   Product           A           B          C ASSESSOR
1      100  1.29886516  2.00228881  3.2498225        1
2      100 -0.87434301  1.98536699  3.6595942        2
3      100 -1.81671427  1.46575041  3.5324999        3
4      100 -1.94906422 -0.43460287  4.6065969        1
5      100 -0.04300027 -0.04036871  0.3176521        2
6      100  0.01788844  1.71753523  2.3796322        3
7      100  0.15624731  0.67391914  3.1304014        1
8      100  0.90411901  0.97715911  2.5656645        2
9      100 -0.22848145  0.60571528  2.4308653        3
10     200  1.15827971  2.50506573  3.1801773        1
11     200 -0.80972382  1.45251898  3.3305416        2
12     200  2.08823942  1.20287002  2.2199625        3
13     200  0.16096930  1.97534239  4.0727839        1
14     200 -0.58680213 -0.27539728  3.3402972        2
15     200  1.52207256  1.98980576  2.2192445        3
16     200  1.86133434  1.78157191  2.9793313        1
17     200  0.66100718  2.18293374  3.9143471        2
18     200  0.36605630  2.01277828  2.2508374        3
19     300 -1.07119780  0.61989786  2.4956932        1
20     300 -1.14862944  1.30815278  3.9569820        2
21     300 -0.34148697  2.51800496  3.4917715        3
22     300  0.29060247  2.76348892  4.5279387        1
23     300 -0.39393838  1.30092633  1.8022293        2
24     300  0.45804901 -0.29899863  1.8102712        3
25     300 -2.90045450  1.11790181  2.4105180        1
26     300 -0.05871497  0.60165401  2.3437096        2
27     300 -0.11487569  1.25525064  3.0862559        3
28    Mean -0.05161840  1.29505673  2.9372452     <NA>
29    Summ -1.39369671 34.96653160 79.3056211     <NA>
30     SDs  1.15887402  0.88868002  0.9358609     <NA>

I have a follow up question. If I wanted to further condense the table by averaging the assessors' results for each product, would I use the following code?

AvgProd <- AvgProdAssess %>%
group_by(PRODUCT) %>%
summarize(across(.row = 132:981, .fns = mean))

I've run the code and it seems to do the trick, but just want to make sure I followed the right process?

That looks right except I do not think .row is an argument of the across() function. Are you trying to use only certain rows? It would be better to select those with the filter() function before using summarize(). What is the characteristic of those rows that causes you to choose them?

I'm only trying to average the 12 assessors' scores so that there is only 1 average per product per attribute. So I'm using all rows, not only certain ones.

Thank you very much for your example.
A.

@Petra - If you want to calculate with all of the rows and with the same columns as before, the summarize() function would be

summarize(across(.cols = PEA_OD:SKIN_VIS, .fns = mean))

Except for that, your code looks fine.

Thank you very much for your help! :slight_smile:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.