Using group_by and summarise gives NAs. Help

I am working with a data frame of enzyme activities for different components. The table lists the components in the Description column. There are multiple entries for each component, and I want the mean activity of each one. I have been unable to work out why I get nothing but NA in the table produced by group_by and summarise.

Here is the Table:

    structure(list(lot = c("CLS-X X030118", "CLS-X X030118", "CLS-X X030118", 
    "CLS-X X030118", "CLS-X X030118", "CLS-X X030118", "CLS-X X030118", 
    "CLS-X X030118", "CLS-X X030118", "CLS-X X030118"), COLLAGENASE = c(736, 
    1029, 958, 1017, 468, 468, 579, 597, 759, 668), Description = c("Pharmatone 25b", 
    "Pharmatone 25b", "Pharmatone 25b", "Pharmatone 25b", "Primatone HS", 
    "Primatone HS", "Primatone HS", "Primatone HS", "Primatone RL", 
    "Primatone RL")), row.names = c(NA, -10L), class = c("tbl_df", 
    "tbl", "data.frame"))
# A tibble: 10 x 3
   lot           COLLAGENASE Description   
   <chr>               <dbl> <chr>         
 1 CLS-X X030118         736 Pharmatone 25b
 2 CLS-X X030118        1029 Pharmatone 25b
 3 CLS-X X030118         958 Pharmatone 25b
 4 CLS-X X030118        1017 Pharmatone 25b
 5 CLS-X X030118         468 Primatone HS  
 6 CLS-X X030118         468 Primatone HS  
 7 CLS-X X030118         579 Primatone HS  
 8 CLS-X X030118         597 Primatone HS  
 9 CLS-X X030118         759 Primatone RL  
10 CLS-X X030118         668 Primatone RL 

The df is called peptest. Here is the code:

testgrpby<- peptest %>% 
  group_by(Description) %>% summarise(
  mean("COLLAGENASE" , na.rm=TRUE ))

I have tried making the COLLAGENASE column numeric, but I still get NAs.


Here is the problem on Stackoverflow.

I have been stuck on this for days.

Your problem is that you have to provide a name for your result. This will fix it:

Data <- structure(list(lot = c("CLS-X X030118", "CLS-X X030118", "CLS-X X030118", 
    "CLS-X X030118", "CLS-X X030118", "CLS-X X030118", "CLS-X X030118", 
    "CLS-X X030118", "CLS-X X030118", "CLS-X X030118"), COLLAGENASE = c(736, 
    1029, 958, 1017, 468, 468, 579, 597, 759, 668), Description = c("Pharmatone 25b", 
    "Pharmatone 25b", "Pharmatone 25b", "Pharmatone 25b", "Primatone HS", 
    "Primatone HS", "Primatone HS", "Primatone HS", "Primatone RL", 
    "Primatone RL")), row.names = c(NA, -10L), class = c("tbl_df", 
    "tbl", "data.frame"))
library('dplyr')
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
Data %>% 
  group_by(Description) %>%
  summarise(
    means = mean(COLLAGENASE , na.rm=TRUE)
    )
#> # A tibble: 3 × 2
#>   Description    means
#>   <chr>          <dbl>
#> 1 Pharmatone 25b  935 
#> 2 Primatone HS    528 
#> 3 Primatone RL    714.

Created on 2022-10-31 with reprex v2.0.2

Kind regards

Please familiarize yourself with our cross-posting policy. In short, it is not considered OK to just drop a link to another help site.

That did not work. It returned:
Description Means

1 Pharmatone 25b NA
2 Primatone HS NA
3 Primatone RL NA
The error message I get is:
In mean.default("COLLEGENASE", na.rm = TRUE) :
argument is not numeric or logical: returning NA

I have tried everything and reached out to multiple places, and I cannot figure out why I get that error. Thank you for the post, though.

Well, I copied your data from SO and the reprex I provided works. In your attempt you simply did not name the result, but you have to assign a name to the function in the summarise call. With your code on SO, the summarise function tries to name the result according to your mean() call, but since the mean() call gets the name, there is no function to use and R returns NA for all values.

Just assign a name and don't put " around the name of the variable used inside your mean() call, and everything is fine (again, see my reprex above).
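If the column name really does have to arrive as a string (for example, because it is stored in a variable), one option is the `.data` pronoun that dplyr provides. The sketch below is only an illustration: the variable `col` is a hypothetical name, and the data is just the first six rows of the example above.

```r
library(dplyr)

# First six rows of the thread's example data, rebuilt by hand.
Data <- tibble(
  COLLAGENASE = c(736, 1029, 958, 1017, 468, 468),
  Description = c(rep("Pharmatone 25b", 4), rep("Primatone HS", 2))
)

# Hypothetical variable holding the column name as a string.
col <- "COLLAGENASE"

# .data[[col]] looks the column up by name inside the data frame,
# so mean() receives the numeric values rather than the string itself.
result <- Data %>%
  group_by(Description) %>%
  summarise(means = mean(.data[[col]], na.rm = TRUE))

print(result)
```

With a fixed, known column name, the unquoted form `mean(COLLAGENASE, na.rm = TRUE)` is simpler and is what the reprex above uses.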

If there still is the same issue, copy the code you ran into the forum and provide a bit of your data. Otherwise we could just guess.

Do not put COLLAGENASE in quotes. @FactOREO did not have quotes in the code he posted. Notice the difference between the two versions of the code below.

#Version 1 with quotes
Data %>% 
   group_by(Description) %>%
   summarise(
     means = mean("COLLAGENASE" , na.rm=TRUE)
   )
# A tibble: 3 × 2
  Description    means
  <chr>          <dbl>
1 Pharmatone 25b    NA
2 Primatone HS      NA
3 Primatone RL      NA
Warning messages:
1: In mean.default("COLLAGENASE", na.rm = TRUE) :
  argument is not numeric or logical: returning NA
2: In mean.default("COLLAGENASE", na.rm = TRUE) :
  argument is not numeric or logical: returning NA
3: In mean.default("COLLAGENASE", na.rm = TRUE) :
  argument is not numeric or logical: returning NA

#Version 2 with no quotes
Data %>% 
   group_by(Description) %>%
   summarise(
     means = mean(COLLAGENASE , na.rm=TRUE)
   )
# A tibble: 3 × 2
  Description    means
  <chr>          <dbl>
1 Pharmatone 25b  935 
2 Primatone HS    528 
3 Primatone RL    714
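The difference between the two versions can be reproduced without dplyr. The following is a minimal base R sketch, using the four Pharmatone 25b values from the data above; the variable names are only for illustration. With quotes, mean() is handed the text "COLLAGENASE" itself, which is not numeric, so it returns NA with the warning shown in Version 1.

```r
# The four Pharmatone 25b values from the thread's data.
x <- c(736, 1029, 958, 1017)

# Quoted: mean() receives a character string, not the column values.
# The "argument is not numeric or logical" warning is suppressed here.
quoted <- suppressWarnings(mean("COLLAGENASE", na.rm = TRUE))

# Unquoted equivalent: mean() receives the numeric values.
unquoted <- mean(x, na.rm = TRUE)

print(quoted)
print(unquoted)
```

Note that na.rm = TRUE makes no difference in the quoted case, because the input is rejected before any values are averaged.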

Thank you for your help and patience. I understand my errors, and I will work on my R skills.
Again, thank you.
