Trouble with median function with a numeric vector

Hello everyone. I'm having trouble calculating the median for the following data frame:

''' r
structure(list(Grupo = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("C-", "C+"), class = "factor"), Id_Animal = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
31L, 31L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 33L,
33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 34L, 35L, 35L, 35L, 35L, 35L, 35L, 35L,
35L, 35L, 35L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L,
37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 38L, 38L, 38L,
38L, 38L, 38L, 38L, 38L, 38L, 38L, 39L, 39L, 39L, 39L, 39L, 39L,
39L, 39L, 39L, 39L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
40L), DPI = c("-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi",
"10dpi", "11dpi", "12dpi", "13dpi", "-5dpi", "3dpi", "6dpi",
"7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi", "13dpi", "-5dpi",
"3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi",
"13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi",
"11dpi", "12dpi", "13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi",
"9dpi", "10dpi", "11dpi", "12dpi", "13dpi", "-5dpi", "3dpi",
"6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi", "13dpi",
"-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi",
"12dpi", "13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi",
"10dpi", "11dpi", "12dpi", "13dpi", "-5dpi", "3dpi", "6dpi",
"7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi", "13dpi", "-5dpi",
"3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi",
"13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi",
"11dpi", "12dpi", "13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi",
"9dpi", "10dpi", "11dpi", "12dpi", "13dpi", "-5dpi", "3dpi",
"6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi", "13dpi",
"-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi",
"12dpi", "13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi",
"10dpi", "11dpi", "12dpi", "13dpi", "-5dpi", "3dpi", "6dpi",
"7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi", "13dpi", "-5dpi",
"3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi",
"13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi", "9dpi", "10dpi",
"11dpi", "12dpi", "13dpi", "-5dpi", "3dpi", "6dpi", "7dpi", "8dpi",
"9dpi", "10dpi", "11dpi", "12dpi", "13dpi", "-5dpi", "3dpi",
"6dpi", "7dpi", "8dpi", "9dpi", "10dpi", "11dpi", "12dpi", "13dpi"
), Score = c(3, 0, 0, 0, 1, 1, 1, 1, 0, 1, 3, 2, 2, 2, 2, 2,
0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 0, 0, 0,
0, 2, 0, 1, 0, 3, 0, 2, 2, 0, 2, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0,
1, 1, 3, 1, 3, 2, 3, 3, 1, 1, 0, 2, 3, 0, 3, 3, 3, 3, 3, 0, 0,
0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 2, 1, 1, 3, 0, 0, 0,
0, 0, 1, 2, 2, 2, 0, 0, 3, 3.5, 0, 1, 1, 2, 2, 2, 2, 1, 3, 3,
1, 0, 1, 1, 2, 2, 2, 1, 3, 3.5, 0, 0, 0, 1, 0, 0, 0, 2, 1, 1,
0, 0, 1, 1, 2, 2, 2, 3.5, 3.5, 3.5, 0, 1, 1, 1, 2, 2, 0, 1, 1,
1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 2, 2, 2, 2,
3, 1, 1, 0, 0, 1, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2,
0, 0)), class = "data.frame", row.names = c(NA, -200L))
'''

When I use the following code:

''' r
my_data %>%
  group_by(Grupo) %>%
  median(Score, na.rm = TRUE)
'''

I get the following error: "Error in median.default(., Score, na.rm = TRUE) : need numeric data"

Can anybody help me? Thanks in advance.

Perhaps you want to do this.

my_data %>%
  group_by(Grupo) %>%
  summarize(Med_Score = median(Score, na.rm = TRUE))

Thanks, but I still don't understand why

The group_by() function is part of the dplyr package and it works with other functions from dplyr. group_by() changes the attributes of a data.frame (or tibble) so that functions that are designed to work with group_by() can look at those attributes and perform calculations with the desired grouping. The median() function is from base R and it does not use the attributes created by group_by(); it simply takes the median of the numeric vector passed to it.
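To make the point about attributes concrete, here is a small sketch. The `toy` data frame is made up for illustration and is not your data; the idea is only to show that group_by() adds grouping information without touching the values.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(Grupo = c("C-", "C-", "C+", "C+"),
                  Score = c(0, 2, 1, 3))

grouped <- group_by(toy, Grupo)

class(toy)           # a plain data.frame
class(grouped)       # now also carries the "grouped_df" class
group_vars(grouped)  # the grouping variable recorded by group_by()

# The Score values themselves are untouched by group_by()
identical(toy$Score, grouped$Score)
```

So a base R function that only looks at the values sees exactly the same numbers before and after group_by().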
Also, the %>% operator takes the object on its left and passes it as the first argument of the function on its right. The code you wrote is equivalent to

my_data <- group_by(my_data, Grupo)
median(my_data, Score, na.rm = TRUE)

You are asking median() to calculate the median of an entire data frame, and it does not know how to do that. It would also not know that Score is a column within my_data. The functions in dplyr use non-standard evaluation (data masking) so that you can refer to column names without quotes or references like my_data$Score; median() does not. You could have avoided the error, but not gotten the answer you wanted, if you had written

my_data <- group_by(my_data, Grupo)
median(my_data$Score, na.rm = TRUE)

You would have gotten the median of the entire Score column of my_data. Since median() does not know how to work with group_by(), it would not have calculated a median for each value of Grupo.
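Both behaviours can be checked on a small example. This is just a sketch with a made-up `toy` data frame; tryCatch() is used only so that the error message is captured instead of stopping the script.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(Grupo = c("C-", "C-", "C+", "C+"),
                  Score = c(0, 2, 1, NA))
grouped <- group_by(toy, Grupo)

# Passing the whole (grouped) data frame to median() raises an error;
# capture the message rather than halting
msg <- tryCatch(median(grouped, na.rm = TRUE),
                error = function(e) conditionMessage(e))
msg

# Passing the column works, but the grouping is ignored:
# this is one median over all non-missing scores
overall <- median(grouped$Score, na.rm = TRUE)
overall
```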
My code is equivalent to

my_data <- group_by(my_data, Grupo)
summarize(my_data, Med_Score = median(Score, na.rm = TRUE))

The summarize() function is from dplyr and knows how to use the results of group_by(), so it will calculate a result for each value of Grupo. Giving it my_data as the first argument tells summarize() that any bare names, like Score, should be looked up inside my_data.
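As a quick check, here is a sketch on made-up data (the `toy` data frame is hypothetical, with Grupo stored as a factor like in your dput() output). It also shows tapply() from base R as one possible alternative if you ever need per-group medians without dplyr; that part is my addition, not something you need for the fix.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  Grupo = factor(c("C-", "C-", "C-", "C+", "C+", "C+"),
                 levels = c("C-", "C+")),
  Score = c(0, 1, 3, 2, NA, 3.5)
)

# dplyr: one row per level of Grupo
per_group <- toy %>%
  group_by(Grupo) %>%
  summarize(Med_Score = median(Score, na.rm = TRUE))
per_group

# Base R alternative: apply median() to Score split by Grupo
base_medians <- tapply(toy$Score, toy$Grupo, median, na.rm = TRUE)
base_medians
```

Both approaches should give the same two medians, one for C- and one for C+.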
