Calculate mean values, but only for data that has at least 10 measurements

Hello guys and gals. I am a student and new to R. I have this data set. I have to tidy it up with the tidyverse and then calculate the mean height for each species that has at least 10 measurements.

My code so far looks like this:

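library(tidyverse)

biomass2015 <- read_csv(file = "biomass2015_H.csv")

# Reshape the ten height columns H1..H10 into long format
biomass2015_long <- biomass2015 |>
  pivot_longer(cols = c("H1","H2","H3","H4","H5","H6","H7","H8","H9","H10"), names_to = "Quadrant", values_to = "Value")

# The step after this pipe is missing from the post; dropping missing
# heights is an assumption based on the "noMV" name
biomass2015_long_noMV <- biomass2015_long |>
  drop_na(Value)

biomass2015_long_noMV |>
  group_by(species) |>
  summarise_at(vars(Value), list("Mean Height" = mean))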

It's for sure messy! My question is how to calculate the mean height for every species with at least 10 measurements. Any tips are more than welcome! :slight_smile:

Generate a column to count the species, then filter on that. You could use dplyr::add_tally().

Hey William, thank you for your reply! Can you elaborate a little on it?
I know how many species I have, but some of them have fewer than 10 height measurements.

Filter those out, then calculate the means. So before the group_by(species).

How do I generate a new column with the count for each species?

Sorry, I meant dplyr::add_count(species), placed before the group_by().
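To illustrate what add_count() does, here is a minimal sketch on a made-up data frame. The `toy` data, its values, and the threshold of 3 are invented for illustration and are not taken from the biomass data. add_count() keeps every row and appends a column `n` with the number of rows per species, which can then be filtered on.

```r
library(dplyr)

# Hypothetical toy data: species "a" has 3 rows, species "b" has 1
toy <- data.frame(
  species = c("a", "a", "a", "b"),
  Value   = c(1, 2, 3, 10)
)

# add_count() keeps all rows and appends the per-species row count as column n
counted <- toy %>% add_count(species)

# Keep only species with at least 3 rows (the thread uses 10 as the threshold)
kept <- counted %>% filter(n >= 3)
```

After this, `kept` contains only the rows of species "a", and a group_by(species) followed by summarise() works on the filtered rows.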

You could do it this way:

biomass2015_long_noMV %>%
  add_count(species) %>% 
  filter(n >= 10) %>% 
  group_by(species) %>%
  summarise(mean_height = mean(Value))

# showing counts
biomass2015_long_noMV %>%
  group_by(species) %>%
  summarise(mean_height = mean(Value), count = n()) %>% 
  filter(count >= 10)

Thank you, man! I figured it out just as you posted.
My code looks like this:

biomass2015 |> 
  group_by(species) |>
  add_count(species) |>
  filter(n >= 10) |>
  summarise_at(vars(Value),  # this line is missing from the post; vars(Value) is assumed
               list("Mean Height" = mean))

Ohh, I did not realise that you had posted the code. If you have time, can you elaborate on the differences between the two pieces of code? Yours looks a lot cleaner :smiley:

Yours is pretty much the same as my first one, except that I have used summarise() instead of summarise_at(), but this is because there is only one variable. Also check out dplyr::across() if using more than one.
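As a rough sketch of the across() suggestion, the example below averages two columns per species in a single summarise() call. The `toy` data frame and its `height` and `width` columns are hypothetical and only serve the illustration; across() needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data with two measurement columns
toy <- data.frame(
  species = c("a", "a", "b", "b"),
  height  = c(1, 3, 10, 20),
  width   = c(2, 4, 6, 8)
)

# across() applies the same function to several columns inside summarise()
means <- toy %>%
  group_by(species) %>%
  summarise(across(c(height, width), mean))
```

With a single function supplied, the output columns keep their original names, so `means` has the columns species, height and width.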

The second bit of code just moves the count into the summarise(), so that it is shown next to the mean. Because it computes the mean for every species before filtering, it might be a tiny bit slower if you had a really large dataset.
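A small check that the two approaches agree, again on invented data (the `toy` data frame and the threshold of 3 are assumptions made for the example, not the real biomass data):

```r
library(dplyr)

# Hypothetical toy data: "a" has 3 measurements, "b" has 2
toy <- data.frame(
  species = c("a", "a", "a", "b", "b"),
  Value   = c(1, 2, 3, 10, 20)
)

# Approach 1: count first, drop small groups, then summarise
first <- toy %>%
  add_count(species) %>%
  filter(n >= 3) %>%
  group_by(species) %>%
  summarise(mean_height = mean(Value))

# Approach 2: summarise every group, then drop small groups
second <- toy %>%
  group_by(species) %>%
  summarise(mean_height = mean(Value), count = n()) %>%
  filter(count >= 3)

# Both keep only species "a", with the same mean
same_means <- isTRUE(all.equal(first$mean_height, second$mean_height))
```

The only difference in the output is the extra `count` column in the second result.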
