Creating a new column by summarising categories in another column

Hello team,

I am trying to create a new column based on a subset of data from another column.

As an example with the iris data, how would I calculate:

  1. The number of plants with a petal width > 0.2 for each species
  2. The number of plants with a petal width < 0.2 for each species

and then store the results in a data frame?

I have worked out a 'long way round' of doing this, but it feels quite clunky (see below).
Any help would be greatly appreciated!

Many thanks

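library(dplyr)

# Plants with a petal width above 0.2, with the per-species count attached
iris.big_petal <- iris %>%
  filter(Petal.Width > 0.2) %>%
  group_by(Species) %>%
  mutate(density = n())

# Plants with a petal width below 0.2, with the per-species count attached
iris.small_petal <- iris %>%
  filter(Petal.Width < 0.2) %>%
  group_by(Species) %>%
  mutate(density = n())

## and so on for the other species

iris.sum <- bind_rows(iris.small_petal, iris.big_petal)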

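Maybe like this? The trick is that TRUE counts as 1 and FALSE as 0 when summed, so sum(Petal.Width <= 0.2) counts the rows that meet the condition. Note that this puts plants with a width of exactly 0.2 in the "Small" group.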

library(dplyr)

iris |>
  group_by(Species) |>
  summarize(
    Small = sum(Petal.Width <= 0.2),  # TRUE counts as 1, so this counts matching rows
    Large = n() - Small               # every remaining plant in the species
  )
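
If one row per species and size class is preferred over one column per size class, a variation along these lines might also work. This is only a sketch: the `size` and `n_plants` column names and the "Small"/"Large" labels are made up for illustration, and the 0.2 cutoff is handled the same way as above (exactly 0.2 counts as "Small").

```r
library(dplyr)

# Summing a logical vector counts its TRUE values
sum(c(TRUE, FALSE, TRUE))  # 2

# Label each plant by petal width, then count plants per species and label.
# `size` and `n_plants` are illustrative names, not anything built into iris.
petal_counts <- iris |>
  mutate(size = if_else(Petal.Width <= 0.2, "Small", "Large")) |>
  count(Species, size, name = "n_plants")

petal_counts
```

One difference from the summarize() version: count() only returns combinations that actually occur, so a species with no "Small" plants gets no "Small" row rather than a row with 0.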

Thank you so much for your code and explanation!!! :grinning:
