# Counting number of times a row value in one column matches a row value in another column in a data frame

Lets say that I have a data.frame with fruits and prices.

``````df <- data.frame(
fruit = c("bananas", "melons", "bananas", "melons", "apples", "bananas",
"apples", "melons", "lemons", "melons", "bananas", "apples",
"oranges", "apples", "apples", "lemons", "melons", "apples"),
prize = c(1,1,3,2,3,2,1,3,1,2,2,1,3,2,2,1,2,1)
)
``````

For every unique value in the categorical "fruits" column, I want to count the number of times it is equal to each unique value of my other column "prize".

I have tried to do i as follows:

``````smry_bananas <- df %>%
filter(fruit == 'bananas') %>%
group_by(prize) %>%
summarise(n_1 = sum(prize == '1'),
n_2 = sum(prize == '2'),
n_3 = sum(prize == '3')) %>%
mutate(freq_1 = n_1/sum(n_1 + n_2 + n_3)) %>%
mutate(freq_2 = n_2/sum(n_1 + n_2 + n_3)) %>%
mutate(freq_3 = n_3/sum(n_1 + n_2 + n_3))
``````

This works, but I would like to iterate over the unique values of the "fruits" column and not have to write all these lines for each unique value of the column.

I suppose I should use a for loop ir lapply, but I'm not that experienced with these. Thanks in advance, it is much appreciated

Welcome to the community @AndersdoesR! There may be a more elegant solution, but below is one way to achieve the desired output using the `count()` and `pivot_wider()/pivot_longer()` functions.

``````library(tidyverse)

out = df |>
group_by(fruit) |>
mutate(total = n()) |>
ungroup() |>
count(fruit, total, prize) |>
mutate(freq = n/total) |>
select(fruit, prize, n, freq) |>
pivot_longer(cols = c(-'fruit', -'prize', -'freq')) |>
mutate(prize_label = prize) |>
pivot_wider(names_from = c(name, prize_label), values_from = value) |>
mutate(prize_label = prize,
name = 'freq') |>
pivot_wider(names_from = c(name, prize_label), values_from = freq) |>
mutate_if(is.numeric, ~replace_na(., 0))

# bananas only
out |> filter(fruit == 'bananas')
#> # A tibble: 3 × 8
#>   fruit   prize   n_1   n_2   n_3 freq_1 freq_2 freq_3
#>   <chr>   <dbl> <int> <int> <int>  <dbl>  <dbl>  <dbl>
#> 1 bananas     1     1     0     0   0.25    0     0
#> 2 bananas     2     0     2     0   0       0.5   0
#> 3 bananas     3     0     0     1   0       0     0.25

# all fruit
out
#> # A tibble: 11 × 8
#>    fruit   prize   n_1   n_2   n_3 freq_1 freq_2 freq_3
#>    <chr>   <dbl> <int> <int> <int>  <dbl>  <dbl>  <dbl>
#>  1 apples      1     3     0     0   0.5   0      0
#>  2 apples      2     0     2     0   0     0.333  0
#>  3 apples      3     0     0     1   0     0      0.167
#>  4 bananas     1     1     0     0   0.25  0      0
#>  5 bananas     2     0     2     0   0     0.5    0
#>  6 bananas     3     0     0     1   0     0      0.25
#>  7 lemons      1     2     0     0   1     0      0
#>  8 melons      1     1     0     0   0.2   0      0
#>  9 melons      2     0     3     0   0     0.6    0
#> 10 melons      3     0     0     1   0     0      0.2
#> 11 oranges     3     0     0     1   0     0      1
``````

Created on 2023-07-27 with reprex v2.0.2

What about using `data.table`? First transform the `data.frame` to a `data.table` with `data.table::as.data.table()` or `collapse::qDT()` and then do a oneliner:

``````res <- dt[, .(count = .N), .(fruit, prize)][, freq := count / sum(count), .(fruit)][order(fruit, prize)]
res
fruit prize count      freq
1:  apples     1     3 0.5000000
2:  apples     2     2 0.3333333
3:  apples     3     1 0.1666667
4: bananas     1     1 0.2500000
5: bananas     2     2 0.5000000
6: bananas     3     1 0.2500000
7:  lemons     1     2 1.0000000
8:  melons     1     1 0.2000000
9:  melons     2     3 0.6000000
10:  melons     3     1 0.2000000
11: oranges     3     1 1.0000000
``````

If it is desired to have a column for each frequency and count as well, use `data.table::dcast.data.table()`:

``````> dcast(res, fruit ~ prize, value.var = c("count", "freq"), fill = 0)
fruit count_1 count_2 count_3 freq_1    freq_2    freq_3
1:  apples       3       2       1   0.50 0.3333333 0.1666667
2: bananas       1       2       1   0.25 0.5000000 0.2500000
3:  lemons       2       0       0   1.00 0.0000000 0.0000000
4:  melons       1       3       1   0.20 0.6000000 0.2000000
5: oranges       0       0       1   0.00 0.0000000 1.0000000
``````

Explanation
1.) `dt[, .(count = .N), .(fruit, prize)]` Gets the count by group with `data.table` syntax (Groups are combinations of `fruit` and `prize`, indicated by `.(fruit, prize)` which is the `data.table` syntax for `list()`
2.) `[, freq := count / sum(count), .(fruit)]` adds a column `freq` by reference to the result from 1.) also by group, but this time only by fruits to obtain the correct result of `sum(count)` from the `data.table`
3.) `[order(fruit, prize)]` just rearranges the output for a better looking result
4.) `dcast(res, fruit ~ prize, value.var = c("count", "freq"), fill = 0)`uses the formula `fruit ~ prize` which maps `fruit` to the "y-axis" and `prize` to the "x-axis". The `value.var`now spans all combinaitons of `prize` with `count` and `freq` respectively and fills the entries with the values from `count`/`freq`

Thank you very much scottyd22! I really appreciate the help!

Thanks FactOREO this is a great solution also! Didn't know the data.table but thats really great. Thanks a lot

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.