Counting in multiple vectors and categories

Hi! Hoping someone can help me with this - it seems like there should be a fairly simple solution, but internet searches have led me to things that are almost but not quite what I need. I'm an R novice and not sure how to put it together myself.

I would like to see which countries have the largest variety of mosquito species. The data is organized by country, and each species is a vector with "y" if the species appears there and no data point if it does not. (sample below - actual data contains 48 countries and 26 species)

Country An funestus An coustani An ziemanni An gambiae
Angola n/a n/a n/a y
Benin n/a y n/a y
Botswana n/a n/a y n/a
Burundi y n/a y y

In this case, the number of species present in Angola would be 1, Benin 2, Botswana 1, and Burundi has 3. How do I get R to count all the "y"s across all species columns while maintaining their association with their country category?

Here is one way. Try executing just the pivot_longer() step first to see what that does and then add on the group_by() and the summarize(). I changed the mosquito names to just A, B and C to save typing.

DF <- data.frame(Country = c("Anglola", "Benin", "Botswana"),
                 A = c(NA, NA, "Y"),
                 B = c("Y", "Y", NA),
                 C = c("Y", NA, NA))
DF
#>    Country    A    B    C
#> 1  Anglola <NA>    Y    Y
#> 2    Benin <NA>    Y <NA>
#> 3 Botswana    Y <NA> <NA>
library(tidyr)
library(dplyr)
Counts <- DF %>% pivot_longer(A:C, names_to = "Species", values_to = "Value") %>% 
  group_by(Country) %>% 
  summarize(Cnt = sum(Value == "Y", na.rm = TRUE))
Counts
#> # A tibble: 3 x 2
#>   Country    Cnt
#>   <fct>    <int>
#> 1 Anglola      2
#> 2 Benin        1
#> 3 Botswana     1

Created on 2020-02-17 by the reprex package (v0.3.0)

3 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.