I have a very long data frame that looks like this
owner cat dog ........... monkey elephant rhino giraffe
James 1 2 44 22323 NA 13
for each owner(unique row), i am interested in seeing which of the 88 animals they have by creating a string. if they have even one animal in a given column, I want it to count. I want the output to look like this for every row in a new variable called "animal_list":
owner animal_list
James cat, dog, monkey, elephant, giraffe
I also want another column that tells me the number of occurrences ordered from most common column to least:
owner animal_list_number
James elephant (22323), monkey (44), giraffe (13), dog (2), cat (1)
is this possible to do using tidy verse principle. thank you !
require(tidyverse)
df1 <- tibble(owner = 'James',
cat = 1,
dog = 2,
monkey = 44,
elephant = 22323,
rhino = NA_integer_,
giraffe = 13)
## Turn many columns into many rows to get data in a format usable by the following functions
## paste0 concatenates strings
## collapse argument takes all elements in a vector and squishes them into one element
df2 <- df1 %>%
pivot_longer(., cols = -c('owner'), names_to = 'species',
values_to = 'num_animals') %>%
group_by(owner) %>%
filter(!is.na(num_animals)) %>%
arrange(desc(num_animals)) %>%
mutate(animal_list_number = paste0(species, ' (', num_animals, ')', collapse = ', '),
animal_list = paste0(species, collapse = ', ')) %>%
distinct(owner, animal_list, animal_list_number) %>%
ungroup()