Selecting specific rows of dataframe by unique values?

Here's a simplified minimal reprex of my problem. I have a data frame of people's names and their favorite colors:

 mydat <- data.frame("name"=c("bob","bob","alice","beth","patty","patty","patty"),
                      "fav_color"=c("green","blue","red","orange","green","red","pink"))
   name fav_color
1   bob     green
2   bob      blue
3 alice       red
4  beth    orange
5 patty     green
6 patty       red
7 patty      pink

Some people have more than one favorite color.

  1. How can I return this data frame if I only allow some maximum number of colors per name? For example, if I set a variable maxcolors <- 2, how could I return this data frame so that each name has at most 2 colors? I'd want:
   name fav_color
1   bob     green
2   bob      blue
3 alice       red
4  beth    orange
5 patty     green
6 patty       red
  2. Is it possible to add an additional layer to question #1 by preferring colors that are unique within the total color list where possible?
    Example: If maxcolors <- 2 then Patty would only return 2 colors, but instead of 'green' and 'red' (both of which already appear elsewhere in the fav_color column), return 'green' and 'pink', because at least 'pink' is not already present:
   name fav_color
1   bob     green
2   bob      blue
3 alice       red
4  beth    orange
5 patty     green
6 patty      pink

EDIT: Should I be using slice_head()?

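One way to achieve this is to group by name and keep only the rows whose row number within the group is at most the desired max number of colors. For the second part, rows whose color does not appear in that first result are moved to the front before applying the same filter again.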

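# dplyr provides %>%, group_by(), filter(), row_number(), and bind_rows()
library(dplyr)

# set max colors (mydat is the data frame from the question)
max_colors <- 2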

# grab the max colors for each name
group1 = mydat %>%
  group_by(name) %>%
  filter(row_number() <= max_colors) %>%
  ungroup() %>%
  arrange(name)

# identify rows whose color does not appear anywhere in group1
unique_colors = mydat %>%
  filter(!fav_color %in% group1$fav_color)

# bind group1 to unique_colors (unique_colors goes first)
# and keep the "max_colors" number of rows for each name
out = bind_rows(unique_colors, group1) %>%
  group_by(name) %>%
  filter(row_number() <= max_colors) %>%
  ungroup() %>%
  arrange(name)

out
#> # A tibble: 6 × 2
#>   name  fav_color
#>   <chr> <chr>    
#> 1 alice red      
#> 2 beth  orange   
#> 3 bob   green    
#> 4 bob   blue     
#> 5 patty pink     
#> 6 patty green

Created on 2022-09-20 with reprex v2.0.2.9000
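
On the slice_head() question: it can stand in for the filter(row_number() <= max_colors) step. Below is a minimal sketch, assuming dplyr 1.0.0 or later. The second half is a variation of my own rather than the approach above: it counts how often each color occurs in the whole data frame (the helper column n_color is a name I made up) and sorts each person's rarest colors first before slicing.

```r
library(dplyr)

mydat <- data.frame(
  name = c("bob", "bob", "alice", "beth", "patty", "patty", "patty"),
  fav_color = c("green", "blue", "red", "orange", "green", "red", "pink")
)

max_colors <- 2

# Part 1: keep at most max_colors rows per name
first_n <- mydat %>%
  group_by(name) %>%
  slice_head(n = max_colors) %>%
  ungroup()

# Part 2 (assumed variation): prefer colors that are rare overall.
# n_color is a hypothetical helper column holding each color's total count.
ranked <- mydat %>%
  add_count(fav_color, name = "n_color") %>%
  group_by(name) %>%
  arrange(n_color, .by_group = TRUE) %>%  # rarest colors first within each name
  slice_head(n = max_colors) %>%
  ungroup() %>%
  select(-n_color)

first_n
ranked
```

Note that the second version ranks colors by their count in the full data rather than by whether they appear in the first result, so it can pick different rows for some names (bob's colors come back in a different order, for instance). Check that it matches what you mean by "unique" before relying on it.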
