Stuck on a particular (inverse) ranked list logic

I'm trying to learn how to use purrr. I'm trying to not use a for loop.
I have two dataframes. One lists IDs with the items each ID already has, and the other is a reference table of items with counts.
For each ID, I would like to get the highest-count item in the reference that the ID does not already have. To explain:

df_id = data.frame(id = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                   item = c('x', 'y', 'x', 'z', 'z', 'y', 'z'))

df_ref = data.frame(item = c('w', 'x', 'y', 'z'),
                    count = c(0, 2, 3, 1))

# The result I would like
df_top = data.frame(id = c('a', 'b', 'c'),
                    item = c('z', 'y', 'x'))

Essentially, id = a owns y and x already. So the items that id = a can get are z and w, in that order. Because z has a higher count, df_top gives z to id = a.

Similarly, id = b owns x and z already. This means that y and w can be given, so y is given.

Please let me know if this makes sense.

Break the algorithm into smaller steps that you can tackle one at a time.
My instinct is:

  1. for each id identify candidates (i.e. what could be added)
  2. for each id evaluate the candidates (find the biggest)
  3. combine together
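
The three steps above could be sketched roughly like this. It is only one possible way to do it, assuming dplyr and purrr are loaded. The intermediate names (owned, candidates, best) are made up for illustration, and with_ties = FALSE is my own assumption so that each id ends up with exactly one item.

```r
library(dplyr)
library(purrr)

df_id <- data.frame(id = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                    item = c('x', 'y', 'x', 'z', 'z', 'y', 'z'))

df_ref <- data.frame(item = c('w', 'x', 'y', 'z'),
                     count = c(0, 2, 3, 1))

# items each id already owns, as a named list (one element per id)
owned <- split(df_id$item, df_id$id)

# step 1: for each id, keep the reference rows it does not own yet
candidates <- map(owned, ~ filter(df_ref, !item %in% .x))

# step 2: for each id, keep the candidate with the highest count
best <- map(candidates, ~ slice_max(.x, count, n = 1, with_ties = FALSE))

# step 3: combine, turning the list names back into an id column
df_top <- bind_rows(best, .id = "id") |> select(id, item)
df_top
```

Each step is a plain map() over a list, so nothing is nested and every intermediate result can be inspected on its own.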

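Yeah, I was thinking of the same steps in a for loop. It would be something like

library(tidytable)  # the trailing-dot verbs (filter.(), pull.(), ...) come from tidytable

# loop over each distinct id once; the loop variable must not be called `id`,
# otherwise filter.(id == id) compares the column with itself and keeps every row
for (current_id in unique(df_id$id)) {

  # items the current id already owns
  current_ids_items = df_id |>
    filter.(id == current_id) |>
    pull.(item)

  # highest-count reference item that the current id does not own yet
  highest_rank_item_for_current_id = df_ref |>
    filter.(!item %in% current_ids_items) |>
    slice_max.(order_by = count) |>
    pull.(item)

  # one result row for the current id
  df_rbind_after = data.frame(id = current_id,
                              item = highest_rank_item_for_current_id)

  # ...
}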

But maybe what I'm actually asking for is some sample code that converts for loops into purrr functions. I was only able to find one (r - How to convert this nested for loop into a purrr function - Stack Overflow), but it was too complicated for my purposes -- I'm only looking for basic exercises.

I'm particularly confused about how the slice_max part can be used in a purrr function.

I think that while a nested approach may be possible, it's undesirable exactly because it adds the additional complexity of nesting.
My suggestion of 3 steps is an unnested approach, which is why I suggested it.

an example ...



library(dplyr)  # provides slice_max() and sym()

# using slice_max to get the top 5 records of a
# data.frame based on the first column in it

# mpg is the first column of mtcars
mtcars |> slice_max(mpg, n = 5)
# Sepal.Length is the first column of iris
iris |> slice_max(Sepal.Length, n = 5)

# do mtcars more programmatically
(first_var <- names(mtcars) |> head(n = 1))
mtcars |> slice_max(!!sym(first_var), n = 5)

# together via purrr::map
mylist <- list(mtcars, iris)

purrr::map(mylist,
           ~{
             first_var <- names(.x) |> head(n = 1)
             .x |> slice_max(!!sym(first_var), n = 5)
           })
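
The same idea should carry over to the original question: put the body of the loop into a function of one id, then hand that function to a map function. Below is a minimal sketch of that conversion, assuming dplyr and purrr. The helper top_item_for() is a hypothetical name I made up, and with_ties = FALSE is my assumption so that exactly one item comes back per id.

```r
library(dplyr)
library(purrr)

df_id <- data.frame(id = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                    item = c('x', 'y', 'x', 'z', 'z', 'y', 'z'))

df_ref <- data.frame(item = c('w', 'x', 'y', 'z'),
                     count = c(0, 2, 3, 1))

# hypothetical helper: the loop body, written as a function of a single id
top_item_for <- function(one_id) {
  owned <- df_id$item[df_id$id == one_id]
  df_ref |>
    filter(!item %in% owned) |>
    slice_max(count, n = 1, with_ties = FALSE) |>
    pull(item)
}

ids <- unique(df_id$id)

# for loop version: pre-allocate the result, fill it one id at a time
top_loop <- character(length(ids))
for (i in seq_along(ids)) {
  top_loop[i] <- top_item_for(ids[i])
}

# purrr version: map_chr() calls the helper once per id
# and returns a character vector of the same length
top_map <- map_chr(ids, top_item_for)

df_top <- data.frame(id = ids, item = top_map)
df_top
```

Here slice_max() lives inside the helper, so the map call itself never needs to know about it; map_chr() only has to know that each call returns a single string.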