Distributing rows from one table across another, with some conditions?

jameshowison · February 8, 2018, 2:57am

I have a data frame of people, one per row. The task I'm struggling with is to assign two reviewers to each person, satisfying these constraints:

balance the number of reviews for each reviewer,
each row in people might have a "needed reviewer", need to assign reviewers such that the pair of reviewers assigned to a task includes the needed reviewer.

library(tidyverse)

people <- tribble(
  ~name, ~needed_reviewer,
  "Person 1", "Reviewer 1",
  "Person 2", "Reviewer 3",
  "Person 3", NA_character_,
  "Person 4", NA_character_,
  "Person 5", NA_character_,
  "Person 6", NA_character_
)

reviewers <- c("Reviewer 1", "Reviewer 2", "Reviewer 3")

reviewer_pairs <- combn(reviewers, 2, simplify = F) %>% 
  # obtain sufficient repeats, rep_len gives a target length
  rep_len(nrow(files))
  # now a list of combinations of reviewer pairs, same length as people.

So now I have two tables and I want to match rows up. My desired output is something like the table below (but reviewer_1 and reviewer_2_ could alternatively be a list column, assigned_reviewers):

  ~name, ~reviewer_1, ~reviewer_2
  "Person 1", "Reviewer 1", "Reviewer 2", # matches needed_reviewer constraint
  "Person 2", "Reviewer 1", "Reviewer 3", # matches needed_reviewer constraint
  "Person 3", "Reviewer 2", "Reviewer 3", # no constraint
  "Person 4", "Reviewer 1", "Reviewer 2", # no constraint
  "Person 5", "Reviewer 1", "Reviewer 3", # no constraint
  "Person 6", "Reviewer 2", "Reviewer 3"  # no constraint

But it's not a join (or at least not one that I can see!). Before I had the "needed reviewer" constraint I was using cbind to "tack" the reviewer pair columns onto the table, but that doesn't work once I need to match rows up.

If I was working in something like Python I might iterate through the reviewer_pairs, popping each one off and finding a place in the files table (changing it in place as some sort of global variable). Seems to need some sort of "find and update shared table"? I tried something along those lines with purrr::map but, at least in the tidyverse, the idea of "updating" a shared table isn't part of the paradigm, if I understand correctly.

I tried having a function that found and returned a matched row, and using purrr:mapto run that for each reviewing_pair, but since each item in the list gets the unalterned files rows in files would be assigned more than once. Maybe I need something that passes unassigned rows to each iteration. Is that possible with the functional programming approach of map?

I'm thinking that I've horribly over-complicated this? Does anyone have suggestions for "distributing" rows from one table across another table, according to conditions? Perhaps I need to move to some sort of constraint programming package? Any suggestions? I find them all pretty unapproachable, but I'd enjoy learning them if there were some approachable tutorials

mishabalyasin · February 8, 2018, 9:38am

I don't have a solution, but to me your problem sounds a lot like linear optimization problem, so I can recommend a package - ompr by @dirk (at least I think that's him ) . There is an example of how to solve sudoku using this package.

jameshowison · February 14, 2018, 8:53pm

Thanks much, @mishabalyasin. That was sufficient to get me going! I was able to adapt one of Dirk's examples from the ompr library, on Course Assignment. The worked code below definite seems like overkill but in my real case that was much larger this worked great.

files <- tribble(
  ~name, ~needed_reviewer,
  "Person 1", "Reviewer 1",
  "Person 2", "Reviewer 3",
  "Person 3", NA_character_,
  "Person 4", NA_character_,
  "Person 5", NA_character_,
  "Person 6", NA_character_
)

reviewers <- tibble(reviewer = c("Reviewer 1", "Reviewer 2", "Reviewer 3")) %>% 
  rownames_to_column() %>% 
  mutate(rowname = as.numeric(rowname))


reviewable <- files %>% 
  rownames_to_column() %>% 
  mutate(rowname = as.numeric(rowname))


n <- nrow(reviewable) # was students, now applicants
m <- nrow(reviewers) # was courses, now reviewers (who are like teachers of their courses

reviewers_per_applicant <- 2

# 5 applicants, 10 reviews, 5 reviewers, at most 2 applicants per reviewer.
max_applicants_per_reviewer <- ( (n * reviewers_per_applicant) / m ) %>%  ceiling()


capacity <- rep.int(max_applicants_per_reviewer, m)

# function to get rowname of reviewer preferred, given rowname of applicant
preferences <- function(applicant) {
  rev_name <- reviewable %>% 
    filter(rowname == applicant) %>% 
    pull(needed_reviewer)
  # print(rev_name)
  reviewers %>% 
    filter(reviewer == rev_name) %>% 
    pull(rowname) %>% 
    as.numeric()
  
} 

# preferences(29) # --> "Tanya" --> 6.

# perhaps the weight should be 1 if desired, 0 if no opinion?
weight <- function(applicant, reviewer) {
  p <- which(as.numeric(reviewer) == preferences(as.numeric(applicant)))
  as.integer(if (length(p) == 0) {
    -100000
  } else {
    p
  })
}

# weight(8,6)
library(ompr)

# simple adaptation of
# https://dirkschumacher.github.io/ompr/articles/problem-course-assignment.html
model <- MIPModel() %>%
  
  # 1 iff applicant i is assigned to reviewer m
  add_variable(x[i, j], i = 1:n, j = 1:m, type = "binary") %>% 
  # maximize weights derived from preferences.
  set_objective(sum_expr(weight(i, j) * x[i, j], i = 1:n, j = 1:m)) %>% 
  # no one exceeds capacity. This actually assures balance across reviewers too.
  add_constraint(sum_expr(x[i, j], i = 1:n) <= capacity[j], j = 1:m) %>% 
  # each applicant must have 2 and only 2 reviewers.
  add_constraint(sum_expr(x[i, j], j = 1:m) == 2, i = 1:n)
# model

library(ompr.roi)
library(ROI.plugin.glpk)
result <- solve_model(model, with_ROI(solver = "glpk", verbose = TRUE))

# Probably a simpler way to do this! Just mapping from the integer assignments back to the names.
matching <- result %>% 
  get_solution(x[i,j]) %>%
  filter(value > .9) %>%  
  dplyr::select(i, j) %>% 
  # dplyr::rename(student_num = i) %>% 
  rowwise() %>% 
  # weight = weight(as.numeric(i), as.numeric(j)),
  mutate(preferences = paste0(preferences(as.numeric(i)), collapse = ",")) %>% 
  rename(applicant = i, reviewer = j) %>% 
  ungroup() %>% 
  left_join(reviewable, by = c("applicant" = "rowname")) %>% 
  select(name, needed_reviewer, reviewer, preferences) %>% 
  left_join(reviewers, by = c("reviewer" = "rowname")) %>% 
  rename(assigned_reviewer = reviewer.y) %>%
  mutate(preferences = as.numeric(preferences)) %>% 
  left_join(reviewers, by = c("preferences" = "rowname")) %>% 
  mutate(preference_reviewer = reviewer.y) %>% 
  select(name, needed_reviewer, preference_reviewer, assigned_reviewer)

assigned <- matching %>% select(name, reviewer = assigned_reviewer)


assigned %>%
  summarize(n = n_distinct(name))

assigned %>%
  group_by(reviewer) %>% 
  summarize(n = n_distinct(name))