Setting up an optimization problem

ttrodrigz · March 8, 2020, 4:50am

I'm looking to create a mathematical programming model, it's probably simple but it's been a long time since I've worked this kind of problem and I'm stuck. Here's a story problem to tee this up:

Say a teacher has 4 toys for her 10 students. In a given day, she counts whether or not student i played with toy j.

toys <- tibble::tribble(
    ~student, ~toy_a, ~toy_b, ~toy_c, ~toy_d,
          1L,     1L,     1L,     0L,     0L,
          2L,     0L,     0L,     1L,     0L,
          3L,     0L,     1L,     1L,     1L,
          4L,     1L,     1L,     1L,     1L,
          5L,     1L,     1L,     0L,     0L,
          6L,     0L,     0L,     0L,     0L,
          7L,     1L,     0L,     0L,     1L,
          8L,     0L,     1L,     1L,     0L,
          9L,     1L,     0L,     0L,     1L,
         10L,     1L,     1L,     1L,     0L
    )

The teacher would like to pick the optimal set of two (an arbitrary number) toys out of the four, such that the largest percentage of students are satisfied. A given student will be satisfied with the set of two toys if he played with at least one (another arbitrary number) of the toys.

Here's how I've set this up in my head...

There are 4 "decision variables", one for each toy, representing whether or not the toy is included in the new set.
It seems that these decision variables could be represented by a numeric vector of ones and zeros.
The the thing we're trying to optimize is the proportion of students satisfied with the new set, this value will of course be bound between 0 and 1.

toys_m <- as.matrix(toys[-1])

mean_toy_satisfaction <- function(toys.m = toys_m, included, min.play = 1) {
    
    # this "toggles on/off" toys in the set
    toys.m <- toys.m %*% diag(included)
    
    # how many toys did a given student play with in this set?
    num.toys.played <- rowSums(toys.m)
    
    # meet satisfaction criteria?
    satisfied <- num.toys.played >= min.play
    
    mean(satisfied)
    
}

# example where toys A and B make it into the set
mean_toy_satisfaction(
    toys.m = toys_m,
    included = c(1, 1, 0, 0), 
    min.play = 1
)
#> [1] 0.8

So, the variable which needs optimizing is the vector passed to the included argument to the above function. Again, this seems like it should be rather easy to implement (and it is, if you want to use Excel Solver ).

I've looked through a number of the optimization packages, but I'm honestly not sure where to start, if anyone could point me in the right direction or help frame the problem better I'd greatly appreciate it.

dag.tanneberg · March 8, 2020, 6:29am

Asking for clarification: By optimization you mean: You don't provide that profile manually, but the program should find and analyze different inclusion profiles on its own?

If so, check or ?combn. Also, it's FUN argument should allow you to generate the required statistic, i.e. mean toy satisfaction. As this is a grid search approach, you'll quickly run into memory troubles as the number of toys increases.

HTH

Dag

technocrat · March 8, 2020, 7:11am

# addn libraries
library(magrittr)
library(nnet)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names
# from post
toys <- tibble::tribble(
    ~student, ~toy_a, ~toy_b, ~toy_c, ~toy_d,
          1L,     1L,     1L,     0L,     0L,
          2L,     0L,     0L,     1L,     0L,
          3L,     0L,     1L,     1L,     1L,
          4L,     1L,     1L,     1L,     1L,
          5L,     1L,     1L,     0L,     0L,
          6L,     0L,     0L,     0L,     0L,
          7L,     1L,     0L,     0L,     1L,
          8L,     0L,     1L,     1L,     0L,
          9L,     1L,     0L,     0L,     1L,
         10L,     1L,     1L,     1L,     0L
    )

toys_m <- as.matrix(toys[-1])

# made included the first argument
mean_toy_satisfaction <- function(included, toys.m = toys_m, min.play = 1) {
    
    # this "toggles on/off" toys in the set
    toys.m <- toys.m %*% diag(included)
    
    # how many toys did a given student play with in this set?
    num.toys.played <- rowSums(toys.m)
    
    # meet satisfaction criteria?
    satisfied <- num.toys.played >= min.play
    
    mean(satisfied)
    
}

# added

# combinations of four objects taken two at a time
m <- structure(c(1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 
1, 0, 0, 1, 0, 1, 1), .Dim = c(6L, 4L))

# find and display the maximum mean_toy_satisfaction
result <- map(array_tree(m,1), mean_toy_satisfaction) %>% unlist() 
max(result)
#> [1] 0.9
# show the included that generated it
(which.is.max(unlist(result)) %>% m[.,])
#> [1] 1 0 1 0

^{Created on 2020-03-07 by the reprex package (v0.3.0)}

ttrodrigz · March 8, 2020, 3:57pm

Correct, the profile isn't provided manually. I was hoping to solve this problem without having to hammer through all n choose k combinations (2^n if done for each k) - which as you stated becomes quite an issue as that set size increases - and instead find the optimal combination for size k by solving a system of equations.

Thanks for the suggestion on taking advantage of the FUN argument to combn, actually forgot that was an option!

ttrodrigz · March 8, 2020, 4:07pm

Very concise, thanks! I wasn't aware of array_tree.

However, I'm looking to solve this problem by treating it as solving a system of equations rather than testing each n choose k combination. The method you chose works brilliantly for this toy example, but doing this when the item set gets large is going to be a problem. Especially since I intend on finding the optimal set for every k from 1:n.

technocrat · March 8, 2020, 8:07pm

Ah, different question than the one I addressed. So, is this what I should have been attempting: i ... n_1 subjects each chooses from among j ... n_2 objects k ... n_3 times.

That would give an outcome space in 3d, which should be amenable to a relatively efficient linear algebra solution, says the man who doesn't have to do it himself.

lxy009 · March 8, 2020, 9:20pm

check out CRAN Task View on optimization.

Not 100% sure what you are trying to accomplish but I think it can be formulated as an IP.
I've used Rglpk before. This package uses GLPK. Look at the reference manual, particularly the examples of the last function.

If you have access to CPLEX and Gurobi, there are packages for that as well.

ttrodrigz · March 8, 2020, 10:29pm

Correct!

I've looked through some of the packages available that @lxy009 mentioned and it certainly seems doable being formulated as an integer programming problem, I'm just not grasping how to actually set one up. Not mentioned in the list but the IpSolve package looks like another route.

technocrat · March 8, 2020, 11:02pm

I'm not sure either. I'll have to poke around and see if R has anything comparable to a Python generator operator. Would seem that if you could step through the input space one triplet at a time, evaluate it, and store the result as a has with the input as a key, it wouldn't run up a big memory tab.

lxy009 · March 11, 2020, 12:56am

I'm not as good in Operations Research as I wish.
However, are you sure you have an objective function that is linear?
That's where I'm struggling.

If I use the four decision variables, I don't see how to create simple coefficients.
For example, let x = {x_a, x_b, x_c, and x_d} be the binary decision variables with constraint that the sum equals two.
What is the coefficient for x_a?
Think about the first student's contribution to the objective.
If x = {1,0,0,1}, then the contribution to the objective is 1.
If x = {1,1,0,0}, then the contribution to the objective is still 1, not 2.
If x = {0,1,0,1}, the contribution is 1 again but it seems like the contribution from x_b is different here than above.

I think if you modify the objective and say a student will be doubly satisfied if he's played with both toys, then you can create an easy linear objective and then plug into your choice of MILP optimizer.
The answer probably won't be the same though.

If you stick with the original objective, it seems brute force or metaheuristics like genetic algorithms will be your options.

Maybe someone with an OR background might find a way to relax the formulation, like in traveling salesman where you add a bunch of constraints and auxiliary variables to make the candidate solutions feasible and keep a linear objective function. I know of some ways to make nonlinear objective functions linear but not for this scenario. sorry.

lxy009 · March 11, 2020, 1:20am

Actually I kind of see a way to do make the objective linear using auxiliary variables but I'm going to confer with a colleague first.

ttrodrigz · March 11, 2020, 4:01pm

My OR skills are on the lighter side as well (an understatement), will certainly have to rely on someone with actual experience in this.

You raise a good point about the linearity assumption. In fact, when I replicate this problem in Excel and use Solver to find the optimal solution, I need to set the "Solving Method" to Non-Linear as the default Simplex LP method does not work due to linearity conditions not being met.

ADD/EDITION TO POST:
I'm also wondering if I'm formulating the problem correctly. As I'm thinking about this, we might consider instead of maximizing the proportion reached (using the term reached now instead of satisfied - what the MR industry uses), what about instead minimizing the number of items needed in order to obtain some level of reach. Say you wanted to find the optimal number of items where at least 85% of the cases are reached, meaning, they have at least one item they would use in the optimal set.

lxy009 · March 12, 2020, 6:03am

So I did a bit of work and conferred with a colleague. Here's our formulation to allow for a integer (linear) programming solution.
Sorry if its hard to read.
Not sure how to do formulas here.

Let the binary decision variables be y_j where j indexes the toys and y_j = 1 if the toy is selected by the teacher.
Let the binary auxiliary variables be x_i where i indexes the students and x_i =1 if the student is satisfied with the teachers selection of toys.

The objective is to maximize the sum of x_i.

The first constraint is easy.
sum of y_j = K where K is the number of toys selected by the teacher.

The remaining constraints will set x_i to the appropriate value based on the values of y_j
for all i, x_i <= (1/B) * sum over j of b_ij * y_j
b_ij = 1 if the ith student played with jth toy, else 0.
this value comes from the data
B = the min number of toys to make the student satisfied; in this case 1.

If you need more details or walk-through, please message me directly.

technocrat · March 12, 2020, 5:19pm

You can do formulas by enclosing in dollar signs. Subscripts _ superscripts ^

Let the binary decision variables be y_j where j indexes the toys and y_j = 1 if the toy is ...

x_i ..

ttrodrigz · March 13, 2020, 2:53pm

Thanks for this! I'll see what I can come up with to integrate your ideas.

lxy009 · March 13, 2020, 6:27pm

thanks!! didn't realize we could use latex math directly.

system · April 3, 2020, 6:27pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.