I recently submitted results to the Employee Shift Scheduling public benchmark. My optimizer computed the best known result for all medium-sized and large instances. I currently do not have time to write and publish an article, but a blog post could perhaps be a good substitute.
My optimizer is based on branch and price. To keep this post short, I will assume that the reader is familiar with that framework.
The optimization problem consists of finding good rosters for n staff members. On each day, one of several shift types can be assigned to the staff member, or the staff member can have a day off. There are many constraints on the rosters, such as maximum consecutive working days, required days off, and maximum number of working weekends.
The optimization problem is modelled as an integer program with boolean variables. Each column is a possible roster for a specific staff member. The first n rows are constraints requiring that every staff member is assigned exactly one roster. The next days × shifts rows contain one constraint per shift per day, ensuring that the correct number of staff members works each shift. The penalties for over- and under-assignment are handled by slack variables.
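As a minimal sketch of the column structure (the function and row layout here are my illustration, not the actual solver code), a roster maps to a 0/1 column like this:

```python
import numpy as np

def roster_column(staff_index, roster, n_staff, n_days, n_shifts):
    """Build the master-problem column for one roster.

    roster: list of length n_days; roster[d] is a shift index or None (day off).
    Rows 0..n_staff-1: one "exactly one roster per staff member" row each.
    Rows n_staff..n_staff + n_days*n_shifts - 1: shift-coverage rows.
    """
    col = np.zeros(n_staff + n_days * n_shifts)
    col[staff_index] = 1.0  # counts as one roster for this staff member
    for day, shift in enumerate(roster):
        if shift is not None:
            col[n_staff + day * n_shifts + shift] = 1.0  # contributes to coverage
    return col

# Example: 2 staff, 3 days, 2 shift types.
# Staff 0 works shift 1 on day 0, is off on day 1, works shift 0 on day 2.
col = roster_column(0, [1, None, 0], n_staff=2, n_days=3, n_shifts=2)
```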
Because the number of possible rosters is exponential, they are generated along the way. After the linear program has been solved, new rosters are generated taking the dual variables into account, aiming for the rosters with the most negative reduced costs.
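With the column layout above, the reduced cost of a candidate roster is its cost minus the duals of the rows it covers. A small sketch (function name and dual layout are my own):

```python
def reduced_cost(roster_cost, staff_index, roster, duals_staff, duals_cover):
    """Reduced cost of a roster column.

    duals_staff[i]: dual of staff member i's assignment constraint.
    duals_cover[d][s]: dual of the coverage constraint for day d, shift s.
    """
    rc = roster_cost - duals_staff[staff_index]
    for day, shift in enumerate(roster):
        if shift is not None:
            rc -= duals_cover[day][shift]
    return rc

# Staff 0, two-day roster working shift 1 then shift 0:
rc = reduced_cost(10.0, 0, [1, 0], [5.0], [[1.0, 2.0], [0.5, 0.0]])
# rc = 10 - 5 - 2.0 - 0.5 = 2.5
```

Pricing searches for rosters with negative reduced cost; if none exists, the LP relaxation is optimal.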
New rosters are generated by exactly solving a resource-constrained shortest path problem in a DAG. I have seen that Boost has code for this, but I could not figure out how to use it, so I decided to roll my own.
I am using two resource constraints:
- One resource handles the total minimum and maximum working minutes in the planning period.
- The second resource handles the minimum and maximum consecutive working days allowed.
I then solve the problem by having an array of numbers
p[i, m, c]
containing the smallest cost of reaching node i, when working m minutes and ending in c consecutive working days. This array is filled with dynamic programming.
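A simplified sketch of this dynamic program follows. It uses a dictionary of states instead of a dense array, collapses the node index into the day, and handles only the two resources described above (all names and the constraint handling are my simplification, not the real implementation):

```python
import math

def cheapest_roster_cost(day_shifts, min_minutes, max_minutes, min_cons, max_cons):
    """DP over states (minutes worked, consecutive working days).

    day_shifts[d]: list of (duration_minutes, cost) shift options for day d.
    A day off is only allowed when the current streak is 0 or at least
    min_cons (minimum consecutive working days).
    """
    p = {(0, 0): 0.0}  # p[(m, c)] = cheapest cost with m minutes, streak c
    for d in range(len(day_shifts)):
        nxt = {}
        for (m, c), cost in p.items():
            # Option 1: day off (closes the streak; it must be long enough)
            if c == 0 or c >= min_cons:
                key = (m, 0)
                nxt[key] = min(nxt.get(key, math.inf), cost)
            # Option 2: work one of today's shifts
            for dur, shift_cost in day_shifts[d]:
                if c + 1 <= max_cons and m + dur <= max_minutes:
                    key = (m + dur, c + 1)
                    nxt[key] = min(nxt.get(key, math.inf), cost + shift_cost)
        p = nxt
    # Accept only end states with enough minutes and a valid final streak.
    feasible = [cost for (m, c), cost in p.items()
                if m >= min_minutes and (c == 0 or c >= min_cons)]
    return min(feasible, default=math.inf)

# Three days, one 480-minute shift option per day:
cost = cheapest_roster_cost([[(480, 1.0)], [(480, 2.0)], [(480, 1.5)]],
                            min_minutes=480, max_minutes=960,
                            min_cons=1, max_cons=2)
# Working only day 0 satisfies all limits at cost 1.0.
```

In the real pricing problem the costs would be reduced costs derived from the duals, so they can be negative.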
I make heavy use of tricks. For example, most instances have all shift durations divisible by 480 minutes (8 hours), so the minutes dimension of the array can be scaled down by that factor. That makes the algorithm run 480 times faster.
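The trick generalizes by dividing all durations by their greatest common divisor, which shrinks the minutes axis of p[i, m, c] by that factor (the helper below is my illustration):

```python
import math
from functools import reduce

def scale_durations(durations):
    """Divide all shift durations by their gcd; the minutes dimension of the
    DP table shrinks by the same factor."""
    g = reduce(math.gcd, durations)
    return g, [d // g for d in durations]

g, scaled = scale_durations([480, 960, 480])
# g == 480, scaled == [1, 2, 1]
```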
Out of all the constraints required for new rosters, two types are not modeled by the graph:
- Maximum number of working weekends.
- Maximum assignments of particular shift types.
I handle these two in a hacky way: if the solution I get has too many working weekends, I simply keep the best ones, give the others a really high penalty, and re-solve the problem. The same goes for the number of shifts of a particular type.
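A rough sketch of that weekend step (how "best" is scored is my assumption; the real code may differ):

```python
def weekend_penalties(weekend_values, max_weekends, big_penalty=1e6):
    """Keep the max_weekends most valuable working weekends and attach a
    high penalty to the rest, so the next solve of the pricing problem
    avoids them."""
    order = sorted(range(len(weekend_values)),
                   key=lambda w: weekend_values[w], reverse=True)
    keep = set(order[:max_weekends])
    return [0.0 if w in keep else big_penalty
            for w in range(len(weekend_values))]

pens = weekend_penalties([3.0, 10.0, 1.0, 7.0], max_weekends=2)
# Weekends 1 and 3 are kept; weekends 0 and 2 get the penalty.
```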
Some running times:

- Instance15: 0.12s
- Instance20: 0.64s
- Instance21: 3.21s
- Instance22: 4.1s
- Instance23: 55s
- Instance24: 285s
After fixes have been made (see below), the solver runs even faster than this, since the number of nodes is smaller.
Solving the Linear Program
For the two largest problems, regular simplex solvers (at least the open-source ones I tried) were way too slow to be used. I decided to use a first-order method: Chambolle-Pock. I do one extra thing: scaling the objective function with ‖b‖/‖c‖ (the norm of the right-hand side divided by the norm of the objective function vector), which gives better convergence properties (tested on the NETLIB instances).
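As a rough illustration (not my production solver), a minimal Chambolle-Pock / PDHG iteration for an equality-form LP, with the ‖b‖/‖c‖ scaling applied, could look like this:

```python
import numpy as np

def chambolle_pock_lp(A, b, c, iters=20000):
    """Minimal PDHG sketch for  min c'x  s.t.  Ax = b, x >= 0.

    The objective is scaled by ||b||/||c|| internally (undone when
    reporting). Step sizes satisfy tau * sigma * ||A||^2 < 1.
    """
    scale = np.linalg.norm(b) / np.linalg.norm(c)
    cs = c * scale
    tau = sigma = 0.9 / np.linalg.norm(A, 2)   # spectral norm of A
    m, n = A.shape
    x = np.zeros(n); y = np.zeros(m); x_bar = x.copy()
    for _ in range(iters):
        y = y + sigma * (b - A @ x_bar)                    # dual ascent step
        x_new = np.maximum(0.0, x - tau * (cs - A.T @ y))  # projected primal step
        x_bar = 2 * x_new - x                              # extrapolation
        x = x_new
    return x, c @ x

# Tiny LP:  min x0 + 2*x1  s.t.  x0 + x1 = 1, x >= 0.
# The LP's optimum is x = (1, 0) with objective 1.
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
x, obj = chambolle_pock_lp(A, b, c)
```

The iterate converges to the optimum only approximately, which is exactly the inexactness visible in Figure 2.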
It is possible that using SCS would be even better: it beat Chambolle-Pock when I tried the two on the NETLIB instances. Writing up a comparison between them would make an interesting post of its own, but I haven't tried SCS on scheduling problems yet.
Plots showing the first-order solver in action can be seen further down.
An old implementation of mine of a first-order linear programming solver can be found on GitHub.
Solving the linear program results in a highly fractional solution. Fixing a variable to 1 is a big decision – it means that the entire roster for a single staff member is now fixed. A better decision is to fix a single shift to a staff member, meaning that all columns that do not contain that shift are now disallowed. But if no good shift candidates are found (i.e., no shift is assigned at a high fractional value, say 0.75, for some staff member), I fix a variable to 1 as a fallback.
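The effect of fixing a single shift can be sketched as a filter over the column pool (the data layout here is my own simplification):

```python
def fix_shift(columns, staff, day, shift):
    """After deciding that `staff` works `shift` on `day`, drop every column
    for that staff member whose roster disagrees.

    columns: list of (staff_index, roster), where roster[day] is a shift
    index or None (day off)."""
    return [(s, r) for (s, r) in columns
            if s != staff or r[day] == shift]

cols = [(0, [1, None]), (0, [0, None]), (1, [1, 1])]
kept = fix_shift(cols, staff=0, day=0, shift=1)
# (0, [0, None]) is disallowed; the other two columns survive.
```

Columns for other staff members are untouched, so the decision is much less drastic than fixing a whole roster.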
Also, branching takes too long, so whenever I make a fixing decision, I stick with it. As can be seen below, the pricing in subsequent rounds can usually fix any errors. These problems probably have a large number of symmetries.
Figure 1. The convergence until the first integer solution. The black discs mark iterations where fixing decisions were made. They sometimes resulted in significantly worse objective function values, but subsequent pricing iterations were able to fix that. The cross marks an iteration where a variable was fixed to 1 due to a lack of shift candidates.
Figure 2. The same optimization problem, but using the first-order solver. It is inexact – shown by the objective function sometimes rising slightly between the fixes. More fixes are required, since the solution is usually more fractional than when using simplex.
That’s it. I realize this description is too short, but perhaps I will expand it in the future.
The fixing scheme can almost certainly be tuned and improved a lot. The scheme I am using is essentially the first one that worked.
Update (October 12):
By first trying to fix at a threshold of 0.99 and then lowering it to 0.891, 0.8019, … (multiplying by 0.9) whenever no candidates are found, I was able to get improved solutions for instance 18 as well.
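The threshold schedule is simply a geometric sequence (the stopping floor below is my own hypothetical addition):

```python
def fixing_thresholds(start=0.99, factor=0.9, floor=0.5):
    """Yield fixing thresholds 0.99, 0.891, 0.8019, ... until a floor
    (the floor value is a hypothetical choice for this sketch)."""
    t = start
    while t >= floor:
        yield t
        t *= factor

thresholds = [round(t, 4) for t in fixing_thresholds()]
# [0.99, 0.891, 0.8019, 0.7217, 0.6495, 0.5846, 0.5261]
```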
To obtain several solutions after the first one, I remove all fixes and restart the solver from the beginning. The difference is that now there are immediately many columns to choose from, including ones that were generated with fixes taken into account.
Figure 3. Optimization run computing a solution of 4745 for instance 18. Red discs are integer solutions. The complete run time is reasonable – 230 seconds.