# Team Selection (part 2)

In the first part of this blog post I discussed a type of problem called Team Selection. In rigorous terms, Team Selection is the problem of subdividing a set into optimal disjoint subsets. This is an extremely practical problem, because its solution has many real-life applications. A certain class of these applications can be simplified into grouping problems, as in the Noom Groups example from part one. In that case we relied on the fact that a group’s optimality depended solely on the similarity of its members, and were able to use the k-means clustering algorithm to create well-matched groups.
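
As a refresher, here is a minimal sketch of the k-means idea in plain Python. It assumes each member has already been reduced to a numeric feature vector (the two-number tuples in the usage example are hypothetical stand-ins, not Noom's actual data) and uses a fixed seed so the random initialization is reproducible. This is textbook Lloyd's algorithm, not the implementation from part one.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    # Initialize centroids as k distinct input points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the group with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty group keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


members = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels = kmeans(members, k=2)
print(labels)  # the first three members share one label, the last three another
```

Note that k-means only guarantees a local optimum and does not control group sizes, so a real grouping system would likely need multiple restarts or a size-balanced variant.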

A more interesting class of team selection applications consists of those that cannot be reduced to a grouping problem. In part one we saw this with the example of choosing roommates in a university residence. In these cases, the utility of a group depends on more complex factors than just similarity. In roommate selection, for instance, this could come from personality theory, which holds that certain complementary personality types work well together even though they are not alike. Mathematically this makes for a much more interesting problem, which I’ve discussed in another blog post. In general, the more domain-specific the logic that defines a set’s utility, the more complex the algorithm for finding optimal subsets becomes.
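
To make the contrast with similarity concrete, here is a hedged sketch in which a pairing's utility comes from a compatibility score rather than from distance. The `compat` matrix is hypothetical; in practice its values might come from a personality-theory rule that rewards complementary types. The search is exhaustive, so it is only feasible for a handful of students, which hints at why this kind of problem calls for a more sophisticated algorithm.

```python
def best_pairing(compat):
    """Exhaustively search every way to split students into pairs and return
    (total_score, pairs) for the pairing with the highest summed compatibility.

    compat[a][b] is a (hypothetical) compatibility score for students a and b.
    """
    n = len(compat)
    if n % 2 != 0:
        raise ValueError("pairing requires an even number of students")

    def search(remaining):
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_score, best_pairs = float("-inf"), []
        # Try pairing the first remaining student with each possible partner,
        # then recursively pair up everyone who is left.
        for i, partner in enumerate(rest):
            score, pairs = search(rest[:i] + rest[i + 1:])
            score += compat[first][partner]
            if score > best_score:
                best_score, best_pairs = score, [(first, partner)] + pairs
        return best_score, best_pairs

    return search(tuple(range(n)))


compat = [
    [0, 1, 9, 2],
    [1, 0, 3, 8],
    [9, 3, 0, 1],
    [2, 8, 1, 0],
]
print(best_pairing(compat))  # (17.0, [(0, 2), (1, 3)])
```

Nothing in the search assumes that compatible students are similar; the utility is whatever the domain-specific `compat` scores say it is.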

The mathematical intricacies of this problem are fun to think about, but I’ve come to believe that the algorithm is not the most important part of the solution. Consider the roommate selection example. From a practical perspective, I think the most important part of a solution to this problem is the user interaction. Choosing good roommates is the type of problem that is difficult for computers, because it relies on human intuition. Even with the machine learning tricks I’ve discussed on this blog, the team selection algorithm would require extensive training before it could perform better than a human. Furthermore, why would the user want to give up control? Choosing teams is the fun part; dealing with the hassle of complex systems is the pain point.

The goal of roommate selection is to subdivide the students who applied for housing into optimal residences, buildings, floors, and rooms, in a reasonable amount of time. The solution is a *system* that does so. This is an important distinction, because the system can incorporate numerous elements besides an algorithm. For example, the system needs a way to collect, ingest, and clean the data that the algorithm uses. It also needs a way to report the solution back to the user, so they can check it for errors or tweak it to their preference. This creates a set of tradeoffs. Is it better to collect more data, or to have a better algorithm? Is it better to produce the best solution on the first try, or to enable easy tweaking and iterate towards the best solution?

These questions don’t have easy answers, and the answers should be based on customer research. How do universities currently assign roommates? What data is the decision based on? How successful is it? What are the negative effects of choosing bad roommates? What are the positive effects of choosing good ones? How can these be measured? What are the pain points? The answers to these questions should shape the direction of the system. I don’t know the answers to these questions, but I’d still like to take a guess at what such a system might look like.

As a university residence coordinator, I’d rather not have to worry about collecting data, cleaning it, or ingesting it into the system. I’m not a sociology major, so I’d also appreciate help choosing what data to collect and how to use it to make more informed decisions. I’m interested in the work that I do, so I’d like the system to educate me along the way, explaining why and how it uses the data that it does. In addition, there’s some specific data that I’ll need to collect (such as residence requests, roommate requests, etc.). This data might have specific constraints associated with it (for example, every pair of students who request each other gets their requested roommate), and I need an easy way to enter these constraints. Then, when the system has a solution, I want it to explain to me why it made the selections it did. Even better, I want a way to experiment with and make changes to the solution. I understand that every solution carries some risk (because picking roommates is an imprecise science), so I’d like the system to help me manage this risk. One way it could do this is by pointing out potential problem spots (e.g. “this floor could be loud” or “this roommate pair could fight”). Even better would be if the system explained why these were potential problem spots. Finally, I want the system to use the data it collects to get better over time.
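
Two of the requirements above, honoring mutual roommate requests and pointing out problem spots with a reason attached, are small enough to sketch. This is a hypothetical illustration: the request format, the `compat` scores keyed by pair, and the `threshold` value are all assumptions, not part of any existing system.

```python
def mutual_requests(requests):
    """Return sorted pairs of students who requested each other.

    `requests` maps each student to the roommate they asked for, or None.
    Mutual pairs could be fixed as hard constraints before any optimization.
    """
    pairs = set()
    for student, wanted in requests.items():
        # A request only counts as mutual if the other student asked back.
        if wanted is not None and requests.get(wanted) == student:
            pairs.add(tuple(sorted((student, wanted))))
    return sorted(pairs)


def flag_risky_pairs(pairs, compat, threshold):
    """Return (a, b, reason) for each assigned pair scoring below `threshold`.

    `compat` maps frozenset({a, b}) to a compatibility score; the reason string
    is the explanation a coordinator would see next to the flagged pair.
    """
    flags = []
    for a, b in pairs:
        score = compat[frozenset((a, b))]
        if score < threshold:
            flags.append((a, b, f"compatibility {score} is below {threshold}"))
    return flags


requests = {"ana": "ben", "ben": "ana", "cam": "dee", "dee": None}
print(mutual_requests(requests))  # [('ana', 'ben')]

compat = {frozenset(("ana", "ben")): 7, frozenset(("cam", "dee")): 2}
print(flag_risky_pairs([("ana", "ben"), ("cam", "dee")], compat, threshold=4))
# [('cam', 'dee', 'compatibility 2 is below 4')]
```

Keeping the explanation next to each flag, rather than returning a bare score, is one way to serve the "tell me why" requirement alongside risk management.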

This specific solution is rather involved, but it’s interesting to note that it’s also fairly abstract. Apart from a few aspects specific to roommate selection, such as the domain-specific logic behind the algorithm, the system could for the most part be applied as-is to other team selection problems, like assigning soldiers to fighting units or scheduling workers for shifts at a restaurant.

It’s also interesting to note that the algorithm plays a relatively minor role in the system. Data management, user education, and risk management all matter more. Since the full solution is so involved, this hints at which parts to prioritize when building a minimum viable version of it.

I find the problem of team selection fascinating, and I think the best way to explore it is through a practical example. Roommate selection is just one of many versions of the team selection problem, but it’s a good one to start with because (as a university student) it’s close to home and seems to have lots of room for improvement. I think it would be a fun and satisfying problem to solve.