Some situations require decision makers to choose from hundreds of alternatives whose descriptions are available as text. Examples include finding the best employee, the right job, a property, or even the ideal spouse.
My task was to create a software framework that supports these kinds of decision-making situations using machine learning and active learning techniques, so that the user does not have to read every document before reaching the last good one. The user's feedback on each displayed document is used to refine the model's predictions for the next iteration.
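The feedback loop described above can be illustrated with a minimal sketch. It is not the prototype's algorithm. As an assumption, it uses a simple Rocchio-style relevance-feedback score over bag-of-words vectors, and the names `feedback_session`, `score`, and `ask_user` are hypothetical. Each iteration re-ranks the unseen documents using all answers collected so far and shows the top one.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors stored as Counters."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(doc, liked, disliked):
    """Rocchio-style score: mean similarity to liked documents minus mean similarity to disliked ones."""
    pos = sum(cosine(doc, d) for d in liked) / len(liked) if liked else 0.0
    neg = sum(cosine(doc, d) for d in disliked) / len(disliked) if disliked else 0.0
    return pos - neg


def feedback_session(docs, ask_user):
    """Show one document per iteration, best-scored first, and learn from each answer.

    docs: dict mapping document id -> Counter of terms (hypothetical representation)
    ask_user: callable(doc_id) -> bool, True if the user judges the document good
    Returns the order in which the documents were shown.
    """
    unseen = set(docs)
    liked, disliked, shown = [], [], []
    while unseen:
        # Re-rank all unseen documents with the feedback gathered so far;
        # sorting first makes ties resolve alphabetically (max keeps the first maximum).
        best = max(sorted(unseen), key=lambda d: score(docs[d], liked, disliked))
        unseen.remove(best)
        shown.append(best)
        (liked if ask_user(best) else disliked).append(docs[best])
    return shown


# Toy example: "c" and "d" are the good documents.
docs = {
    "a": Counter("sales marketing".split()),
    "b": Counter("marketing retail".split()),
    "c": Counter("python sql".split()),
    "d": Counter("python java".split()),
}
good = {"c", "d"}
print(feedback_session(docs, lambda doc_id: doc_id in good))  # → ['a', 'c', 'd', 'b']
```

After the negative answer on "a", the similar document "b" drops to the end of the queue. As a result, both good documents are shown within three views instead of the four that plain alphabetical order would need.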
While working on the thesis, I reviewed the related literature, gathered example datasets (CVs) and built a prototype solution. I had to find a way to model the decision maker's feedback and to create a framework for performance evaluation. I also had to ensure that the selected algorithm runs fast enough to provide a smooth user experience.
I followed a text mining approach and began with a very basic model. I then progressively added data preparation steps (stemming, dimensionality reduction), experimented with different algorithms and introduced active learning techniques tailored to our specific situation.
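The data preparation steps mentioned above can be sketched as follows. This is an illustration under stated assumptions and not the prototype's code. A crude, hypothetical suffix list stands in for a real stemmer such as Porter or Snowball. Dimensionality is reduced by keeping only the terms that occur in the most documents, which is a simple feature-selection alternative to projection methods such as SVD. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and stem each token."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def build_vocabulary(docs, max_terms):
    """Keep the max_terms stems that occur in the most documents (feature selection)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    # Highest document frequency first; ties broken alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]


def vectorize(doc, vocabulary):
    """Term-count vector of a document, restricted to the reduced vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]


docs = [
    "Managed projects and managing teams",
    "Project manager with Java skills",
    "Java developer, tested code",
]
vocabulary = build_vocabulary(docs, max_terms=2)
print(vocabulary)                                # → ['java', 'project']
print([vectorize(d, vocabulary) for d in docs])  # → [[0, 1], [1, 1], [1, 0]]
```

Stemming maps "projects" and "project" to the same feature, so the two documents share it. Pruning the vocabulary keeps the vectors short, which also helps keep re-ranking fast enough for interactive use.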
Based on the results, I conclude that the improved model does fairly well at presenting a high proportion of the good documents early in the decision support process (i.e., it achieves high recall at an early stage) and also meets the speed requirements. However, it still needs improvement and further testing to determine how many document views the system actually saves the decision maker, that is, how early every single good document can be identified.
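The two evaluation questions above can be made concrete with two small metrics. This sketch assumes that a session is recorded as the order in which documents were shown and that the set of good documents is known in advance, as in an offline evaluation with labelled CVs. `recall_at_k` measures how many of the good documents appear early. `views_to_find_all` counts the views needed before every good document has been seen. Both names are hypothetical.

```python
def recall_at_k(shown, good, k):
    """Fraction of the good documents that appear among the first k documents shown."""
    good = set(good)
    if not good:
        return 0.0
    return len(set(shown[:k]) & good) / len(good)


def views_to_find_all(shown, good):
    """Number of views until every good document has been shown (None if some never is)."""
    remaining = set(good)
    if not remaining:
        return 0
    for position, doc_id in enumerate(shown, start=1):
        remaining.discard(doc_id)
        if not remaining:
            return position
    return None


shown = ["d3", "d1", "d7", "d2", "d5", "d4"]
good = {"d1", "d2", "d4"}
print(recall_at_k(shown, good, k=3))   # → 0.3333333333333333
print(views_to_find_all(shown, good))  # → 6
```

The two numbers can diverge. A ranking can score well on early recall yet still leave one good document near the end, and only the second metric reveals how many views are actually saved.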