Bachelor thesis project supervision 2023 Q4
Undergraduate course, 2023
Supervision of the thesis projects of five bachelor students on optimal decision trees.
Optimal Regression Trees via Dynamic Programming: Optimization techniques for learning Regression Trees
By: Mim van den Bos - Thesis - Resulting publication at ICML-24
Abstract: Decision trees make decisions in a way that is interpretable to humans, which is important as machines are increasingly used to aid high-stakes and socially sensitive decisions. While heuristics have long been used to find decision trees with reasonable accuracy, recent approaches find fully optimal trees. Because finding fully optimal decision trees is computationally hard, it is only practical to find shallow trees for datasets of limited size. However, continuous algorithmic improvements keep pushing the scale of feasible solutions. Dynamic programming approaches promise to find optimal decision trees scalably but need to be adapted for different objectives, such as regression. We combine and adapt the algorithmic techniques of two dynamic programming methods, creating a new method that improves the scalability of optimal regression trees. The new method often achieves an order-of-magnitude speed improvement over a previous state-of-the-art method.
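As a rough illustration of the dynamic programming flavor behind such methods (not the thesis's actual algorithm), the sketch below searches for a depth-limited regression tree over binary features that minimizes the sum of squared errors, memoizing on (subset of rows, remaining depth). All names are illustrative.

```python
# Minimal illustrative sketch (not the thesis algorithm): depth-limited
# optimal regression trees via recursion with memoization on
# (subset of rows, remaining depth). Binary features; leaves predict the mean.

def leaf_cost(ys):
    """Sum of squared errors when a leaf predicts the mean of ys."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def best_sse(X, y, max_depth):
    """Minimal SSE achievable by any tree with at most `max_depth` splits."""
    n_features = len(X[0]) if X else 0
    memo = {}

    def solve(rows, depth):
        key = (rows, depth)
        if key in memo:
            return memo[key]
        ys = [y[i] for i in rows]
        best = leaf_cost(ys)                 # option 1: stop and make a leaf
        if depth > 0 and len(rows) > 1:
            for f in range(n_features):      # option 2: split on feature f
                left = tuple(i for i in rows if X[i][f] == 0)
                right = tuple(i for i in rows if X[i][f] == 1)
                if left and right:
                    best = min(best, solve(left, depth - 1) + solve(right, depth - 1))
        memo[key] = best
        return best

    return solve(tuple(range(len(y))), max_depth)


# Tiny usage example with two binary features.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [1.0, 1.2, 3.0, 3.1]
print(best_sse(X, y, max_depth=2))
```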
Optimal Robust Decision Trees: A dynamic programming approach
By: Benedict Bien - Thesis
Abstract: Decision trees are integral to machine learning, and their robustness against adversarial data manipulations is a critical measure of their effectiveness. Despite algorithmic advancements, current solutions are either optimal but lack scalability, or scale well but do not guarantee optimality. This paper presents a novel adaptation of the MurTree algorithm to address these challenges in the pursuit of optimal robust decision trees. We introduce a new method for modeling the adversary as a network flow problem and provide a dynamic programming approach to solve optimal robust decision trees beyond a depth of two. The performance of the proposed algorithm is compared with brute-force solutions across varying decision tree depths, numbers of features, and dataset sizes. This research contributes a significant advancement towards efficient and effective solutions for optimal robust decision trees, potentially setting a new performance benchmark in this area.
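For intuition about the robust-error objective, here is a toy brute-force check of how many examples an adversary who may flip a bounded number of binary features can cause a fixed tree to misclassify. The flip-budget adversary model is an assumption for illustration only; it is not the network-flow formulation developed in the thesis.

```python
# Toy illustration of a robust-error objective (not the thesis's network-flow
# adversary): for a fixed tree over binary features, count the examples an
# adversary can misclassify by flipping at most `budget` features.
from itertools import combinations


def predict(tree, x):
    """tree is either ('leaf', label) or ('split', feature, left, right)."""
    if tree[0] == 'leaf':
        return tree[1]
    _, f, left, right = tree
    return predict(left if x[f] == 0 else right, x)


def robust_error(tree, X, y, budget):
    errors = 0
    n_features = len(X[0])
    for x, label in zip(X, y):
        misclassified = False
        for k in range(budget + 1):                  # brute-force all flip sets
            for flips in combinations(range(n_features), k):
                x_adv = list(x)
                for f in flips:
                    x_adv[f] = 1 - x_adv[f]
                if predict(tree, x_adv) != label:
                    misclassified = True
                    break
            if misclassified:
                break
        errors += misclassified
    return errors


# Depth-1 tree splitting on feature 0; with budget 1 both examples can be
# pushed across the split, so the robust error is 2.
tree = ('split', 0, ('leaf', 0), ('leaf', 1))
X = [(0, 1), (1, 0)]
y = [0, 1]
print(robust_error(tree, X, y, budget=1))
```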
SurTree: constructing optimal survival trees with MurTree
By: Tim Huisman - Thesis - Resulting publication at AAAI-24
Abstract: Survival analysis revolves around studying and predicting the time it takes for a particular event to occur. In clinical trials on terminal illnesses, this is usually the time from the diagnosis of a patient until their death. Estimating the odds of survival of a new patient can be done by analyzing survival data from past patients in similar conditions. To cluster similar patients based on a set of features, survival trees may be employed, which act as decision trees that assign a survival distribution to each cluster. Many algorithms exist for creating useful survival trees, but not for creating optimal survival trees. In this paper, research on finding optimal classification trees is applied to survival analysis by adapting the MurTree algorithm to construct survival trees. We present SurTree, an algorithm that applies many of MurTree's techniques to create globally optimal survival trees. Furthermore, we compare the output quality and runtime performance of SurTree to a state-of-the-art method for constructing survival trees, showing its optimality and its fast computation times on smaller datasets.
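As a generic illustration of what a survival-tree leaf predicts, the sketch below computes a Kaplan-Meier survival curve for the examples that would fall into one leaf. This is standard survival analysis, not the SurTree loss or algorithm itself.

```python
# Generic illustration (not the SurTree objective): the survival distribution
# a leaf could assign to the examples it covers, estimated with the
# Kaplan-Meier product-limit formula from (time, event_observed) pairs.
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] for each distinct time t at which an event occurs."""
    deaths = Counter(t for t, e in zip(times, events) if e)   # observed events
    at_risk = len(times)
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # everyone with time == t (event or censored) leaves the risk set
        at_risk -= sum(1 for u in times if u == t)
    return curve


# Leaf with five patients: times in months, event=1 means death observed.
times = [3, 5, 5, 8, 12]
events = [1, 1, 0, 1, 0]
print(kaplan_meier(times, events))  # [(3, 0.8), (5, 0.6), (8, 0.3)]
```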
Individually fair optimal decision trees: Using a dynamic programming approach
By: Chrysanthos Kindynis - Thesis
Abstract: In this paper, we tackle the problem of creating decision trees that are both optimal and individually fair. While decision trees are popular due to their interpretability, achieving optimality can be difficult. Existing approaches either lack scalability or fail to consider individual fairness. To address this, we define individual fairness as a separable optimization task by analyzing the fairness gained and lost within a sub-tree. Using the STreeD framework, we implement an algorithm that constructs optimal decision trees with the lowest misclassification score and an individual fairness value above a certain threshold. Our algorithm has been tested on various datasets, demonstrating its effectiveness and scalability. This research is a significant step towards creating decision trees that are optimal, fair, and scalable.
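To make the notion concrete, here is one common (illustrative) reading of individual fairness for a fixed tree: similar individuals should receive the same prediction. The Hamming-distance similarity measure and the threshold check below are assumptions for the example and may differ from the thesis's exact definition.

```python
# Illustrative only: one common reading of individual fairness for a fixed
# tree -- similar individuals should receive the same prediction. The exact
# definition and its separable decomposition in the thesis may differ.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def individual_fairness(predict, X, similarity_radius=1):
    """Fraction of similar pairs (Hamming distance <= radius, an assumed
    similarity measure) that receive the same prediction."""
    pairs = consistent = 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if hamming(X[i], X[j]) <= similarity_radius:
                pairs += 1
                consistent += predict(X[i]) == predict(X[j])
    return consistent / pairs if pairs else 1.0


# A depth-1 "tree" that predicts feature 0; check feasibility w.r.t. a threshold.
predict = lambda x: x[0]
X = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(individual_fairness(predict, X) >= 0.5)
```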
Optimal decision trees for the Algorithm Selection Problem: A dynamic programming approach
By: Giulio Segalini - Thesis
Abstract: The Algorithm Selection Problem is a relevant question in computer science: solving it would enable us to predict which algorithm will perform best on a given problem instance. Different solutions have been proposed, using either Mixed Integer Programming or machine learning models, but these suffer from poor scalability, no guarantees of optimality, or models that are not interpretable and therefore cannot be used to gain insights into the nature of the problem. In this work we propose a dynamic programming method to build Optimal Decision Trees for the Algorithm Selection Problem, giving us an interpretable model that is globally optimal over the training dataset. We also show that this method trains trees identical to those of the current state-of-the-art while being orders of magnitude faster, and we propose possible improvements for future work.
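As a small illustration of the kind of objective such a tree could optimize, the sketch below computes an assumed leaf cost for algorithm selection: a leaf assigns one algorithm to all instances it covers, and its cost is that algorithm's total runtime on those instances. Names and the exact objective are illustrative, not taken from the thesis.

```python
# Illustrative leaf cost for algorithm selection (assumed objective, not
# necessarily the thesis's): a leaf picks one algorithm for all instances it
# covers, and its cost is that algorithm's total runtime on those instances.

def leaf_cost(runtimes, rows):
    """runtimes[i][a] = runtime of algorithm a on instance i.
    Return (best total runtime, chosen algorithm) for the instances in rows."""
    n_algorithms = len(runtimes[0])
    totals = [sum(runtimes[i][a] for i in rows) for a in range(n_algorithms)]
    best = min(range(n_algorithms), key=lambda a: totals[a])
    return totals[best], best


# Three instances, two algorithms; the best single choice for this leaf.
runtimes = [
    [1.0, 4.0],   # instance 0
    [2.5, 0.5],   # instance 1
    [3.0, 3.5],   # instance 2
]
print(leaf_cost(runtimes, rows=[0, 1, 2]))  # -> (6.5, 0): algorithm 0 wins
```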