Bachelor thesis project supervision 2022 Q2
Undergraduate course, 2022
Supervision of the thesis projects of three bachelor students on the topic of optimal decision trees.
Cost-sensitive optimal decision trees dealing with delays
By: Sven Butzelaar - Thesis
Abstract: Machine learning can be used to classify patients in a hospital. Here, the classifier has to minimize both the cost of misclassifying the patient and the cost of the tests. Unfortunately, obtaining features may be costly, e.g., taking blood tests or doing an X-ray scan. Furthermore, acquiring those test results may take a few days. Several machine learning algorithms exist to train such a classifier. Decision trees are a natural choice since, unlike other classifiers, they do not need all feature values to classify an instance. Current approaches, however, only use heuristics to find a local optimum. Although heuristics are relatively fast, they are not optimal and therefore may not capture the underlying characteristics of the given dataset well. We propose an optimal approach to train cost-sensitive decision trees that also takes these delayed tests into account. We show that, compared to a heuristic approach, smaller trees with higher accuracy and lower cost can be constructed. We use dynamic programming, which allows us to skip many calculations and reduces the computation time to seconds.
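As an illustration of the dynamic-programming idea behind this project, the sketch below searches for a depth-bounded cost-sensitive tree by recursively choosing which test (feature) to pay for, caching sub-results so repeated subproblems are skipped. It is a minimal sketch under assumptions, not the thesis implementation: the data, test costs, and misclassification cost are made up, and delayed test results are not modelled.

```python
from functools import lru_cache

# Illustrative placeholders, not data from the thesis.
DATA = (((1, 0), 1), ((0, 1), 0), ((1, 1), 1), ((0, 0), 0))  # (features, label)
TEST_COST = (1.0, 2.0)   # cost of acquiring each feature
MISCLASS_COST = 5.0      # cost of one wrong classification

def leaf_cost(rows):
    """Cost of stopping here: misclassify everything outside the majority class."""
    ones = sum(label for _, label in rows)
    return MISCLASS_COST * min(ones, len(rows) - ones)

@lru_cache(maxsize=None)  # caching lets identical subproblems be reused
def best_cost(rows, used, depth):
    """Minimal total cost (test costs + misclassification) for this subset."""
    best = leaf_cost(rows)
    if depth == 0 or not rows:
        return best
    for f in range(len(TEST_COST)):
        if f in used:
            continue
        left = tuple(r for r in rows if r[0][f] == 0)
        right = tuple(r for r in rows if r[0][f] == 1)
        split = (TEST_COST[f] * len(rows)          # every instance routed here pays the test
                 + best_cost(left, used | {f}, depth - 1)
                 + best_cost(right, used | {f}, depth - 1))
        best = min(best, split)
    return best

print(best_cost(DATA, frozenset(), 2))
```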
Robustness of optimal randomized decision trees with dynamic programming
By: Valentijn Götz - Thesis
Abstract: Decision trees are usually learned heuristically, but advances in the field of optimal decision trees have made them a more prominent subject of research. However, current methods for optimal decision trees tend to overlook robustness. This research investigates whether the robustness of optimal decision trees can be improved by incorporating randomization. To achieve this, we added randomization to the existing MurTree algorithm and performed experiments to compare robustness. The results show that adding randomization improves the robustness of the decision tree but lowers the out-of-sample accuracy.
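To give a flavour of how randomization can trade accuracy for robustness, the sketch below predicts by sampling one tree from a small weighted pool of candidate trees. This is a hedged illustration only: the candidate trees, weights, and input are hypothetical and the sketch does not reproduce the randomized MurTree variant developed in the thesis.

```python
import random

def tree_a(x):            # candidate tree 1: splits on feature 0
    return 1 if x[0] == 1 else 0

def tree_b(x):            # candidate tree 2: splits on feature 1
    return 1 if x[1] == 1 else 0

CANDIDATES = [tree_a, tree_b]
WEIGHTS = [0.7, 0.3]      # probability of using each candidate

def randomized_predict(x, rng=random):
    """Sample one candidate tree and return its prediction.

    Since the attacker cannot know in advance which tree will answer,
    a perturbation that fools one candidate may not fool the other.
    """
    tree = rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]
    return tree(x)

print(randomized_predict((1, 0)))
```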
Optimal decision tree using dynamic programming for the algorithm selection problem
By: Henwei Zeng - Thesis
Abstract: Several algorithms can often be used to solve a complex problem, such as the SAT problem or the graph coloring problem. These algorithms differ in speed depending on the size or other features of the problem instance. Some algorithms are much faster on small instances, while others perform noticeably better on larger ones. The optimization problem in this case is to select the best-performing algorithm based on the problem features, resulting in a much faster overall runtime. This is known as the algorithm selection problem. Many different approaches have been used to solve it, such as constructing optimal decision trees. However, little has been published on using optimal decision trees for algorithm selection, and one study reports difficulties in finding a feasible solution when the number of problem instances is large. We provide new insights into solving algorithm selection using a dynamic programming approach. The motivation for this approach is that recent studies suggest it scales better than traditional optimal decision tree algorithms, thanks to efficient techniques such as caching and frequency counting. Our investigation shows that, compared to the integer programming method, the dynamic programming approach is significantly faster and is able to solve large problem instances.
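One way to read this setup is that the "misclassification cost" of a leaf becomes the total runtime of the single algorithm assigned to the instances reaching that leaf. The sketch below shows that leaf-cost computation under made-up runtimes; the runtime table and instance names are hypothetical and only illustrate the cost formulation, not the thesis code.

```python
# Hypothetical runtime table: problem instance -> runtime of each of 3 algorithms.
RUNTIMES = {
    "inst1": (0.4, 2.0, 1.1),
    "inst2": (3.5, 0.9, 1.2),
    "inst3": (0.6, 2.5, 4.0),
}

def leaf_cost(instances):
    """Pick the single algorithm with the lowest total runtime for this leaf."""
    totals = [sum(RUNTIMES[i][a] for i in instances) for a in range(3)]
    best = min(range(3), key=lambda a: totals[a])
    return best, totals[best]

print(leaf_cost(["inst1", "inst3"]))  # -> (0, 1.0): algorithm 0 is cheapest here
```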