Trustworthy complexity reduction by Random Forests

The Random Forest is a popular tool for regression and variable selection in complex models and an active research topic in mathematics and neighbouring fields. While the number of applied publications in this direction grows at an incredible speed, theoretical contributions addressing the question "Why does the Random Forest perform so well?" remain sparse and focus mainly on the regression problem, e.g. by proving the consistency of the original Random Forest as in Scornet (2015).

Until now, all consistency proofs have relied on Gaussianity and are rather technical. In our current research we overcome both issues with a newly developed and simple proof technique, which also covers the consistency of other variants such as the Extra-Trees of Geurts et al. (2006). In the next steps of this project we aim to shed more light on the theory of variable selection, i.e. identifying which features have a relevant impact on the outcome, with a particular focus on the permutation importance measure; a minimal illustration of this measure is sketched below.
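The following is a minimal sketch of the permutation importance measure using scikit-learn, intended only to illustrate the quantity discussed above; the simulated data, the hyperparameters, and the choice of library are assumptions for the example, not the setup studied in our research.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 1000, 10
X = rng.normal(size=(n, p))
# Only the first two features carry signal; the remaining eight are noise.
y = X[:, 0] + 2.0 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Permutation importance: shuffle one feature at a time on held-out data
# and record the resulting drop in predictive performance; relevant
# features should show a clearly positive mean drop.
result = permutation_importance(forest, X_test, y_test,
                                n_repeats=20, random_state=0)
for j in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {j}: {result.importances_mean[j]:.3f} "
          f"+/- {result.importances_std[j]:.3f}")

The same call works unchanged with scikit-learn's ExtraTreesRegressor, the implementation of the Extra-Trees variant mentioned above.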

  • Dec 05th 2024, Torsten Reuter successfully defended his PhD thesis on "D-optimal Subsampling Design for Massive Data"
  • Dec 03rd 2024, Xiangying Chen successfully defended his PhD thesis on "Conditional Erlangen Program"
