WhiteBox: Explainable models for human and artificial intelligence


Until a few years ago, intelligent systems such as robots and digital voice assistants had to be tailored to narrow, specific tasks and contexts. Such systems needed to be programmed and fine-tuned by experts. However, recent developments in artificial intelligence have led to a paradigm shift: instead of explicitly representing knowledge about every information-processing step at the time of development, machines are endowed with the ability to learn. Machine learning makes it possible to leverage large amounts of data samples, in the hope that what is learned transfers to new situations via pattern matching. In recent years, groundbreaking gains in performance have been achieved with deep neural networks, whose functionality is inspired by the structure of the human brain. Large numbers of interconnected artificial neurons, organized in layers, process the input data at considerable computational cost. Although experts understand the inner workings of such systems, since they designed the learning algorithms, they are often unable to explain or predict the system's intelligent behavior because of its complexity. Such systems end up as blackboxes, raising the question of how their decisions can be understood and trusted.


Our basic hypothesis is that explaining an artificial intelligence system may not be fundamentally different from explaining intelligent, goal-directed behavior in humans. The behavior of a biological agent is likewise based on acquired experience and on information processing by a large number of neurons in the brain. However, an explanation based on a complete wiring diagram of the brain and all of its interactions with the environment may not be understandable. Instead, explanations of intelligent behavior need to reside at a more abstract computational level: they need to be cognitive explanations. Such explanations are developed in computational cognitive science. WhiteBox therefore aims to transform blackbox models into whitebox models through cognitive explanations that are interpretable and understandable.


Following this hypothesis, we will systematically develop and compare whitebox and blackbox models of artificial intelligence and human behavior. To quantify the differences between these models, we will not only develop novel blackbox and whitebox models but also devise methods for comparing them quantitatively and interpretably. In particular, we will develop new methodologies for generating explanations automatically by means of AI. For example, blackbox models include deep neural networks, whereas whitebox models can be probabilistic generative models with explicit, interpretable latent variables. Applying these techniques to intelligent, goal-directed human behavior will provide better computational explanations of human intelligent behavior and make it possible to transfer human-level behavior to machines.

LOEWE Research Cluster


  • Technical University of Darmstadt (TU Darmstadt)

Fields of study

  • Computer Science
  • Cognitive Science
  • Machine Learning
  • Psychology
  • Electrical Engineering and Information Technology
  • Biology 
  • Sports Science
  • Law and Economics

Funding period

since 2021

Project Coordinator

  • Professor Dr. Kristian Kersting, Technical University of Darmstadt
  • Professor Constantin A. Rothkopf, Technical University of Darmstadt


  • Darmstadt
