Budding, C. (Céline)
Eindhoven University of Technology
Section: Philosophy & Ethics group
Faculty: Industrial Engineering & Innovation Sciences
The field of explainable artificial intelligence (xAI) is concerned with the ‘black box’ problem in AI: artificial neural networks are notoriously opaque, making it hard to explain their behavior. Various methods have been developed to address this problem, for example saliency maps, which aim to explain why a certain input leads to a certain output, or counterfactual explanations, which indicate how an input would need to be changed to obtain a different output. However, one aspect of xAI that has received little attention is the question of manipulation and intervention: is there a way to understand and localize causal processes in neural networks such that we can intervene in these networks, for example in the case of wrong predictions? In this project, we aim to develop a theoretical framework for causal explanations in AI that allow for intervention and control, specifying desiderata and constraints for such methods. Specifically, we will focus on whether Woodward’s interventionist framework can be applied to current xAI approaches that aim for causal explanations. Furthermore, we will study whether cognitive models might be suitable as proxy models for explainable AI, that is, by providing more insight into the knowledge encoded in neural networks. In this way, we aim to define guidelines and recommendations for interventions and causal explanations in xAI, and we will consider the practical implications of such a framework in various domains.

This project is supervised by dr. C.A. Zednik and prof. dr. V.C. Müller.