Explainable Artificial Intelligence
Principal lecturer: Prof Mateja Jamnik
Additional lecturers: Mateo Espinosa Zarlenga, Dr Zohreh Shams
Taken by: MPhil ACS, Part III
Code: L193
Term: Lent
Hours: 16 (6 hrs lectures; 6 hrs presentations; 4 hrs practicals)
Format: In-person lectures
Class limit: max. 20 students
Prerequisites: A solid background in statistics, calculus and linear algebra. We strongly recommend some experience with machine learning and deep neural networks (to the level of the first chapters of Goodfellow et al.’s “Deep Learning”). Students are expected to be comfortable reading and writing Python code for the module’s practical sessions.
Aims
The recent, highly visible introduction of Artificial Intelligence (AI) models into everyday consumer-facing products, services, and tools brings with it several new technical challenges and ethical considerations. Amongst these is the fact that most of these products are driven by Deep Neural Networks (DNNs): models that, although extremely expressive and useful, are notoriously complex and opaque. This “black-box” nature of DNNs limits how successfully they can be deployed in critical settings such as healthcare and law. Explainable Artificial Intelligence (XAI) is a fast-moving subfield of AI that aims to circumvent this crucial limitation of DNNs by either
(i) constructing human-understandable explanations for their predictions,
or (ii) designing novel neural architectures that are interpretable by construction.
In this module, we will introduce the key ideas behind XAI methods and discuss some of their important application areas (e.g., healthcare, scientific discovery, model debugging, model auditing). We will approach this by focusing on what constitutes an explanation, and by discussing different ways in which explanations can be constructed post hoc or generated as a by-product of the model itself. The main aim of this module is to introduce students to several commonly used approaches in XAI, both theoretically in lectures and through hands-on exercises in practicals, while also bringing recent promising directions within this field to their attention. We hope that, by the end of this module, students will be able to contribute directly to XAI research and will understand how the methods discussed here may be powerful tools for their own work and research.
Syllabus
- Overview and taxonomy of XAI (why is explainability needed, definition of terms, taxonomy of the XAI space, etc.)
- Feature importance methods (e.g., perturbation and saliency methods, etc.)
- Data attribution methods (e.g., influence functions, Data SHAP, etc.)
- Inherently interpretable models (e.g., SENNs, ProtoPNets, etc.)
- Interpretable architectures and concept-based explainability (Net2Vec, T-CAV, ACE, CBMs and variants, etc.)
- Neurosymbolic methods (DeepProbLog, Neural Reasoners, etc.)
- Attention interpretability
- Mechanistic interpretability
- LLMs and interpretability, applications such as interpretability in healthcare
Proposed Schedule
The 16 contact hours across 8 weeks will be divided as follows:
- Week 1: 2h lecture
- Week 2: 2h lecture
- Week 3: 2h practical 1
- Week 4: 2h lecture (including a guest lecture)
- Week 5: 2h practical 2
- Week 6: 2h lecture (including a guest lecture)
- Week 7: 2h practical 3
- Week 8: 2h lecture (including a guest lecture)
Objectives
By the end of this module, students should be able to:
- Recognise and identify key concepts in XAI together with their connection to related subfields in AI such as fairness, accountability, and trustworthy AI.
- Understand how to use, design, and deploy model-agnostic perturbation methods such as LIME, Anchors, and RISE. In particular, students should understand the connection between feature importance and cooperative game theory, and its uses in methods such as SHAP (see the Shapley-value formula after this list).
- Identify the uses and limitations of propagation-based feature importance methods such as Saliency, SmoothGrad, GradCAM, and Integrated Gradients. Students should be able to implement each of these methods on their own and connect the theoretical ideas behind them to practical code, exploiting modern frameworks’ auto-differentiation (see the saliency sketch after this list).
- Understand what concept learning is and which limitations of traditional feature-based methods it overcomes. Specifically, students should understand how probing a DNN’s latent space can be exploited to learn concepts that are useful for explainability (see the concept-probing sketch after this list).
- Reason about the key components of inherently interpretable architectures and neuro-symbolic methods and understand how interpretable neural networks can be designed from first principles.
- Elaborate on what sample-based explanations are and how influence functions and prototypical architectures such as ProtoPNet can be used to construct such explanations.
- Explain what counterfactual explanations are, how they are related to causality, and under which conditions they may be useful.
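To make the game-theoretic connection in the second objective concrete, the quantity that SHAP approximates is the classical Shapley value from cooperative game theory, with features playing the role of players and the model’s output the role of payoff. In LaTeX notation:

    \phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]

where N is the full set of features, v(S) is the model’s expected output when only the features in S are known, and \phi_i is the importance attributed to feature i.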
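As an illustration of the saliency objective above, the sketch below shows how a vanilla saliency map falls directly out of a framework’s auto-differentiation. It assumes PyTorch and an arbitrary image classifier `model` mapping a (1, C, H, W) tensor to class logits; the function and variable names are illustrative only, not a prescribed interface.

    import torch

    def saliency_map(model, x, target_class):
        """Gradient of the target-class logit with respect to the input pixels."""
        model.eval()
        x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
        score = model(x)[0, target_class]            # scalar logit for the chosen class
        score.backward()                             # auto-differentiation does the work
        return x.grad.abs().max(dim=1).values[0]     # per-pixel map of shape (H, W)

SmoothGrad averages such maps over noisy copies of the input, and Integrated Gradients accumulates them along a path from a baseline; both reuse the same autograd machinery.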
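Similarly, for the concept-probing objective, the following sketch fits a linear probe on hidden activations, in the spirit of T-CAV’s concept activation vectors. It assumes scikit-learn and that matrices of layer activations for concept examples (acts_pos) and random examples (acts_neg) have already been collected; these names are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def concept_activation_vector(acts_pos, acts_neg):
        """Fit a linear probe separating concept activations from random ones;
        its normal vector points along the concept direction in latent space."""
        X = np.vstack([acts_pos, acts_neg])
        y = np.concatenate([np.ones(len(acts_pos)), np.zeros(len(acts_neg))])
        probe = LogisticRegression(max_iter=1000).fit(X, y)
        cav = probe.coef_[0]
        return cav / np.linalg.norm(cav)  # unit vector, ready for directional derivatives

The resulting vector can then be used to take directional derivatives of class scores and test how sensitive predictions are to the concept, as T-CAV does.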
Upon completion of this module, students will have the technical background and tools to use XAI as part of their own research or to partake in XAI research itself. Moreover, we hope that, by laying out a clear timeline of how this relatively young subfield has developed, we will help students better understand the fundamental open questions in the area and the promising directions currently being actively explored.
Assessment
(30%) practical exercises: We will run three practical sessions in which students will be asked to work through a series of exercises that use the concepts introduced in lectures up to that point. For each practical session, we will prepare a Colab notebook that guides students through the exercises, and we will ask students to submit their solutions through this notebook. We expect students to complete about ⅔ of the exercises during the practical class and the rest at home. Each practical will be worth 10%; the first practical will simply be a tick, worth 10% upon successful completion.
(70%) mini-project: At the end of week 1, we will hand out a list of papers from which students will select their mini-projects. Each mini-project consists of selecting a paper from our list and reimplementing and extending its key idea. We encourage students to be as creative as they want in how they take their mini-project forward once the paper has been selected. For example, they can reimplement the technique in the paper and combine it with methodologies from other works discussed in lectures, or they can apply the paper’s methodology to a new domain, dataset, or setup where it may offer interesting and potentially novel insights. We will ask all students to submit a report in workshop format of up to 4,000 words. This report, due roughly a week after Lent term ends, should describe their methodology, experiments, and results. A crucial part of the report is explaining the rationale behind the methodological and experimental choices made and the hypotheses tested throughout the mini-project, ideally demonstrating a deep understanding of the work the mini-project builds on. To help students select their projects and make progress on them, we will hold regular office hours during which students can discuss their progress and questions with us.
Recommended reading
Textbooks
* Christoph Molnar, “Interpretable Machine Learning” (2022): https://christophm.github.io/interpretable-ml-book/
Online Courses and Tutorials
* Su-In Lee and Ian Covert, “CSEP 590B: Explainable AI”, University of Washington
* Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities
* On Explainable AI: From Theory to Motivation, Industrial Applications, XAI Coding & Engineering Practices - AAAI 2022 Tutorial
Survey papers
* Arrieta, Alejandro Barredo, et al. "Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI." Information Fusion 58 (2020): 82-115.
* Rudin, Cynthia, et al. "Interpretable machine learning: Fundamental principles and 10 grand challenges." Statistics Surveys 16 (2022): 1-85.