Explainable Artificial Intelligence
Slides
- Week 1
- Lecture 1: Introduction, Global Perturbation Methods
- Lecture 2: Feature Importance Methods
- Week 2
- Lecture 3: Saliency and Post-hoc Concept-based Methods
- Student presentations (31 January):
- Week 3
- Lecture 4: In-model Concept-based ML
- Student presentations (7 February):
- Week 4
- Week 5
- Lecture 5: Influence Functions and Mechanistic Interpretability
- Student presentations (21 February):
- Week 6
- Week 7
- Lecture 6: Guest lecture by Pietro Barbiero (IBM Research, Zurich), Neural Interpretable Reasoning
- Student presentations (7 March):
- Week 8
- Course Summary and Conclusion: Future Directions
- Student presentations (14 March):
Student Presentations
Student presentations of papers should be at most 15 minutes long, followed by Q&A.
Here is the form to submit your questions for each student presentation session.
Mini Project Papers
Here is the form to submit your ranked top-5 paper choices for your mini project.
- [Empirical, Concepts] Fong, Ruth, and Andrea Vedaldi. "Net2Vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
- [Perturbation, Post-hoc] Singla, Sumedha, et al. "Explanation by Progressive Exaggeration." International Conference on Learning Representations. 2020.
- [Saliency, Post-hoc, Adversarial] Dombrowski, Ann-Kathrin, et al. "Explanations can be manipulated and geometry is to blame." Advances in Neural Information Processing Systems 32 (2019).
- [Saliency Application, Generalisation] Kim, Jang-Hyun, Wonho Choo, and Hyun Oh Song. "Puzzle Mix: Exploiting saliency and local statistics for optimal mixup." International Conference on Machine Learning. PMLR, 2020.
- [Saliency Application, Fairness] Asgari, Saeid, et al. "MaskTune: Mitigating spurious correlations by forcing to explore." Advances in Neural Information Processing Systems 35 (2022): 23284-23296.
- [Concepts, Post-hoc] Crabbé, Jonathan, and Mihaela van der Schaar. "Concept activation regions: A generalized framework for concept-based explanations." Advances in Neural Information Processing Systems 35 (2022): 2590-2607.
- [IntArch, Trees] Wu, Mike, et al. "Beyond Sparsity: Tree Regularization of Deep Models for Interpretability." AAAI 2018: 1670-1678.
- [IntArch, RepLearn] Alvarez Melis, David, and Tommi Jaakkola. "Towards robust interpretability with self-explaining neural networks." Advances in Neural Information Processing Systems 31 (2018).
- [IntArch, Concepts, Normalisation] Chen, Zhi, Yijie Bei, and Cynthia Rudin. "Concept Whitening for Interpretable Image Recognition." Nature Machine Intelligence (2020).
- [IntArch, Concepts, Unsup, LLMs] Yang, Yue, et al. "Language in a bottle: Language model guided concept bottlenecks for interpretable image classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- [IntArch, Concepts, Interventions] Espinosa Zarlenga, Mateo, et al. "Learning to receive help: Intervention-aware concept embedding models." Advances in Neural Information Processing Systems 36 (2024).
- [IntArch, Concepts, Interventions] Vandenhirtz, Moritz, et al. "Stochastic Concept Bottleneck Models." The Thirty-eighth Annual Conference on Neural Information Processing Systems. 2024.
- [IntArch, Prototypes] Ma, Chiyu, et al. "This looks like those: Illuminating prototypical concepts using multiple visualizations." Advances in Neural Information Processing Systems 36 (2023): 39212-39235.
- [Influence, Post-hoc] Bae, Juhan, et al. "If Influence Functions are the Answer, Then What is the Question?" Advances in Neural Information Processing Systems 35 (2022): 17953-17967.
- [Influence, Post-hoc, Issues] Basu, Samyadeep, Phil Pope, and Soheil Feizi. "Influence Functions in Deep Learning Are Fragile." International Conference on Learning Representations. 2021.
- [NeSy, Logic, Concepts] Barbiero, Pietro, et al. "Entropy-based logic explanations of neural networks." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 6. 2022.
- [Counterfactuals, Post-hoc] Mothilal, Ramaravind K., Amit Sharma, and Chenhao Tan. "Explaining machine learning classifiers through diverse counterfactual explanations." Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020.
- [Counterfactuals, Post-hoc] Altmeyer, Patrick, et al. "Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals." AAAI 2024.
- [MechInt, LLMs] Wang, Kevin Ro, et al. "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small." The Eleventh International Conference on Learning Representations. 2023.
- [MechInt, LLMs] Meng, Kevin, et al. "Locating and editing factual associations in GPT." Advances in Neural Information Processing Systems 35 (2022): 17359-17372.