Explainable Artificial Intelligence
Slides
- Week 1
- Lecture 1: Introduction, Global Perturbation Methods
- Lecture 2: Feature Importance Methods
- Week 2
- Lecture 3: Saliency and Post-hoc Concept-based Methods
- Student presentations (31 January):
- Week 3
- Lecture 4: In-model Concept-based ML
- Student presentations (7 February):
- Week 4
- Week 5
- Lecture 5: Influence Functions and Mechanistic Interpretability
- Student presentations (21 February):
- Week 6
- Week 7
- Lecture 6: Guest lecture by Pietro Barbiero (IBM Research, Zurich), Neural Interpretable Reasoning
- Student presentations (7 March):
- Week 8
- Course Summary and Conclusion: Future Directions
- Student presentations (14 March):
Student Presentations
Student presentations of papers should be at most 15 minutes long, followed by Q&A.
Here is the form to submit your questions for each student presentation session.
Mini Project Papers
Here is the form to submit your ranked top-5 paper choices for your mini project.
- [Empirical, Concepts] Fong, Ruth, and Andrea Vedaldi. "Net2Vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
- [Perturbation, Post-hoc] Singla, Sumedha, et al. "Explanation by Progressive Exaggeration." International Conference on Learning Representations. 2020.
- [Saliency, Post-hoc, Adversarial] Dombrowski, Ann-Kathrin, et al. "Explanations can be manipulated and geometry is to blame." Advances in Neural Information Processing Systems 32 (2019).
- [Saliency Application, Generalisation] Kim, Jang-Hyun, Wonho Choo, and Hyun Oh Song. "Puzzle Mix: Exploiting saliency and local statistics for optimal mixup." International Conference on Machine Learning. PMLR, 2020.
- [Saliency Application, Fairness] Asgari, Saeid, et al. "MaskTune: Mitigating spurious correlations by forcing to explore." Advances in Neural Information Processing Systems 35 (2022): 23284-23296.
- [Concepts, Post-hoc] Crabbé, Jonathan, and Mihaela van der Schaar. "Concept activation regions: A generalized framework for concept-based explanations." Advances in Neural Information Processing Systems 35 (2022): 2590-2607.
- [IntArch, Trees] Wu, Mike, et al. "Beyond Sparsity: Tree Regularization of Deep Models for Interpretability." AAAI 2018: 1670-1678.
- [IntArch, RepLearn] Alvarez Melis, David, and Tommi Jaakkola. "Towards robust interpretability with self-explaining neural networks." Advances in Neural Information Processing Systems 31 (2018).
- [IntArch, Concepts, Normalisation] Chen, Zhi, Yijie Bei, and Cynthia Rudin. "Concept Whitening for Interpretable Image Recognition." Nature Machine Intelligence (2020).
- [IntArch, Concepts, Unsup, LLMs] Yang, Yue, et al. "Language in a bottle: Language model guided concept bottlenecks for interpretable image classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- [IntArch, Concepts, Interventions] Espinosa Zarlenga, Mateo, et al. "Learning to receive help: Intervention-aware concept embedding models." Advances in Neural Information Processing Systems 36 (2024).
- [IntArch, Concepts, Interventions] Vandenhirtz, Moritz, et al. "Stochastic Concept Bottleneck Models." The Thirty-eighth Annual Conference on Neural Information Processing Systems. 2024.
- [IntArch, Prototypes] Ma, Chiyu, et al. "This looks like those: Illuminating prototypical concepts using multiple visualizations." Advances in Neural Information Processing Systems 36 (2023): 39212-39235.
- [Influence, Post-hoc] Bae, Juhan, et al. "If Influence Functions are the Answer, Then What is the Question?" Advances in Neural Information Processing Systems 35 (2022): 17953-17967.
- [Influence, Post-hoc, Issues] Basu, Samyadeep, Phil Pope, and Soheil Feizi. "Influence Functions in Deep Learning Are Fragile." International Conference on Learning Representations. 2021.
- [NeSy, Logic, Concepts] Barbiero, Pietro, et al. "Entropy-based logic explanations of neural networks." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 6. 2022.
- [Counterfactuals, Post-hoc] Mothilal, Ramaravind K., Amit Sharma, and Chenhao Tan. "Explaining machine learning classifiers through diverse counterfactual explanations." Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020.
- [Counterfactuals, Post-hoc] Altmeyer, Patrick, et al. "Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals." AAAI 2024.
- [MechInt, LLMs] Wang, Kevin Ro, et al. "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small." The Eleventh International Conference on Learning Representations. 2023.
- [MechInt, LLMs] Meng, Kevin, et al. "Locating and editing factual associations in GPT." Advances in Neural Information Processing Systems 35 (2022): 17359-17372.