Course pages 2024–25

Explainable Artificial Intelligence

Slides


Student Presentations

Student presentations of papers should be at most 15 minutes long, followed by Q&A.

Here is the form to submit your questions for each student presentation session.


Mini Project Papers

Here is the form to submit your ranked top 5 paper preferences for your mini project.

  1. [Empirical, Concepts] Fong, Ruth, and Andrea Vedaldi. "Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  2. [Perturbation, Post-hoc] Singla, Sumedha, et al. "Explanation by Progressive Exaggeration." International Conference on Learning Representations (2020).
  3. [Saliency, Post-hoc, Adversarial] Dombrowski, Ann-Kathrin, et al. "Explanations can be manipulated and geometry is to blame." Advances in neural information processing systems 32 (2019).
  4. [Saliency Application, Generalisation] Kim, Jang-Hyun, Wonho Choo, and Hyun Oh Song. "Puzzle mix: Exploiting saliency and local statistics for optimal mixup." International conference on machine learning. PMLR, 2020.
  5. [Saliency Application, Fairness] Asgari, Saeid, et al. "Masktune: Mitigating spurious correlations by forcing to explore." Advances in Neural Information Processing Systems 35 (2022): 23284-23296.
  6. [Concepts, Post-hoc] Crabbé, Jonathan, and Mihaela van der Schaar. "Concept activation regions: A generalized framework for concept-based explanations." Advances in Neural Information Processing Systems 35 (2022): 2590-2607.
  7. [IntArch, Trees] Wu, Mike, et al. "Beyond Sparsity: Tree Regularization of Deep Models for Interpretability." AAAI 2018: 1670-1678.
  8. [IntArch, RepLearn] Alvarez Melis, David, and Tommi Jaakkola. "Towards robust interpretability with self-explaining neural networks." Advances in neural information processing systems 31 (2018).
  9. [IntArch, Concepts, Normalisation] Chen et al. "Concept Whitening for Interpretable Image Recognition." Nature Machine Intelligence (2020).
  10. [IntArch, Concepts, Unsup, LLMs] Yang, Yue, et al. "Language in a bottle: Language model guided concept bottlenecks for interpretable image classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
  11. [IntArch, Concepts, Interventions] Espinosa Zarlenga, Mateo, et al. "Learning to receive help: Intervention-aware concept embedding models." Advances in Neural Information Processing Systems 36 (2024).
  12. [IntArch, Concepts, Interventions] Vandenhirtz, Moritz, et al. "Stochastic Concept Bottleneck Models." The Thirty-eighth Annual Conference on Neural Information Processing Systems (2024).
  13. [IntArch, Prototypes] Ma, Chiyu, et al. "This looks like those: Illuminating prototypical concepts using multiple visualizations." Advances in Neural Information Processing Systems 36 (2023): 39212-39235.
  14. [Influence, Post-hoc] Bae, Juhan, et al. "If Influence Functions are the Answer, Then What is the Question?" Advances in Neural Information Processing Systems 35 (2022): 17953-17967.
  15. [Influence, Post-hoc, Issues] Basu, Samyadeep, Phil Pope, and Soheil Feizi. "Influence Functions in Deep Learning Are Fragile." International Conference on Learning Representations (2021).
  16. [NeSy, Logic, Concepts] Barbiero, Pietro, et al. "Entropy-based logic explanations of neural networks." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 6. 2022.
  17. [Counterfactuals, Posthoc] Mothilal, Ramaravind K., Amit Sharma, and Chenhao Tan. "Explaining machine learning classifiers through diverse counterfactual explanations." Proceedings of the 2020 conference on fairness, accountability, and transparency. 2020.
  18. [Counterfactuals, Post-hoc] Altmeyer et al. "Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals." AAAI (2024).
  19. [MechInt, LLMs] Wang, Kevin Ro, et al. "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small." The Eleventh International Conference on Learning Representations (2023).
  20. [MechInt, LLMs] Meng, Kevin, et al. "Locating and editing factual associations in GPT." Advances in Neural Information Processing Systems 35 (2022): 17359-17372.