publications
Asterisks indicate equal contributions.
2024
- Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models. In Preprint, 2024
- A Generative Framework to Bridge Data-driven Models and Scientific Theories in Language Neuroscience. In Preprint, 2024
- Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition. In Preprint, 2024
- Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making. In The Twelfth International Conference on Learning Representations (ICLR), 2024
2023
- Explaining black box text modules in natural language with language models. In NeurIPS XAI in Action: Past, Present, and Future Applications, 2023