Behaviour Discovery and Attribution for Explainable Reinforcement Learning

1Mila - Quebec AI Institute 2University of Calgary 3McGill University 4University of Montreal 5CIFAR AI Chair

A transformer-based VQ-VAE is used for behavior discovery, where state-action sequences are encoded, discretized via a codebook, and decoded to predict future states. The resulting latent codes are used to construct a graph, and the graph clustering module partitions the graph into subgraphs, each representing a "behavior". A causal mask is applied to both the decoder and the encoder to restrict access to future information.
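The pipeline above can be sketched in a few lines of PyTorch. This is an illustrative toy model, not the authors' implementation: all dimensions, layer counts, and the codebook size are assumptions, and the straight-through quantizer stands in for the full VQ-VAE training objective.

```python
# Minimal sketch (assumed architecture, not the paper's code): a causal
# transformer encoder maps state-action sequences to discrete codebook
# indices, and a causal decoder predicts future states from the codes.
import torch
import torch.nn as nn

class BehaviourVQVAE(nn.Module):
    def __init__(self, state_dim=4, action_dim=2, d_model=32,
                 codebook_size=16):
        super().__init__()
        self.embed = nn.Linear(state_dim + action_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        dec_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=2)
        self.codebook = nn.Embedding(codebook_size, d_model)  # discrete latents
        self.head = nn.Linear(d_model, state_dim)             # next-state prediction

    def quantize(self, z):
        # snap each timestep's latent to its nearest codebook vector,
        # with a straight-through estimator for the encoder gradient
        dists = torch.cdist(z, self.codebook.weight)          # (B, T, K)
        idx = dists.argmin(-1)                                # (B, T)
        zq = self.codebook(idx)
        return z + (zq - z).detach(), idx

    def forward(self, states, actions):
        x = self.embed(torch.cat([states, actions], -1))      # (B, T, d_model)
        # causal mask: neither encoder nor decoder sees future timesteps
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        z = self.encoder(x, mask=mask)
        zq, idx = self.quantize(z)
        pred = self.head(self.decoder(zq, mask=mask))         # predicted future states
        return pred, idx

model = BehaviourVQVAE()
states = torch.randn(2, 8, 4)    # batch of 2 trajectories, 8 steps, 4-dim states
actions = torch.randn(2, 8, 2)   # matching 2-dim actions
pred, codes = model(states, actions)
```

The resulting `codes` tensor holds one discrete latent index per timestep; runs of indices are what the downstream graph-clustering step groups into behaviors.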

Abstract

Explaining the decisions made by reinforcement learning (RL) agents is critical for building trust and ensuring reliability in real-world applications. Traditional approaches to explainability often rely on saliency analysis, which can be limited in providing actionable insights. Recently, there has been growing interest in attributing RL decisions to specific trajectories within a dataset. However, these methods often generalize explanations to long trajectories, potentially involving multiple distinct behaviors. In many cases, multiple finer-grained explanations would improve clarity. In this work, we propose a framework for behavior discovery and action attribution to behaviors in offline RL trajectories. Our method identifies meaningful behavioral segments, enabling more precise and granular explanations associated with high-level agent behaviors. This approach is adaptable across diverse environments with minimal modifications, offering a scalable and versatile solution for behavior discovery and attribution for explainable RL.

Contributions

Our main contributions include:
  • A novel framework for behavior discovery and action attribution in offline RL trajectories.
  • A transformer-based VQ-VAE for behavior discovery, which encodes state-action sequences, discretizes them via a codebook, and decodes them to predict future states.
  • A graph clustering module that partitions a graph built from the learnt codebook vectors into subgraphs, each representing a "behavior".
  • An attribution module that assigns actions taken by a policy trained on the entire dataset to the discovered behaviors.
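The graph-clustering step can be illustrated with a small example. This is a hypothetical stand-in, not the paper's module: the code sequence is made up, nodes are codebook indices, edges are weighted by transition frequency, and modularity-based community detection (via networkx) is an assumed substitute for the actual clustering procedure.

```python
# Illustrative sketch (assumed details): codebook indices from a trajectory
# become graph nodes, consecutive distinct codes are connected by weighted
# edges, and community detection partitions the graph into "behaviours".
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Example per-timestep code sequence for one trajectory (made up).
codes = [0, 0, 1, 1, 0, 2, 3, 2, 3, 2, 4, 4, 3]

G = nx.Graph()
for a, b in zip(codes, codes[1:]):
    if a != b:
        # edge weight counts how often the two codes follow each other
        w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)

# each community of codebook indices is one candidate behaviour
behaviours = greedy_modularity_communities(G, weight="weight")
for i, nodes in enumerate(behaviours):
    print(f"behaviour {i}: codes {sorted(nodes)}")
```

Segments of a trajectory whose codes fall inside one community are then treated as one behavioral segment, giving the units to which the attribution module assigns the policy's actions.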

BibTeX

@misc{rishav2025behaviourdiscoveryattributionexplainable,
  title={Behaviour Discovery and Attribution for Explainable Reinforcement Learning},
  author={Rishav Rishav and Somjit Nath and Vincent Michalski and Samira Ebrahimi Kahou},
  year={2025},
  eprint={2503.14973},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2503.14973},
}