rishav
I’m Rishav, an engineer at heart driven by the challenge of building machine learning systems that work reliably at scale. I’m currently at Mila, where my research focuses on real-time and explainable reinforcement learning. My long-term goal is to develop trustworthy systems that can learn efficiently from feedback, moving beyond today’s models that need millions of samples for even basic tasks.
Before Mila, I co-founded Offside, where I built and scaled the product to 100k users. It was a great experience building something from the ground up, and it exposed me to areas (marketing, GTM, UX) that I might not otherwise have gotten into given my tech-heavy background. Even before that, I spent ~2 years at DFKI in Germany, developing real-time vision algorithms for precision farming; here’s a glimpse of that work: Spot Spraying for Precision Agriculture. I graduated from BITS Pilani in 2020 with a degree in Computer Science.
Research Interests
My current research at Mila focuses on offline RL, particularly explainability and adaptive regularization. Looking ahead, and given my long-term goals, my broad research interests are in:
- Offline RL: designing algorithms that can reliably learn from fixed datasets without unsafe trial-and-error exploration in sequential decision-making environments.
- Mechanistic interpretability: uncovering the internal computational mechanisms underlying intelligent systems, and developing frameworks to understand how credit assignment, planning, and abstract representation learning work at the circuit and feature level. This feels especially important since very large models are being deployed at immense pace and scale while mechanistic understanding is still nascent and largely limited to small models.
- Reasoning and human-inspired learning: investigating complex reasoning tasks such as chain-of-thought inference, planning, and causal reasoning, while drawing on principles of human cognition to design algorithms that learn faster and generalize better.
- Real-time decision making: addressing latency and stability challenges for deploying intelligent systems in high-frequency, safety-critical environments where interpretability is crucial for trust and reliability.
In the near term, I want to focus on offline RL and mechanistic interpretability, addressing both how to learn safely and how to understand what was learned when deploying decision-making systems. The overarching theme of my work is to move from brittle, opaque models toward principled algorithms that are mechanistically interpretable, sample-efficient, and reliable in real-world conditions.
Beyond Research
Outside of research, I enjoy reading about ancient civilizations, listening to classic rock, trekking, and strength training. I also write blog posts reflecting on projects and lessons learned.
News
| Oct 18, 2025 | Wrote a blog post while learning distributed training with JAX, covering the main points of confusion I ran into (they may well be common!). Have a look: https://rish-av.github.io/blog/2025/jax_distributed/. |
|---|---|
| Aug 20, 2025 | “Behavior discovery and attribution for explainable RL” accepted at TMLR 2025. |
| Aug 5, 2025 | I was at RLC, presenting “Behavioral Suite Analysis of Self-Supervised Learning in Atari” at the RLVG workshop. |
| Jun 20, 2025 | Our blog post on real-time RL is up on the Mila website; check it out: Real‑time Reinforcement Learning — Mila. |
| Mar 20, 2025 | I’ve started a series of posts on CUDA programming, with the end goal of accelerating DQN using CUDA. The very first blog post is now live: link. |
| Jan 22, 2025 | Handling delays in RL accepted at ICLR 2025. |
| Nov 15, 2024 | KD-LoRA accepted at NeurIPS ENLSP Workshop. |