rishav

I’m Rishav, an engineer at heart, driven by the challenge of building machine learning systems that work reliably at scale. I’m currently at Mila, where my research focuses on real-time and explainable reinforcement learning. My long-term goal is to develop trustworthy systems that can learn efficiently from feedback, moving beyond today’s models that need millions of samples for even basic tasks.
Before Mila, I co-founded Offside, where I built and scaled the product to 100k users and raised $300k from top-tier VCs and angels. Earlier, I spent two enriching years at DFKI in Germany, developing real-time vision algorithms for precision farming. Here’s a glimpse of that work: Spot Spraying for Precision Agriculture. I graduated from BITS Pilani in 2020 with a degree in Computer Science.
Research Interests
My current research focuses on Offline RL problems, particularly explainability and adaptive regularization. My broader research interests are in:
- Offline RL: designing algorithms that can reliably learn from fixed datasets without unsafe trial-and-error exploration in sequential decision-making environments.
- Mechanistic interpretability: uncovering the internal computational mechanisms underlying intelligent systems, developing frameworks to understand how credit assignment, planning, and abstract representation learning work at the circuit and feature level.
- Reasoning and human-inspired learning: investigating complex reasoning tasks such as chain-of-thought inference, planning, and causal reasoning, while drawing on principles of human cognition to design algorithms that learn faster and generalize better.
- Real-time decision making: addressing latency and stability challenges for deploying intelligent systems in high-frequency, safety-critical environments where interpretability is crucial for trust and reliability.
The overarching theme of my work is to move from brittle, opaque models toward principled algorithms that are mechanistically interpretable, sample-efficient, and reliable in real-world conditions.
Beyond Research
Outside of research, I enjoy reading about ancient civilizations, listening to classic rock, trekking, and strength training. I also write blog posts reflecting on projects and life lessons.
News
Aug 20, 2025 | Behavior discovery and attribution for explainable RL accepted at TMLR 2025. |
---|---|
Aug 5, 2025 | I was at RLC, presenting “Behavioral Suite Analysis of Self-Supervised Learning in Atari” at the RLVG workshop. |
Mar 20, 2025 | I’ve started a series of posts on CUDA programming, with the end goal of accelerating DQN using CUDA. The very first blog post is now live: link. |
Jan 22, 2025 | Handling delays in RL accepted at ICLR 2025. |
Nov 15, 2024 | KD-LoRA accepted at NeurIPS ENLSP Workshop. |
Jun 12, 2024 | Handling delays in real-time RL accepted at ICML Workshop and RLC Workshop. |
Jan 1, 2024 | Back to research in AI, exploring reinforcement learning. |