Prajwal Avhad

Publications

RSS 2026 Workshop on Diffusion Models for RL (Diff4RL) -- Poster

When Life Doesn't Give You Q-Values: Diagnosing Q-Exploitation in Test-Time Guidance of Flow-Matching Policies

Prajwal Avhad

Paper PDF

Test-time gradient guidance steers a frozen flow-matching policy toward higher-value actions using the gradient of a learned critic, without retraining the policy. We study this approach on a frozen diffusion-transformer policy (ABC-DiT) for a simulated bimanual bottle-placement task, training an in-sample critic on the policy's own visual features with a per-chunk ground-truth progress reward. We find that the binding constraint on guidance is the quality of the critic's action gradient, not the guidance mechanism. The critic separates successful from failed episodes well, but its action-gradient signal collapses to chance at mid- and late-trajectory horizons. In closed-loop evaluation, clean-action gradient guidance (QGF) increases the critic's predicted value while leaving real task performance at the noise floor, a failure mode we term Q-exploitation. The noisy-action variant (QFQL) instead yields a small but statistically significant improvement that a magnitude-matched random perturbation does not reproduce, establishing that the benefit is specific to the critic-gradient direction. Yet this improvement arises while the guidance moves the critic's predicted value the least: the benefit of test-time guidance here is decoupled from value ascent. We argue that an action-sensitive critic is a prerequisite for gradient guidance to translate predicted value into real performance, and offer Q-exploitation as a concrete diagnostic for when it will not.
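
To make the guidance mechanism concrete, here is a minimal 1-D sketch. It is not the paper's implementation: `base_velocity`, `critic_value`, `critic_grad`, the quadratic critic, and the guidance scale are all hypothetical stand-ins chosen so the example runs without any dependencies. It only shows the general idea of adding a scaled critic gradient to a frozen velocity field during sampling.

```python
# Toy 1-D illustration of test-time gradient guidance. Everything here is
# hypothetical: the velocity field, the critic, and the guidance scale.

def base_velocity(action, t):
    """Stand-in for a frozen flow-matching policy's velocity field."""
    return 1.0  # constant drift: an unguided rollout moves from a to a + 1


def critic_value(action, target=2.0):
    """Stand-in critic: quadratic value that peaks at `target`."""
    return -(action - target) ** 2


def critic_grad(action, target=2.0):
    """Analytic dQ/da for the quadratic critic above."""
    return -2.0 * (action - target)


def sample_action(a0, guidance_scale=0.0, steps=10):
    """Euler-integrate da/dt = v(a, t) + guidance_scale * dQ/da over t in [0, 1]."""
    action, dt = a0, 1.0 / steps
    for k in range(steps):
        t = k * dt
        action += dt * (base_velocity(action, t) + guidance_scale * critic_grad(action))
    return action


unguided = sample_action(0.0)
guided = sample_action(0.0, guidance_scale=0.5)
print(critic_value(unguided), critic_value(guided))
```

In this toy the guided sample scores higher under the critic by construction. Whether a rise in predicted value corresponds to better task performance is a separate question, which is the gap the abstract calls Q-exploitation.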

Architecture

RSS 2026 Workshop on Semantics for Robotics (SemRob) -- Poster

IMBench: A Benchmark for Intuitive Robotic Manipulation

Anurag Maurya, Sukhvansh Jain, Prajwal Avhad, Gautham Balachandran, Ziyi Zhou, Atharva Kshirsagar, Satyam Singh, Bowen Li, Rishabh Mukund, Ritul Singh, Jatin Vira, Suvonil Chatterjee, Devesh K. Jha

Project Page arXiv

Humans combine reasoning and motor control to solve complex manipulation tasks under diverse constraints. They build an understanding of the physical world that helps them convert reasoning into actions and quickly adapt to new scenes, tasks, and rules. We refer to this capability as intuitive manipulation. Existing benchmarks fail to capture this integration: they evaluate physical reasoning in isolation from execution, or measure policy performance without requiring explicit reasoning. We introduce IMBench, a benchmark designed to evaluate intuitive manipulation as an integrated capability spanning perception, physical reasoning, action generation, and iterative execution. Our tasks require models to infer task-relevant physical structure and generate feasible action sequences under explicit constraints, including contact-rich manipulation, tool use, and multi-stage dependencies. IMBench comprises 35 tasks, 14K filtered trajectories, and scalable tools for generating diverse scenarios.

Edge Slide

Projects

FreeZe-pipeline - Zero-Shot 6D Object Pose Estimation

November 2025

Foundation Models, 3D Vision

Reproduced a training-free zero-shot 6D pose estimation pipeline by integrating GeDi, DINOv2, and SAM2 on an RTX 4090.
Implemented multi-view feature aggregation by back-projecting dense DINOv2 features from 6 viewpoints onto 3D point clouds.
Reference paper - FreeZe: Training-free zero-shot 6D pose estimation
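
The multi-view aggregation step above can be sketched roughly as follows. This is an illustrative, dependency-free version and not the project's code: the view dictionary layout (`R`, `t`, `fx`, `fy`, `cx`, `cy`, `features`), the nearest-pixel lookup, and the plain averaging are all assumptions, and occlusion handling is omitted.

```python
def project(point, view):
    """Pinhole projection of a world-frame 3D point into a view; returns (u, v, depth)."""
    R, t = view["R"], view["t"]
    # Transform to the camera frame: p_cam = R @ p_world + t
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None  # behind the camera
    u = view["fx"] * x / z + view["cx"]
    v = view["fy"] * y / z + view["cy"]
    return int(round(u)), int(round(v)), z


def aggregate_features(points, views):
    """Average the per-view feature vectors that each 3D point projects onto."""
    aggregated = []
    for p in points:
        samples = []
        for view in views:
            hit = project(p, view)
            if hit is None:
                continue
            u, v, _ = hit
            fmap = view["features"]  # fmap[row][col] is a feature vector
            if 0 <= v < len(fmap) and 0 <= u < len(fmap[0]):
                samples.append(fmap[v][u])
        if samples:
            dim = len(samples[0])
            aggregated.append([sum(s[d] for s in samples) / len(samples) for d in range(dim)])
        else:
            aggregated.append(None)  # point not visible in any view
    return aggregated


# Two hypothetical views with identity pose and tiny 2x2 feature maps.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
base = {"R": identity, "t": [0.0, 0.0, 0.0], "fx": 1.0, "fy": 1.0, "cx": 0.0, "cy": 0.0}
view_a = dict(base, features=[[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]])
view_b = dict(base, features=[[[3.0, 2.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]])
result = aggregate_features([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0], [5.0, 0.0, 1.0]], [view_a, view_b])
```

A real pipeline would also need a visibility test (for example, comparing projected depth against a rendered depth map) so that occluded points do not pick up features from the surface in front of them.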

Ellipsotope Implementation

June 2025

Julia, Computational Geometry

Developed the Ellipsotope set type, generalizing zonotopes and ellipsoids with flexible constraints for reachability analysis.
Implemented ray-tracing-based 2D visualization for boundary plotting of complex sets.
Reference paper - Ellipsotopes: Uniting Ellipsoids and Zonotopes for Reachability Analysis
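
As a rough illustration of how one set type can cover both zonotopes and ellipsoids, the sketch below uses the definition as I understand it from the reference paper: points of the form c + G·β, where every indexed group of coefficients has p-norm at most 1 and β optionally satisfies A·β = b. The project itself is written in Julia; this Python version and its names (`p_norm`, `is_feasible`, `to_point`) are hypothetical and only check a given coefficient vector, not membership of an arbitrary point.

```python
def p_norm(values, p):
    """p-norm of a list of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def is_feasible(beta, p, index_sets, A=None, b=None, tol=1e-9):
    """Check ||beta[J]||_p <= 1 for every index set J, and A @ beta = b if given."""
    for J in index_sets:
        if p_norm([beta[j] for j in J], p) > 1.0 + tol:
            return False
    if A is not None:
        for row, rhs in zip(A, b):
            if abs(sum(a * x for a, x in zip(row, beta)) - rhs) > tol:
                return False
    return True


def to_point(center, G, beta):
    """Map a coefficient vector to the point c + G @ beta."""
    return [c + sum(g * x for g, x in zip(row, beta)) for c, row in zip(center, G)]


beta = [0.9, 0.9]
as_zonotope = is_feasible(beta, p=2, index_sets=[[0], [1]])   # each |beta_i| <= 1
as_ellipsoid = is_feasible(beta, p=2, index_sets=[[0, 1]])    # ||beta||_2 <= 1
point = to_point([1.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], [0.5, -0.5])
```

The same coefficient vector is admissible under the zonotope-style index sets but not under the single ellipsoid-style index set, which is the flexibility the index-set partition provides. Deciding whether an arbitrary point lies in the set requires solving for a feasible β, which this sketch does not attempt.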

BeagleMind: Chat with BeagleBoard Docs using RAG

February 2025

LLMs, RAG

Built a RAG system for interactive chat with BeagleBoard docs using Qwen2.5-Instruct.
Benchmarked against DeepSeek-R1 models to tune accuracy and efficiency.

2-Wheeled Balancer using LQR Control

June 2024 – October 2024

Controls, Embedded Systems

Designed an LQR controller for a two-wheeled balancer using state-space feedback.
Implemented real-time control on an ESP32 with FreeRTOS and MPU6050 IMU feedback.
Simulated in MATLAB to optimize gain matrices for stabilization.
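
For readers unfamiliar with LQR gain design, the sketch below computes a discrete-time LQR gain by iterating the Riccati recursion and then simulates the closed loop. It is an assumption-laden illustration, not the project's MATLAB code: the plant is a generic double integrator rather than a balancer model, and the weights `Q` and `R`, the 0.1 s time step, and the helper names are all hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def dlqr_single_input(A, B, Q, R, iterations=500):
    """Iterate the discrete-time Riccati recursion; return gain K (1 x n) for u = -K x."""
    n = len(A)
    At, Bt = transpose(A), transpose(B)
    P = [row[:] for row in Q]
    K = [[0.0] * n]
    for _ in range(iterations):
        BtP = matmul(Bt, P)                        # 1 x n
        s = R + matmul(BtP, B)[0][0]               # scalar R + B^T P B
        K = [[v / s for v in matmul(BtP, A)[0]]]   # K = (R + B^T P B)^-1 B^T P A
        AtP = matmul(At, P)
        AtPA, AtPB = matmul(AtP, A), matmul(AtP, B)
        # P <- Q + A^T P A - A^T P B K
        P = [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * K[0][j] for j in range(n)]
             for i in range(n)]
    return K


# Hypothetical plant: double integrator sampled at 0.1 s (state = [position, velocity]).
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.005], [0.1]]
K = dlqr_single_input(A, B, Q=[[1.0, 0.0], [0.0, 1.0]], R=1.0)

# Closed-loop simulation from an initial offset of 1.0 in position.
x = [1.0, 0.0]
for _ in range(300):
    u = -(K[0][0] * x[0] + K[0][1] * x[1])
    x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0][0] * u,
         A[1][0] * x[0] + A[1][1] * x[1] + B[1][0] * u]
```

Raising the entries of `Q` penalizes state error more heavily and yields a more aggressive gain, while raising `R` penalizes control effort; tuning these weights is the usual way to shape the gain matrix.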