Prajwal Avhad

Currently a Robotist at NeoManav Robotics, working on manipulation, vision-language-action models, diffusion policies, and data pipelines.

Previously, I was a Summer Research Intern at the IIT Kanpur Helicopter Lab and a Flight Controls Intern at Aspera Industries.

Profile
> working on: manipulation, VLAs, diffusion policies, data pipelines
> exploring: humanoids, traj opt, rl, sim2real
> using: debian, nvim
> tinkering with: robots, codebases and circuits

Publications

RSS 2026 Workshop on Semantics for Robotics (SemRob) -- Poster
IMBench: A Benchmark for Intuitive Robotic Manipulation
Anurag Maurya, Sukhvansh Jain, Prajwal Avhad, Gautham Balachandran, Ziyi Zhou, Atharva Kshirsagar, Satyam Singh, Bowen Li, Rishabh Mukund, Ritul Singh, Jatin Vira, Suvonil Chatterjee, Devesh K. Jha
Humans combine reasoning and motor control to solve complex manipulation tasks under diverse constraints. They build an understanding of the physical world that helps them convert reasoning into actions and quickly adapt to new scenes, tasks, and rules. We refer to this capability as intuitive manipulation. Existing benchmarks fail to capture this integration: they evaluate physical reasoning in isolation from execution, or measure policy performance without requiring explicit reasoning. We introduce IMBench, a benchmark designed to evaluate intuitive manipulation as an integrated capability spanning perception, physical reasoning, action generation, and iterative execution. Our tasks require models to infer task-relevant physical structure and generate feasible action sequences under explicit constraints, including contact-rich manipulation, tool use, and multi-stage dependencies. IMBench comprises 35 tasks, 14K filtered trajectories, and scalable tools for generating diverse scenarios.
Example tasks: Edge Slide, Pendulum Grasp, Cube Toss, Domino Single, Seesaw Balance, Tool Retrieve, Cup Inversion, Shape Stack

Projects

Foundation Models, 3D Vision
  • Reproduced a training-free zero-shot 6D pose estimation pipeline by integrating GeDi, DINOv2, and SAM2 on an RTX 4090.
  • Implemented multi-view feature aggregation by back-projecting dense DINOv2 features from 6 viewpoints onto 3D point clouds.
  • Reference paper - FreeZe: Training-free zero-shot 6D pose estimation
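As a rough illustration of the multi-view aggregation step, here is a minimal sketch in plain Python. It is not the FreeZe code or my reproduction: the pinhole camera model, the `views` dictionary layout, nearest-pixel sampling, and simple averaging are all assumptions made for the example, and it skips the occlusion check a real pipeline would need.

```python
def project(point, view):
    """Project a 3D world point into a view; return (u, v) or None if behind the camera."""
    R, t = view["R"], view["t"]
    # Camera-frame coordinates: x_cam = R @ point + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    u = view["fx"] * xc[0] / xc[2] + view["cx"]
    v = view["fy"] * xc[1] / xc[2] + view["cy"]
    return u, v


def aggregate_features(points, views):
    """Average the per-pixel features each point lands on across all views.

    views[i]["feats"] is an H x W x C nested list (a stand-in for a dense
    feature map). Points that fall in no view get None. No occlusion handling.
    """
    aggregated = []
    for point in points:
        total, count = None, 0
        for view in views:
            uv = project(point, view)
            if uv is None:
                continue
            col, row = int(round(uv[0])), int(round(uv[1]))
            feats = view["feats"]
            if 0 <= row < len(feats) and 0 <= col < len(feats[0]):
                f = feats[row][col]
                total = list(f) if total is None else [a + b for a, b in zip(total, f)]
                count += 1
        aggregated.append([a / count for a in total] if count else None)
    return aggregated


def constant_map(height, width, feature):
    """Hypothetical helper: a feature map with the same vector at every pixel."""
    return [[list(feature) for _ in range(width)] for _ in range(height)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera = {"R": IDENTITY, "t": [0.0, 0.0, 0.0], "fx": 1.0, "fy": 1.0, "cx": 2.0, "cy": 2.0}
views = [
    dict(camera, feats=constant_map(5, 5, [1.0, 0.0])),
    dict(camera, feats=constant_map(5, 5, [3.0, 2.0])),
]
points = [[0.0, 0.0, 2.0], [0.0, 0.0, -1.0], [100.0, 0.0, 1.0]]
result = aggregate_features(points, views)
print(result)
```

The first point is visible in both toy views and receives the mean of the two feature vectors; the other two (behind the camera, outside the image) receive no feature.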
Julia, Computational Geometry
LLMs, RAG
  • Built a RAG system for interactive chat with BeagleBoard docs using Qwen2.5-Instruct.
  • Benchmarked accuracy and efficiency against DeepSeek-R1 models to guide tuning.
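The retrieval half of a RAG system like this can be sketched in a few lines. This is an illustrative toy, not the project's code: the bag-of-words cosine similarity stands in for a real embedding model, the chunks are placeholder strings rather than actual BeagleBoard documentation, and the call to the language model is left out entirely.

```python
import math
import re
from collections import Counter

# Placeholder chunks; a real system would index the actual documentation.
CHUNKS = [
    "Placeholder chunk: steps for flashing an image to a microSD card.",
    "Placeholder chunk: pinout notes for the expansion headers.",
    "Placeholder chunk: connecting to the board over a USB serial console.",
]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


query = "How do I flash an image to a microSD card?"
prompt = build_prompt(query, CHUNKS)
print(prompt)
```

The resulting prompt would then be passed to the instruction-tuned model; swapping the similarity function for dense embeddings changes `retrieve` only.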
Controls, Embedded Systems
  • Designed an LQR controller for a two-wheeled balancer using state-space feedback.
  • Implemented real-time control on an ESP32 with FreeRTOS and MPU6050 feedback.
  • Simulated the system in MATLAB to tune the gain matrices for stabilization.
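For illustration only, the LQR design step can be sketched in plain Python. This is not the project's MATLAB or ESP32 code: the plant is a hypothetical linearized inverted pendulum (theta_ddot = (g/l) * theta + u with an assumed l = 0.5 m), discretized by forward Euler at an assumed 100 Hz, and the weights `Q` and `R_SCALAR` are arbitrary. The gain comes from iterating the discrete Riccati recursion until it settles.

```python
# Hypothetical plant: linearized inverted pendulum, state x = [theta, theta_dot].
G_OVER_L = 9.81 / 0.5   # assumed pendulum length of 0.5 m
DT = 0.01               # assumed 100 Hz control loop

A = [[1.0, DT], [G_OVER_L * DT, 1.0]]   # forward-Euler discretization
B = [[0.0], [DT]]
Q = [[10.0, 0.0], [0.0, 1.0]]           # arbitrary state weights
R_SCALAR = 0.1                          # arbitrary input weight


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def dlqr_single_input(A, B, Q, r, max_iters=5000, tol=1e-9):
    """Iterate the discrete Riccati recursion; return (K, P) for u = -K x."""
    At, Bt = transpose(A), transpose(B)
    P = [row[:] for row in Q]
    K = [[0.0] * len(A)]
    for _ in range(max_iters):
        BtP = matmul(Bt, P)
        s = r + matmul(BtP, B)[0][0]              # scalar: there is one input
        K = [[v / s for v in matmul(BtP, A)[0]]]  # K = (r + B'PB)^-1 B'PA
        BK = matmul(B, K)
        A_cl = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, BK)]
        AtPAcl = matmul(matmul(At, P), A_cl)
        P_next = [[q + m for q, m in zip(rq, rm)] for rq, rm in zip(Q, AtPAcl)]
        delta = max(abs(a - b) for ra, rb in zip(P_next, P) for a, b in zip(ra, rb))
        P = P_next                                # P <- Q + A'P(A - BK)
        if delta < tol:
            break
    return K, P


def simulate(K, x0, steps):
    """Roll the closed loop x <- A x + B u forward with u = -K x."""
    x = list(x0)
    for _ in range(steps):
        u = -(K[0][0] * x[0] + K[0][1] * x[1])
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0][0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1][0] * u]
    return x


K, P = dlqr_single_input(A, B, Q, R_SCALAR)
final_state = simulate(K, [0.1, 0.0], 1000)   # 0.1 rad initial tilt, 10 s
print("gain K:", K)
print("state after 10 s:", final_state)
```

On a microcontroller only the precomputed row `K` is needed at run time: each loop iteration reads the state estimate and applies u = -K x.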