Robot Planning and Situation Handling with Active Perception
Anonymous Authors — Under Review

Two examples of unforeseen situations during action execution. Left: the robot attempts to navigate through a doorway, expecting the door to be fully open and passable, but the door is only half-open. Right: the robot is cutting a lemon, expecting both lemon halves to remain on the plate, but one half has fallen off the plate.
Abstract
Current robots are capable of computing plans to accomplish complex tasks. However, real-world environments are inherently open and dynamic, and unforeseen situations frequently arise during plan execution, such as jammed doors or objects that have fallen to the floor. These situations may result from the robot's own action failures or from external disturbances, such as human activities. Detecting and handling such execution-time situations remains a significant challenge and limits robots' ability to achieve long-term autonomy. In this paper, we develop a planning and situation-handling framework, called VAP-TAMP, that enables robots to actively perceive and address unforeseen situations during plan execution. VAP-TAMP leverages action knowledge to strategically prompt vision-language models (VLMs) for active view selection and situation assessment, while constructing and reasoning over scene graphs for integrated task and motion planning. We evaluated VAP-TAMP using service tasks in simulation and on a mobile manipulation platform.
Method Overview
VAP-TAMP maintains a scene graph as the symbolic world state and uses VLMs for plan monitoring. When the current view is insufficient for situation assessment, the robot actively selects new viewpoints. Once the situation is identified, VAP-TAMP updates the scene graph and replans using integrated task and motion planning.
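The execute, monitor, and replan cycle described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the VAP-TAMP implementation: the symbolic state is a set of fact tuples standing in for the scene graph, and `Action`, `run_monitored`, `perceive` (a stand-in for the VLM-based check), `execute`, and `replan` (a stand-in for the TAMP planner) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# Hypothetical fact encoding: (predicate, arg1, arg2, ...)
Fact = Tuple[str, ...]


@dataclass(frozen=True)
class Action:
    """Illustrative STRIPS-style action model (not VAP-TAMP's actual format)."""
    name: str
    pre: FrozenSet[Fact] = frozenset()
    add: FrozenSet[Fact] = frozenset()
    delete: FrozenSet[Fact] = frozenset()


def run_monitored(
    plan: List[Action],
    state: Set[Fact],
    execute: Callable[[Action], None],  # hypothetical: sends the action to the robot
    perceive: Callable[[Fact], bool],  # hypothetical: stand-in for a VLM check
    replan: Callable[[Set[Fact]], Optional[List[Action]]],  # hypothetical planner call
    max_replans: int = 3,
) -> Tuple[bool, Set[Fact], List[str]]:
    """Check preconditions before and add-effects after each action; replan on violations."""
    state = set(state)  # symbolic world state (stands in for the scene graph)
    log: List[str] = []
    queue = list(plan)
    replans = 0
    while queue:
        action = queue.pop(0)
        violated = {f for f in action.pre if not perceive(f)}
        if not violated:
            execute(action)
            log.append(f"exec {action.name}")
            state = (state - action.delete) | action.add  # expected outcome
            violated = {f for f in action.add if not perceive(f)}
        if violated:
            log.append(f"situation at {action.name}: {sorted(violated)}")
            state -= violated  # replace expectations with what was observed
            if replans == max_replans:
                return False, state, log
            replans += 1
            new_plan = replan(state)
            if new_plan is None:
                return False, state, log
            queue = list(new_plan)
    return True, state, log


# Usage: a toy version of the lemon scenario with a simulated world.
world: Set[Fact] = {("on", "lemon", "plate")}


def execute(action: Action) -> None:
    if action.name == "cut_lemon":
        world.discard(("on", "lemon", "plate"))
        # Simulated disturbance: one half ends up on the floor.
        world.update({("on", "half1", "plate"), ("on", "half2", "floor")})
    elif action.name == "place_half2":
        world.discard(("on", "half2", "floor"))
        world.add(("on", "half2", "plate"))


cut = Action(
    "cut_lemon",
    pre=frozenset({("on", "lemon", "plate")}),
    add=frozenset({("on", "half1", "plate"), ("on", "half2", "plate")}),
    delete=frozenset({("on", "lemon", "plate")}),
)
place = Action("place_half2", add=frozenset({("on", "half2", "plate")}))

ok, final_state, log = run_monitored(
    [cut], {("on", "lemon", "plate")}, execute, lambda f: f in world, lambda s: [place]
)
print(ok, log)
```

Checking only add-effects and deleting violated facts from the state is a simplification chosen for brevity; a fuller version would re-observe the region of the scene graph around the failed action before replanning.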

VAP-TAMP System Overview. Given RGB-D observations and a natural language goal, VAP-TAMP builds a 3D point cloud, extracts an instance memory and a scene graph, and translates the scene graph into PDDL for planning. During execution, preconditions are verified before each action and effects after it, with uncertainty detection triggering active perception when needed.
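The scene-graph-to-PDDL step can be illustrated with a small serializer. This sketch assumes a hypothetical scene-graph encoding (instance types, binary relation edges, unary attributes) and invented predicate and type names such as `at`, `leads_to`, and `open`; the actual VAP-TAMP encoding and domain may differ.

```python
from typing import Dict, List, Tuple


def scene_graph_to_pddl(
    problem: str,
    domain: str,
    objects: Dict[str, str],  # instance id -> PDDL type
    relations: List[Tuple[str, str, str]],  # (subject, relation, object) edges
    attributes: List[Tuple[str, str]],  # (attribute, instance) node labels
    goal: List[Tuple[str, ...]],  # conjunctive goal facts
) -> str:
    """Serialize a (hypothetical) scene-graph encoding as a PDDL problem string."""
    obj_decl = " ".join(f"{name} - {typ}" for name, typ in sorted(objects.items()))
    init = [f"({rel} {subj} {obj})" for subj, rel, obj in relations]
    init += [f"({attr} {inst})" for attr, inst in attributes]
    goal_str = " ".join("(" + " ".join(g) + ")" for g in goal)
    return (
        f"(define (problem {problem})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {obj_decl})\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal (and {goal_str})))"
    )


pddl = scene_graph_to_pddl(
    problem="go-to-kitchen",
    domain="household",
    objects={"robot": "agent", "door1": "door", "hallway": "room", "kitchen": "room"},
    relations=[("robot", "at", "hallway"), ("door1", "leads_to", "kitchen")],
    attributes=[("open", "door1")],
    goal=[("at", "robot", "kitchen")],
)
print(pddl)
```

Keeping relations and attributes as separate lists mirrors the usual split between scene-graph edges and node labels; both end up as ground atoms in `:init`.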
TAMP Planner

VAP-TAMP integrates robot perception with domain knowledge by formulating action preconditions and effects as VQA queries. Before executing the next action (left), preconditions are verified; after execution (right), expected effects are monitored. Violations trigger replanning.
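One way to turn preconditions and effects into VQA queries is to render each grounded predicate through a question template and parse the VLM's free-form reply into a truth value. The templates, the predicate names, and the helpers `to_question`, `parse_yes_no`, and `action_queries` below are hypothetical; this is a sketch of the idea, not the prompts VAP-TAMP uses.

```python
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, ...]

# Hypothetical question templates, one per predicate.
TEMPLATES: Dict[str, str] = {
    "open": "Is the {0} fully open?",
    "on": "Is the {0} on the {1}?",
    "holding": "Is the robot holding the {0}?",
}


def to_question(fact: Fact) -> str:
    """Render a grounded predicate as a yes/no VQA question."""
    pred, *args = fact
    return TEMPLATES[pred].format(*(a.replace("_", " ") for a in args))


def parse_yes_no(answer: str) -> Optional[bool]:
    """Map a free-form VLM answer to True/False, or None if it is neither."""
    words = answer.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!?;:")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None  # unclear answer: treat as uncertain


def action_queries(pre: List[Fact], effects: List[Fact]) -> Dict[str, List[str]]:
    """Questions to ask before (preconditions) and after (effects) an action."""
    return {
        "before": [to_question(f) for f in pre],
        "after": [to_question(f) for f in effects],
    }


queries = action_queries(
    pre=[("holding", "knife")],
    effects=[("on", "lemon_half_1", "plate"), ("on", "lemon_half_2", "plate")],
)
print(queries)
```

Returning `None` for answers that are neither "yes" nor "no" keeps ambiguous replies separate from negative ones, so they can be routed to active perception rather than treated as violations.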
Active Perception

Active perception resolving visual ambiguity during predicate verification. (a) Initial observation with inconsistent VLM responses. (b) VLM suggests a better viewing direction. (c) Improved close-up view yields consistent responses, enabling confident verification and plan continuation.
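The ambiguity-resolution loop in the figure can be approximated by sampling several VLM answers to the same question, accepting the result only when enough samples agree, and otherwise moving to a suggested viewpoint. The sample count, the agreement threshold, and the callbacks `ask`, `suggest_view`, and `move_to` are assumptions made for this sketch, not parameters reported for VAP-TAMP.

```python
import itertools
from collections import Counter
from typing import Callable, List, Optional, Tuple


def consistent_answer(answers: List[Optional[bool]], threshold: float = 0.8) -> Optional[bool]:
    """Majority answer if at least `threshold` of the samples agree, else None."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    if value is None or count / len(answers) < threshold:
        return None
    return value


def verify_with_active_perception(
    ask: Callable[[str], Optional[bool]],  # hypothetical: one sampled VLM answer at the current view
    suggest_view: Callable[[], str],  # hypothetical: VLM proposes a viewing direction
    move_to: Callable[[str], None],  # hypothetical: robot moves its camera/base
    question: str,
    samples: int = 5,
    max_views: int = 3,
) -> Tuple[Optional[bool], int]:
    """Re-ask from new viewpoints until answers agree or the view budget is spent."""
    views = 0
    while True:
        verdict = consistent_answer([ask(question) for _ in range(samples)])
        if verdict is not None or views == max_views:
            return verdict, views
        move_to(suggest_view())
        views += 1


# Usage with simulated answers: inconsistent from afar, consistent up close.
view = {"name": "far"}
far_answers = itertools.cycle([True, False, None])


def ask(question: str) -> Optional[bool]:
    return next(far_answers) if view["name"] == "far" else True


def move_to(name: str) -> None:
    view["name"] = name


verdict, moves = verify_with_active_perception(ask, lambda: "close", move_to, "Is the door fully open?")
print(verdict, moves)
```

Using agreement across repeated samples as the uncertainty signal is one simple choice; returning `None` once the view budget runs out lets the caller decide whether to treat the predicate as violated or to replan conservatively.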
Project Videos
System Overview: A complete walkthrough of the VAP-TAMP framework — watch this first for a full understanding of our approach.
Halve a Lemon: The robot navigates to find a knife, returns to the table to halve a lemon, encounters an adversarial disturbance, and recovers to complete the task.
Collect Firewood: The robot navigates to collect firewood, encounters a half-open door, detects the situation, and recovers to complete the task.
Results
We evaluated VAP-TAMP on service tasks in both simulation and on a real mobile manipulation platform. VAP-TAMP consistently outperforms baselines by actively detecting and recovering from unforeseen situations during plan execution.

Success rates by task. VAP-TAMP maintains consistent performance across all tasks, while baselines show larger variance.
VAP-TAMP achieves the highest success rate on every task, with particularly large gains on tasks involving navigation through doors and multi-step object rearrangement, where unforeseen situations are most frequent.

Success rate vs. execution time.
Points closer to the top-left indicate better overall performance. VAP-TAMP achieves both the highest success rate and competitive execution time, demonstrating that active perception adds minimal overhead while substantially improving reliability.

Distribution of situations across tasks.
Flows connect tasks (left) to situation types (right), with occurrence counts. Navigation and pick-and-place tasks trigger the widest variety of situations, while door-related situations are the most frequent across all tasks.
BibTeX
If you find this work useful, please cite:
@inproceedings{anonymous2026vaptamp,
title = {Robot Planning and Situation Handling with Active Perception},
author = {Anonymous},
booktitle = {Proceedings of the IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS)},
year = {2026},
note = {Anonymous submission}
}