Subsystem Vision

What the Vision subsystem does

The Vision layer is responsible for turning raw RGB-D sensor streams into a concise Object Map that downstream planners and manipulators can use. Its end-to-end workflow is:

  1. Capture: The Camera node streams synchronized colour and depth frames (see the sketch after this list).

  2. Clean & Normalize: Camera Preprocessing applies lens undistortion, colour-balance and denoising filters in a reusable pipeline.

  3. Perceive: Object Detection locates plates, hands and obstacles in each frame at a steady 1 Hz, producing bounding-box hypotheses.

  4. Localize: Distance Estimation fuses those bounding boxes with depth pixels to compute metric XYZ coordinates, emitting a final cogar_msgs/ObjectMap.
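As a rough illustration of the capture step, the sketch below publishes paired frames at the target 10 Hz. It assumes ROS 1 (rospy), and `grab_frames()` stands in for whatever driver call the real Camera node uses; it is purely hypothetical.

```python
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image


def camera_node():
    rospy.init_node("camera")
    colour_pub = rospy.Publisher("/camera", Image, queue_size=1)
    depth_pub = rospy.Publisher("/depth", Image, queue_size=1)
    bridge = CvBridge()
    rate = rospy.Rate(10)  # 10 Hz, the capture-rate KPI below
    while not rospy.is_shutdown():
        colour, depth = grab_frames()  # hypothetical driver call, not a real API
        colour_pub.publish(bridge.cv2_to_imgmsg(colour, encoding="bgr8"))
        depth_pub.publish(bridge.cv2_to_imgmsg(depth, encoding="32FC1"))
        rate.sleep()


if __name__ == "__main__":
    camera_node()
```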

Component diagram of the Vision subsystem

Design Patterns

Strategy

In our Vision subsystem, Object Detection uses Strategy to swap between different detection implementations—whether a placeholder stub, a classical CV method or a deep-learning model—without touching the rest of the pipeline. This keeps detection logic extensible and decoupled from downstream consumers.
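A minimal sketch of how such a strategy seam can look; DetectionStrategy, StubDetector and ObjectDetectionNode are illustrative names, not the subsystem's actual classes:

```python
from abc import ABC, abstractmethod

import numpy as np


class DetectionStrategy(ABC):
    """Common interface every detector implements."""

    @abstractmethod
    def detect(self, image: np.ndarray) -> list:
        """Return (label, bounding-box) hypotheses for one frame."""


class StubDetector(DetectionStrategy):
    """Placeholder detector: reports one fixed plate hypothesis."""

    def detect(self, image: np.ndarray) -> list:
        h, w = image.shape[:2]
        return [("plate", (w // 4, h // 4, w // 2, h // 2))]


class ObjectDetectionNode:
    """Holds a strategy; callers never see which detector is active."""

    def __init__(self, strategy: DetectionStrategy):
        self._strategy = strategy

    def set_strategy(self, strategy: DetectionStrategy):
        self._strategy = strategy  # swap detectors at runtime

    def on_frame(self, image: np.ndarray) -> list:
        return self._strategy.detect(image)
```

Replacing the stub with a classical-CV or deep-learning detector is then a single set_strategy call; nothing downstream changes.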

Observer

Camera Preprocessing subscribes to /camera and /depth and processes each frame as soon as it arrives. By reacting to new-image events rather than polling, every delivered frame is handled exactly once and with minimal latency.
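In ROS this pattern falls out of the subscription mechanism itself: the client library invokes a callback once per incoming message. A minimal sketch, assuming ROS 1 (rospy):

```python
import rospy
from sensor_msgs.msg import Image


class CameraPreprocessing:
    def __init__(self):
        # rospy invokes each callback once per published frame, so every
        # delivered frame is processed as soon as it arrives, without polling.
        rospy.Subscriber("/camera", Image, self.on_colour)
        rospy.Subscriber("/depth", Image, self.on_depth)

    def on_colour(self, msg: Image):
        rospy.logdebug("colour frame %d received", msg.header.seq)
        # ... run the filter pipeline, republish on /camera_processed ...

    def on_depth(self, msg: Image):
        rospy.logdebug("depth frame %d received", msg.header.seq)
        # ... republish on /depth_processed ...


if __name__ == "__main__":
    rospy.init_node("camera_preprocessing")
    CameraPreprocessing()
    rospy.spin()  # hand control to the event loop; callbacks fire per message
```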

Adapter

Camera uses CvBridge to adapt ROS sensor_msgs/Image messages into OpenCV numpy arrays and back. This isolates ROS message formats from CV code, making sensor-to-image integration seamless and maintainable.
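A minimal sketch of the two conversion directions using the standard cv_bridge calls (the helper names are illustrative):

```python
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()


def to_cv(msg: Image):
    # sensor_msgs/Image -> OpenCV BGR numpy array
    return bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")


def to_msg(frame, header):
    # OpenCV numpy array -> sensor_msgs/Image, keeping the original
    # timestamp and frame id so downstream nodes can correlate frames
    msg = bridge.cv2_to_imgmsg(frame, encoding="bgr8")
    msg.header = header
    return msg
```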

Decorator

Camera Preprocessing chains filters—undistort → colour-balance → 5×5 Gaussian blur—by wrapping each output in the next filter. We can add, remove or reorder filters without modifying the subscription logic.
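The sketch below shows the wrapping idea with two of the filters; Filter, ColourBalance and GaussianBlur are illustrative names, and the grey-world balance is just one possible implementation:

```python
import cv2
import numpy as np


class Filter:
    """Base component in the chain: pass the frame through unchanged."""

    def apply(self, frame: np.ndarray) -> np.ndarray:
        return frame


class ColourBalance(Filter):
    """Decorator: grey-world white balance applied after the wrapped filter."""

    def __init__(self, inner: Filter):
        self._inner = inner

    def apply(self, frame: np.ndarray) -> np.ndarray:
        out = self._inner.apply(frame).astype(np.float32)
        out *= out.mean() / (out.mean(axis=(0, 1)) + 1e-6)  # per-channel gain
        return np.clip(out, 0, 255).astype(np.uint8)


class GaussianBlur(Filter):
    """Decorator: 5x5 Gaussian blur applied after the wrapped filter."""

    def __init__(self, inner: Filter, ksize: int = 5):
        self._inner = inner
        self._ksize = ksize

    def apply(self, frame: np.ndarray) -> np.ndarray:
        out = self._inner.apply(frame)
        return cv2.GaussianBlur(out, (self._ksize, self._ksize), 0)


# Compose by nesting: the innermost filter runs first. An Undistort decorator
# would wrap the same way, giving the full undistort -> colour-balance -> blur
# chain; reordering the chain means reordering constructors, not editing
# the subscription logic.
pipeline = GaussianBlur(ColourBalance(Filter()))
```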

Template Method

Distance Estimation provides a fixed flow—parse detection results, sample depth, back-project to XYZ, publish—while allowing future variants to customize parsing or projection without rewriting the overall pipeline.
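A minimal Template Method sketch; the class and hook names are illustrative, and the intrinsics shown are placeholder values rather than the real calibration:

```python
from abc import ABC, abstractmethod


class DistanceEstimatorBase(ABC):
    """Fixed flow: parse -> sample depth -> back-project -> publish.
    Subclasses override the hooks, never the flow itself."""

    def process(self, detections, depth):
        points = []
        for (u, v) in self.parse(detections):
            z = self.sample_depth(depth, u, v)
            points.append(self.back_project(u, v, z))
        self.publish(points)

    @abstractmethod
    def parse(self, detections):
        """Yield (u, v) pixel centres from the detection message."""

    def sample_depth(self, depth, u, v):
        return float(depth[v, u])  # metres, assuming a rectified depth image

    def back_project(self, u, v, z, fx=525.0, fy=525.0, cx=640.0, cy=360.0):
        # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy
        return ((u - cx) * z / fx, (v - cy) * z / fy, z)

    @abstractmethod
    def publish(self, points):
        """Wrap points in a cogar_msgs/ObjectMap and publish on /object_map."""
```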

Component roles

  • Camera
    - Publishes /camera (sensor_msgs/Image) and /depth (sensor_msgs/Image) at 1,280×720 @ 10 Hz.
    - Implements the Adapter pattern via CvBridge for ROS↔OpenCV conversions.

  • Camera Preprocessing
    - Subscribes to the raw streams and applies the undistort → colour-balance → Gaussian-blur pipeline (Decorator pattern).
    - Republishes on /camera_processed and /depth_processed.

  • Object Detection
    - Listens to /camera_processed and applies a detection algorithm chosen via the Strategy pattern.
    - Publishes vision_msgs/Detection2DArray on /object_hypotheses at 1 Hz.

  • Distance Estimation
    - Subscribes to /object_hypotheses and /depth_processed (see the synchronization sketch after this list).
    - Uses a Template Method–style base class to parse detections, back-project via pinhole intrinsics and publish a cogar_msgs/ObjectMap on /object_map.
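Because Distance Estimation consumes two topics, the detections and the depth frames have to be paired by timestamp. A minimal wiring sketch, assuming ROS 1; cogar_msgs is project-specific, so the ObjectMap fields used below (including the header) are assumptions:

```python
import message_filters
import rospy
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray

from cogar_msgs.msg import ObjectMap  # project-specific message


class DistanceEstimation:
    def __init__(self):
        self._pub = rospy.Publisher("/object_map", ObjectMap, queue_size=1)
        hyp = message_filters.Subscriber("/object_hypotheses", Detection2DArray)
        depth = message_filters.Subscriber("/depth_processed", Image)
        # Pair each detection array with the depth frame closest in time.
        sync = message_filters.ApproximateTimeSynchronizer(
            [hyp, depth], queue_size=5, slop=0.1)
        sync.registerCallback(self.on_pair)

    def on_pair(self, detections: Detection2DArray, depth: Image):
        object_map = ObjectMap()          # fill from back-projected points
        object_map.header = depth.header  # assumes ObjectMap carries a header
        self._pub.publish(object_map)


if __name__ == "__main__":
    rospy.init_node("distance_estimation")
    DistanceEstimation()
    rospy.spin()
```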

ROS interfaces & KPIs

| Topic / Service | Type | KPI / Note |
| --- | --- | --- |
| `/camera`, `/depth` | sensor_msgs/Image | 10 Hz capture rate; end-to-end acquisition latency ≤ 200 ms |
| `/camera_processed`, `/depth_processed` | sensor_msgs/Image | Preprocessing adds < 25 ms latency; depth completeness ≥ 99 % |
| `/object_hypotheses` | vision_msgs/Detection2DArray | Detection mAP ≥ 0.75 (IoU 0.5) on plate set, published ≤ 1 s after frame |
| `/object_map` | cogar_msgs/ObjectMap | Metric positions published ≤ 300 ms after detections |

Implementation modules

Detailed API docs for each Vision component: