Subsystem Vision
What the Vision subsystem does
The Vision layer is responsible for turning raw RGB-D sensor streams into a concise Object Map that downstream planners and manipulators can use. Its end-to-end workflow is:

1. Capture: the Camera node streams synchronized colour and depth frames.
2. Clean & Normalize: Camera Preprocessing applies lens undistortion, colour-balance and denoising filters in a reusable pipeline.
3. Perceive: Object Detection locates plates, hands and obstacles in each frame at a steady 1 Hz, producing bounding-box hypotheses.
4. Localize: Distance Estimation fuses those bounding boxes with depth pixels to compute metric XYZ coordinates, emitting the final cogar_msgs/ObjectMap.
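Putting the four steps together, data flows through the following topics (all listed under "ROS interfaces & KPIs" below):

```
/camera, /depth ──▶ Camera Preprocessing ──▶ /camera_processed, /depth_processed
/camera_processed ──▶ Object Detection ──▶ /object_hypotheses
/object_hypotheses + /depth_processed ──▶ Distance Estimation ──▶ /object_map
```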

Design Patterns
Strategy
In our Vision subsystem, Object Detection uses Strategy to swap between different detection implementations—whether a placeholder stub, a classical CV method or a deep-learning model—without touching the rest of the pipeline. This keeps detection logic extensible and decoupled from downstream consumers.
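A minimal sketch of how such a strategy seam might look (class and method names here are illustrative, not the actual API):

```python
from abc import ABC, abstractmethod

import numpy as np


class DetectionStrategy(ABC):
    """Illustrative interface: one method the pipeline calls per frame."""

    @abstractmethod
    def detect(self, frame: np.ndarray) -> list:
        """Return bounding-box hypotheses for a single image."""


class StubDetector(DetectionStrategy):
    """Placeholder implementation: always reports no detections."""

    def detect(self, frame):
        return []


class ObjectDetectionNode:
    """Holds a strategy; swapping it never touches the rest of the pipeline."""

    def __init__(self, strategy: DetectionStrategy):
        self.strategy = strategy

    def on_frame(self, frame):
        return self.strategy.detect(frame)
```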
Observer
Camera Preprocessing subscribes to /camera and /depth and processes each frame as soon as it arrives. By reacting to new-image events rather than polling, we guarantee that every frame is handled exactly once and with minimal latency.
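As a sketch, assuming a ROS 1 / rospy node (node and callback names are illustrative):

```python
import rospy
from sensor_msgs.msg import Image


def on_colour(msg: Image) -> None:
    # Invoked by ROS once per published frame; no polling loop needed.
    pass  # undistort / colour-balance / blur the frame here


def on_depth(msg: Image) -> None:
    pass  # preprocess the depth frame here


if __name__ == "__main__":
    rospy.init_node("camera_preprocessing")
    rospy.Subscriber("/camera", Image, on_colour)
    rospy.Subscriber("/depth", Image, on_depth)
    rospy.spin()  # blocks; callbacks fire as messages arrive
```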
Adapter
Camera uses CvBridge to adapt ROS sensor_msgs/Image messages into OpenCV numpy arrays and back. This isolates ROS message formats from CV code, making sensor-to-image integration seamless and maintainable.
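For example (the CvBridge calls are standard; the callback itself is illustrative):

```python
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()


def image_callback(msg: Image) -> Image:
    # ROS message -> OpenCV numpy array (8-bit BGR)
    cv_image = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")

    # ... OpenCV processing happens on cv_image here ...

    # OpenCV numpy array -> ROS message, keeping the original timestamp
    out = bridge.cv2_to_imgmsg(cv_image, encoding="bgr8")
    out.header = msg.header
    return out
```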
Decorator
Camera Preprocessing chains filters—undistort → colour-balance → 5×5 Gaussian blur—by wrapping each output in the next filter. We can add, remove or reorder filters without modifying the subscription logic.
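A minimal sketch of that wrapping (class names are illustrative, and the grey-world colour balance is one possible choice, not necessarily the one used):

```python
import cv2
import numpy as np


class Filter:
    """Illustrative decorator base: each filter wraps the previous stage."""

    def __init__(self, inner=None):
        self.inner = inner

    def apply(self, img: np.ndarray) -> np.ndarray:
        return img if self.inner is None else self.inner.apply(img)


class Undistort(Filter):
    def __init__(self, K, dist, inner=None):
        super().__init__(inner)
        self.K, self.dist = K, dist

    def apply(self, img):
        return cv2.undistort(super().apply(img), self.K, self.dist)


class ColourBalance(Filter):
    def apply(self, img):
        # Simple grey-world balance: scale each channel to the global mean.
        img = super().apply(img).astype(np.float32)
        img *= img.mean() / img.mean(axis=(0, 1))
        return np.clip(img, 0, 255).astype(np.uint8)


class GaussianBlur(Filter):
    def apply(self, img):
        return cv2.GaussianBlur(super().apply(img), (5, 5), 0)


# Assembled inside-out: undistort -> colour-balance -> 5x5 Gaussian blur.
# pipeline = GaussianBlur(ColourBalance(Undistort(K, dist)))
```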
Template Method
Distance Estimation provides a fixed flow—parse detection results, sample depth, back-project to XYZ, publish—while allowing future variants to customize parsing or projection without rewriting the overall pipeline.
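Sketched as a base class (names and hook granularity are assumptions, not the actual API):

```python
from abc import ABC, abstractmethod


class DistanceEstimatorBase(ABC):
    """Illustrative template: process() fixes the flow, the hooks vary."""

    def process(self, detections, depth_image, intrinsics):
        # Fixed skeleton: parse -> sample depth -> back-project -> publish.
        for box in self.parse(detections):
            z = self.sample_depth(box, depth_image)
            self.publish(self.back_project(box, z, intrinsics))

    @abstractmethod
    def parse(self, detections):
        """Variant hook: extract pixel-space boxes from the message."""

    @abstractmethod
    def back_project(self, box, z, intrinsics):
        """Variant hook: map a pixel box plus depth to a metric XYZ point."""

    def sample_depth(self, box, depth_image):
        u, v = box  # assumed (u, v) box centre, for illustration
        return float(depth_image[v, u])

    def publish(self, point):
        ...  # e.g. accumulate into the outgoing cogar_msgs/ObjectMap
```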
Component roles
Camera
- Publishes `/camera` (sensor_msgs/Image) and `/depth` (sensor_msgs/Image) at 1,280×720 @ 10 Hz.
- Implements the Adapter pattern via CvBridge for ROS↔OpenCV conversions.

Camera Preprocessing
- Subscribes to the raw streams, applies the undistort → colour-balance → Gaussian-blur pipeline (Decorator pattern), and republishes on `/camera_processed` and `/depth_processed`.

Object Detection
- Listens to `/camera_processed`, applies the detection algorithm selected via the Strategy pattern, and publishes vision_msgs/Detection2DArray on `/object_hypotheses` at 1 Hz.

Distance Estimation
- Subscribes to `/object_hypotheses` and `/depth_processed`, and uses a Template Method–style base class to parse detections, sample depth, back-project via pinhole intrinsics, and publish a cogar_msgs/ObjectMap on `/object_map`.
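The back-projection step is the standard pinhole model; as a sketch, given a pixel (u, v), its metric depth z, and intrinsics fx, fy, cx, cy (function and variable names are illustrative):

```python
def back_project(u, v, z, fx, fy, cx, cy):
    """Map a pixel (u, v) with metric depth z to camera-frame XYZ (metres)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```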
ROS interfaces & KPIs
| Topic / Service | Type | KPI / Note |
|---|---|---|
| `/camera`, `/depth` | sensor_msgs/Image | 10 Hz capture rate; end-to-end acquisition latency ≤ 200 ms |
| `/camera_processed`, `/depth_processed` | sensor_msgs/Image | Preprocessing adds < 25 ms latency; depth completeness ≥ 99 % |
| `/object_hypotheses` | vision_msgs/Detection2DArray | Detection mAP ≥ 0.75 (IoU 0.5) on plate set, published ≤ 1 s after frame |
| `/object_map` | cogar_msgs/ObjectMap | Metric positions published ≤ 300 ms after detections |
Implementation modules
Detailed API docs for each Vision component:
Vision Components