Stereo Vision Update
This post follows up on the previous one, which assumed that the pixel locations of the object of interest would be provided for triangulation. In our use case, however, a single image frame will contain multiple objects of interest. How, then, do we pair an object located in the left frame with its corresponding position in the right frame? This post demonstrates a simple brute-force method as a proof of concept, before any algorithmic optimisations are applied.
Prerequisites/Assumptions
The technique assumes that the following are available:
- Location of each object in both the left and right frames
- Bounding box of each object in both frames
Both can be obtained from a trained neural network such as YOLOv3.
Proof of concept
Starting with the position of each cone and its bounding box, a crop of each bounding box is taken from the original left image frame.
Each left crop is then compared with every crop taken from the right image frame.
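The cropping step can be sketched as follows. This is a minimal illustration, not the original implementation: the `(x, y, w, h)` box layout, the function name `crop_patch` and the fixed 32×32 output size are all assumptions. Resampling every crop to a common size is also an assumption, made here so that crops from differently sized bounding boxes can be compared pixel by pixel.

```python
import numpy as np

def crop_patch(frame, box, size=(32, 32)):
    """Crop a bounding box from a frame and resample it to a fixed size.

    frame: image array of shape (H, W) or (H, W, C)
    box:   (x, y, w, h) in pixels, assumed to lie inside the frame
    size:  (rows, cols) of the returned patch (hypothetical default)
    """
    x, y, w, h = box
    patch = frame[y:y + h, x:x + w]
    # Nearest-neighbour resampling: pick evenly spaced source rows/columns
    rows = np.arange(size[0]) * patch.shape[0] // size[0]
    cols = np.arange(size[1]) * patch.shape[1] // size[1]
    # Convert to float so later differences cannot overflow uint8
    return patch[np.ix_(rows, cols)].astype(np.float64)

# Tiny synthetic "frame" whose pixel value encodes its position
frame = np.arange(100).reshape(10, 10)
patch = crop_patch(frame, (2, 3, 4, 4), size=(2, 2))
```

In practice a proper interpolating resize would probably be preferable to nearest-neighbour sampling; the version above only keeps the sketch free of extra dependencies.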
There are several ways to calculate how similar two images are:
- Sum of absolute differences (SAD)
- Sum of squared differences (SSD)
- Normalized Cross-correlation (NCC)
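For reference, these measures are commonly written as follows for two equally sized crops $L$ and $R$ with pixel values $L_{ij}$ and $R_{ij}$. The symbols are chosen here for illustration and are not taken from the original implementation.

$$
\begin{aligned}
\mathrm{SAD}(L, R) &= \sum_{i,j} \left| L_{ij} - R_{ij} \right| \\
\mathrm{SSD}(L, R) &= \sum_{i,j} \left( L_{ij} - R_{ij} \right)^2 \\
\mathrm{NCC}(L, R) &= \frac{\sum_{i,j} L_{ij} R_{ij}}{\sqrt{\sum_{i,j} L_{ij}^2 \, \sum_{i,j} R_{ij}^2}}
\end{aligned}
$$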
The measures above are listed in order of increasing computational cost. Both SAD and SSD produce a result in the range $[0, \infty)$, where 0 means the two inputs are exactly the same. NCC, on the other hand, produces a bounded result in the range $[-1, 1]$, where 0 represents no correlation between the images, 1 means the two images are identical (up to a positive scale factor), and -1 means one image is the exact inverse of the other.
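A minimal NumPy sketch of the three measures is given below. The function names are illustrative, and both inputs are assumed to be crops of the same shape.

```python
import numpy as np

def _as_float(x):
    # Work in float64 so uint8 image data cannot wrap around
    return np.asarray(x, dtype=np.float64)

def sad(a, b):
    """Sum of absolute differences: 0 for identical inputs."""
    a, b = _as_float(a), _as_float(b)
    return float(np.abs(a - b).sum())

def ssd(a, b):
    """Sum of squared differences: 0 for identical inputs."""
    a, b = _as_float(a), _as_float(b)
    return float(((a - b) ** 2).sum())

def ncc(a, b):
    """Normalised cross-correlation, bounded to [-1, 1]."""
    a, b = _as_float(a), _as_float(b)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # An all-zero crop has no defined correlation; return 0 by convention
    return float((a * b).sum() / denom) if denom > 0 else 0.0

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 2], [2, 2]])
```

Note that for SAD and SSD a lower score means a better match, whereas for NCC a higher score does, so the sign has to be handled consistently when the scores are later used for pairing.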
Furthermore, using the mean-normalised data ($X' = X - \overline{X}$) rather than the raw frame matrix removes any constant brightness offset between the two cameras. Combined with the normalisation in NCC, this accounts for linear changes in colour intensity between them.
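The effect of mean normalisation can be checked with a small experiment. The sketch below assumes a zero-mean variant of NCC (named `zncc` here for illustration) and simulates a second camera with a hypothetical gain of 1.5 and offset of 20.

```python
import numpy as np

def zero_mean(x):
    """Mean normalisation: X' = X - mean(X)."""
    x = np.asarray(x, dtype=np.float64)
    return x - x.mean()

def zncc(a, b):
    """NCC computed on mean-normalised crops."""
    a0, b0 = zero_mean(a), zero_mean(b)
    denom = np.sqrt((a0 * a0).sum() * (b0 * b0).sum())
    # A perfectly flat crop has zero variance; return 0 by convention
    return float((a0 * b0).sum() / denom) if denom > 0 else 0.0

left = np.array([[10.0, 40.0], [80.0, 120.0]])
right = 1.5 * left + 20.0   # same scene, different gain and offset

score = zncc(left, right)
# SAD on the mean-normalised data ignores a pure offset between cameras
offset_only_sad = float(np.abs(zero_mean(left) - zero_mean(left + 20.0)).sum())
```

Here `score` comes out as 1 up to floating-point rounding, and `offset_only_sad` is 0, even though the raw pixel values of the two crops differ.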
By comparing every possible pair of crops from the left and right frames, a correlation table can be generated, whose axes are the object IDs of the cones found in the left and right frames.
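A brute-force construction of such a table might look like the sketch below. It assumes zero-mean NCC as the similarity measure and crops that have already been resampled to a common shape; `score_table` and `zncc` are illustrative names.

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalised cross-correlation of two equally sized crops."""
    a0 = np.asarray(a, dtype=np.float64)
    b0 = np.asarray(b, dtype=np.float64)
    a0, b0 = a0 - a0.mean(), b0 - b0.mean()
    denom = np.sqrt((a0 * a0).sum() * (b0 * b0).sum())
    return float((a0 * b0).sum() / denom) if denom > 0 else 0.0

def score_table(left_crops, right_crops):
    """table[i, j] = similarity of left object i and right object j."""
    table = np.zeros((len(left_crops), len(right_crops)))
    for i, left in enumerate(left_crops):
        for j, right in enumerate(right_crops):
            table[i, j] = zncc(left, right)
    return table

# Two synthetic "objects"; the right frame sees them in swapped order
# and with a different gain/offset.
p0 = np.array([[1.0, 2.0], [3.0, 4.0]])
p1 = np.array([[4.0, 1.0], [2.0, 3.0]])
table = score_table([p0, p1], [2.0 * p1 + 5.0, p0 + 3.0])
```

The cost is one comparison per left/right pair, so the table grows with the product of the number of detections in each frame.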
Next, the objects in the two frames are paired using a linear assignment solver, which solves the resulting optimisation problem and yields the final pairing.
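To illustrate what the assignment step computes, the sketch below uses an exhaustive search over all one-to-one pairings instead of a dedicated solver. This is a stand-in for illustration only: it is exponential in the number of objects, whereas a Hungarian-style solver such as SciPy's `linear_sum_assignment` finds the same optimum in polynomial time. The function name and the example scores are hypothetical.

```python
from itertools import permutations

def best_pairing(scores):
    """Return the one-to-one pairing with the highest total similarity.

    scores[i][j] is the similarity between left object i and right
    object j (higher is better). Assumes there are at least as many
    right objects as left objects.
    """
    n_left, n_right = len(scores), len(scores[0])
    if n_left > n_right:
        raise ValueError("expected at least as many right objects as left")
    best_total, best_cols = float("-inf"), None
    # Try every way of giving each left object a distinct right object
    for cols in permutations(range(n_right), n_left):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return list(enumerate(best_cols))

# Picking the best match row by row would pair (0, 0) and (1, 1) for a
# total of 1.0; the optimal assignment swaps them for a total of 1.65.
scores = [[0.90, 0.80],
          [0.85, 0.10]]
pairs = best_pairing(scores)
```

The example shows why a global assignment is used rather than a per-object best match: the highest individual score for one object can force a very poor match on another.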
While the proof-of-concept results look promising, further heuristics and optimisations could increase both execution speed and accuracy. Examples include restricting the pairing to candidates found by a k-nearest neighbours (KNN) search, and taking the stereo setup geometry into account (little to no vertical disparity along the y-axis) to discard redundant or unlikely candidates.
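As a rough sketch of the geometric heuristic, candidate pairs could be filtered by vertical disparity before any crops are compared. The function name, the `(x, y)` centre representation and the 10-pixel threshold are assumptions for illustration; a suitable threshold would depend on how well the cameras are aligned.

```python
def plausible_pairs(left_centres, right_centres, max_dy=10.0):
    """Keep only (left, right) index pairs with small vertical disparity.

    left_centres / right_centres: lists of (x, y) bounding-box centres
    max_dy: largest allowed |y_left - y_right| in pixels (hypothetical)
    """
    return [
        (i, j)
        for i, (_, left_y) in enumerate(left_centres)
        for j, (_, right_y) in enumerate(right_centres)
        if abs(left_y - right_y) <= max_dy
    ]

left_centres = [(100, 50), (300, 200)]
right_centres = [(80, 52), (280, 198), (150, 120)]
candidates = plausible_pairs(left_centres, right_centres)
```

In this example only two of the six possible pairs survive the filter, so only those two would need an image comparison.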
Next Steps
- Real-world data testing
- Robustness to type I and type II errors (false positives and false negatives)
- Pipeline integration