Currently the macro-block vectors are simply averaged to extract a lateral distance increment between frames. Because the frame rate is fixed, these increments also yield velocities, and both can be integrated to produce absolute values. But I suspect there’s even more information available.
Imagine videoing an image like this multi-coloured circle:
It’s easy to see that if the camera moved laterally between two frames, it would be straightforward for the video compression algorithm to break the change down into a set of identical macro-block vectors, each showing the direction and distance the camera had shifted between frames. This is what I’m doing now by simply averaging the macro-blocks.
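As a rough sketch of that averaging-plus-integration step, here’s what it might look like in Python with numpy. The motion-vector data here is synthetic (a known lateral shift plus noise); on the real system the vectors would come from the H.264 encoder, and the frame rate value is just an assumed example:

```python
import numpy as np

# Hypothetical per-frame macro-block motion vectors: shape (rows, cols, 2),
# each entry the (dx, dy) shift of that block between consecutive frames.
# Here every block carries the same underlying lateral shift plus noise.
rng = np.random.default_rng(0)
frames = [np.array([1.0, 0.5]) + rng.normal(0.0, 0.1, (8, 8, 2))
          for _ in range(30)]

FRAME_RATE = 30.0           # fixed camera frame rate in fps (assumed)
dt = 1.0 / FRAME_RATE

position = np.zeros(2)      # integrated absolute lateral position (pixels)
for mv in frames:
    increment = mv.mean(axis=(0, 1))    # average blocks -> lateral shift per frame
    velocity = increment * FRAME_RATE   # fixed frame rate turns shift into velocity
    position += velocity * dt           # integrate velocity back to position
```

After 30 frames of a constant (1.0, 0.5) pixel-per-frame shift, `position` ends up near (30, 15) pixels; calibration would be needed to turn pixels into real-world distance.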
But imagine the camera doesn’t move laterally; instead it rotates, and the distance between camera and circle increases. By rotating each macro-block vector according to its angular position in the frame relative to the center point and then averaging the results, what emerges is a circumferential component representing yaw change, and a radial component representing height change.
I think the way to approach this is first to get the lateral movement by simply averaging the macro-block vectors as now; the yaw and height components from opposite sides of the frame will cancel each other out.
Then by shifting the contents of the macro-block frame by the averaged lateral movement, the rotation axis is brought to the center of the frame; some macro-blocks will be discarded to ensure the revised macro-block frame is square around the new center point.
Each of the macro-block vectors is then rotated according to its position in the new square frame. The angle of each macro-block is easy to work out from its offset to the center point (e.g. the diagonal blocks of a 4×4 square sit at 45, 135, 225 and 315 degrees; a 9×9 square has blocks on its axes and diagonals at 0, 45, 90, 135, 180, 225, 270 and 315 degrees), so averaging the X and Y axes of these rotated macro-block vectors gives a measure of yaw and size change (height). I’d also need to account for each block’s distance from the center when averaging these rotated blocks, since outer blocks shift further for the same yaw or height change.
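The steps above can be sketched numerically. This is not the actual implementation, just a toy check of the maths: it synthesizes a macro-block vector field from an assumed small yaw rotation and zoom (height change), rotates each block’s vector into radial/tangential components about the center, and recovers both values. The grid size, yaw and zoom numbers are all made up for the demonstration:

```python
import numpy as np

N = 9        # assume a 9x9 block frame, already re-centered (step two done)
yaw = 0.02   # radians of rotation between frames (assumed)
zoom = 0.01  # fractional scale change between frames, i.e. height change (assumed)

# Block-center coordinates relative to the frame center, in block units.
ys, xs = np.mgrid[0:N, 0:N] - (N - 1) / 2.0

# Small-angle motion field: rotation moves each block tangentially,
# zoom moves it radially outward.
dx = -yaw * ys + zoom * xs
dy = yaw * xs + zoom * ys

# Rotate each vector by its block's angle about the center, so the
# tangential part lands on one axis and the radial part on the other.
theta = np.arctan2(ys, xs)
radial = dx * np.cos(theta) + dy * np.sin(theta)
tangential = -dx * np.sin(theta) + dy * np.cos(theta)

# Divide by distance from the center before averaging: outer blocks shift
# further for the same yaw/zoom, and the center block carries no information.
r = np.hypot(xs, ys)
mask = r > 0
yaw_est = np.sum(tangential[mask] / r[mask]) / mask.sum()
zoom_est = np.sum(radial[mask] / r[mask]) / mask.sum()
```

On this noise-free synthetic field the recovered `yaw_est` and `zoom_est` match the inputs exactly, because for a pure rotation the tangential component of every block is yaw × radius, and for a pure zoom the radial component is zoom × radius; real encoder vectors would of course be far noisier.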
At a push, even pitch and roll could be obtained because they would distort the circle into an oval.
Yes, there’s calibration to do, there’s a dependency on textured multi-coloured surfaces, and the accuracy will be very dependent on frame size and rate. Nevertheless, in a perfect world, it should all just work(TM). How cool would it be to have the Raspberry Pi camera providing this level of information! No other sensors would be required except a compass for orientation, GPS for location, and LiDAR for obstacle detection and avoidance. How cool would that be!
Anyway, enough dreaming for now, back to the real world!