Currently, getting lateral motion from a frame full of macro-blocks is very simplistic: find the average SAD value for a frame, and then only include those vectors whose SAD is lower.
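As a rough illustration of that filter, here is a minimal sketch. It assumes each macro-block arrives as an (x, y, sad) tuple; the function name and the fallback when every SAD is identical are hypothetical, not taken from the flight code.

```python
def average_low_sad_vector(vectors):
    """Average the (x, y) shifts whose SAD is below the frame's mean SAD.

    `vectors` is a non-empty list of (x, y, sad) tuples for one frame
    (an assumed layout, for illustration only).
    """
    mean_sad = sum(sad for _, _, sad in vectors) / len(vectors)
    kept = [(x, y) for x, y, sad in vectors if sad < mean_sad]
    if not kept:
        # Every SAD is identical, so nothing is below average: keep them all.
        kept = [(x, y) for x, y, _ in vectors]
    return (sum(x for x, _ in kept) / len(kept),
            sum(y for _, y in kept) / len(kept))
```

For example, `average_low_sad_vector([(-12, -3, 10), (-12, -3, 20), (40, 40, 300)])` drops the high-SAD outlier and averages the remaining two vectors.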
I’m quite surprised this works as well as it does but I’m fairly sure it can be improved. There are four factors to the content of a frame of macro-blocks.
- yaw change: all macro-block vectors will circle around the centre of the frame
- height change: all macro-block vectors will point towards or away from the centre of the frame.
- lateral motion change: all macro-block vectors point in the same direction in the frame.
- noise: the whole purpose of macro-blocks is simply to find the best matching blocks between two frames; doing this with a chess set (for example) could well have any block from the first frame matching any one of the 50% of identical blocks in the second frame.
Given a frame of macro-blocks, the yaw increment between frames can be found from the gyro, and thus be removed easily.
The same goes for height changes, derived from the LiDAR.
That leaves either noise or a lateral vector. By then averaging these values out, we can pick the vectors that are similar to the distance / direction of the average vector. SAD doesn’t come into the matter.
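A minimal sketch of that idea follows; it is untested against real flight data. The assumptions are mine: each block is (px, py, vx, vy) with the position measured from the frame centre in the same units as the vector, the yaw increment is small enough for a small-angle rotation, the height change is expressed as a scale factor about the centre, and the sign conventions and `tolerance` value are hypothetical.

```python
import math

def lateral_vector(blocks, yaw_delta, height_scale, tolerance=3.0):
    """Estimate the lateral shift for one frame of macro-blocks.

    blocks       : list of (px, py, vx, vy); position is relative to the
                   frame centre (assumed layout)
    yaw_delta    : yaw increment between frames in radians, from the gyro
    height_scale : image scale change between frames from the height sensor;
                   1.0 means no height change (assumed convention)
    tolerance    : how far a vector may sit from the average and still count
    """
    residuals = []
    for px, py, vx, vy in blocks:
        # Small-angle rotation about the centre: vectors circle the centre.
        yaw_x, yaw_y = -yaw_delta * py, yaw_delta * px
        # Height change: vectors point towards or away from the centre.
        zoom_x, zoom_y = (height_scale - 1.0) * px, (height_scale - 1.0) * py
        residuals.append((vx - yaw_x - zoom_x, vy - yaw_y - zoom_y))

    def mean(points):
        return (sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points))

    # What remains is lateral motion plus noise: average it, then keep only
    # the vectors close to that average and average again.
    avg_x, avg_y = mean(residuals)
    kept = [r for r in residuals
            if math.hypot(r[0] - avg_x, r[1] - avg_y) <= tolerance]
    return mean(kept) if kept else (avg_x, avg_y)
```

One caveat with this design: the first average is itself pulled around by the noisy vectors, so a median or a second filtering pass might turn out to be more robust.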
This won’t be my first step however: that’s to work out why the height of the flight wasn’t anything like as stable as I’d been expecting.
A more contrasting rug in poorer lighting:
The gravel drive:
A concrete path to the back garden:
The rear lawn:
These were all the same run, with the rig held about 1m above the ground with me strolling at about 1m/s – both very much guesstimates.
None of the shots are in ideal conditions: they each suffer from one or more of low lighting conditions, high resolution, low contrast or low colour variation. Yet each shows a cluster in roughly the same place, with a vector shift of between 10 and 20 in the positive X direction.
This bodes very well. The next step is a manual slog to correlate the SAD values with the clusters, or to see whether there’s a simple mathematical algorithm to average all the values, biased by the proximity of the nearest macro-block neighbour.
I took my test rig for a walk over this rug in the lounge (red-point siamese cats for scale):
I captured a single frame of macroblocks as I walked forwards across the rug:
This cluster of the vectors corresponds to the spikes in this graph of the SADs:
Critically, the X axis cluster also corresponds to the walking direction.
I need (and should be able) to get two things out of this:
- a sense of scale: the cluster is around the (-12,-3) (X, Y) position. The test was carried out with me carrying the rig about 1m off the ground, and walking at about 1m/s. Based on my previous speculative thoughts about the macro-block units, this suggests the cluster X value of -12 should represent -0.43m, i.e. (size of frame in meters) / (number of macro-blocks per frame) * macro-block vector:
(2 * tan(48.8° / 2) * height) / (400 / 16) * -12 = -0.43m
This is clearly wrong. My guess is I was walking at about 1m/s and the video is running at 20fps, so the movement portrayed by this set of vectors should be about 0.05m. Clearly there’s more digging required to understand the difference between 0.43m and 0.05m.
- how to filter out the good vectors from the bad. A SAD biased average of all the vectors in this frame results in an overall vector of (-8.04,-2.78) compared to the visual on the cluster of about (-12,-3); this isn’t bad, but I’m sure it could be made better by discarding some of the vectors based on their SAD values. Again more digging required here.
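For reference, one way a SAD-biased average could be written is sketched below. The weighting of 1 / (1 + SAD) is purely an assumption for illustration (lower SAD gives more weight), and is not necessarily the weighting that produced the figures above.

```python
def sad_weighted_average(vectors):
    """Average (x, y) shifts, giving low-SAD vectors more weight.

    `vectors` is a non-empty list of (x, y, sad) tuples; the weight
    1 / (1 + sad) is a hypothetical choice, not a known-good one.
    """
    weights = [1.0 / (1.0 + sad) for _, _, sad in vectors]
    total = sum(weights)
    x = sum(w * v[0] for w, v in zip(weights, vectors)) / total
    y = sum(w * v[1] for w, v in zip(weights, vectors)) / total
    return x, y
```

With this weighting a poor match still drags the result towards itself a little, which is the kind of pull discarding high-SAD vectors outright would avoid.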
Despite the further digging required in both cases, I’m pretty confident that things are going in the right direction.
Because I’m stuck waiting on the post, I’m very very bored. And when I’m bored, I get very frustrated until I have something to do. That something is looking at macro-blocks calibration versus calculation.
There are three sides to this:
- What do the macro-block vectors actually mean, i.e. what are the units for the values? I’m currently assuming it is the number of macro-blocks that a frame has moved, based upon the fact that the values are only a single byte, so can only cover 255 pieces of movement; on that assumption, the maximum frame size would be 255 x 16 (pixels per macro-block) = 4080 pixels, which is plausible as a maximum screen resolution. Zoe is shooting at 400 x 400 resolution, so she should get ±25 as the output values. I need to test this theory.
- I need to work out how to use the SAD readings; the lower the value, the more confidence there is that the shift in the macro-blocks is accurate. I need to look into this in more detail.
- Finally, I need to know the units of macro-blocks, and how to convert the values to meters. This is where I’ve made progress.
     / | \       |
    /  |  \      |
   /   |   \     h
  /    |    \    |
 /     |     \   |
/      |      \  |
I’ve found out that the camera angle of view is 62.2 x 48.8 degrees for the V2 camera. As I’m videoing a square frame, the angle θ in the diagram is 48.8°.
d can be calculated:
- in meters as 2 h tan(48.8° / 2) where h is also measured in meters
- in macro-blocks as frame size (400) / macro-block size (16) = 25
So 1 macro-block is 2 h tan(48.8° / 2) / 25 = 0.03628960 x h meters. At one meter height, d works out as ≈0.907m which matches up with my testing.
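The same arithmetic as a short sketch, using the 48.8° angle of view, 400 pixel frame and 16 pixel macro-blocks quoted above. The function names are made up for illustration, and it still rests on the unproven assumption that a vector value of 1 means one whole macro-block of movement.

```python
import math

def ground_width_m(height_m, fov_deg=48.8):
    """Width d of ground seen by the camera, in meters, at a given height."""
    return 2.0 * height_m * math.tan(math.radians(fov_deg / 2.0))

def meters_per_macro_block(height_m, fov_deg=48.8, frame_px=400, block_px=16):
    """Ground distance covered by one macro-block at a given height."""
    blocks_per_frame = frame_px / block_px  # 400 / 16 = 25
    return ground_width_m(height_m, fov_deg) / blocks_per_frame

if __name__ == "__main__":
    print(ground_width_m(1.0))           # roughly 0.907
    print(meters_per_macro_block(1.0))   # roughly 0.0363
```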
Based upon the walk graphed in the previous post, the measured distance in the garden, and a crude guesstimate that I was carrying the test rig about 1m off the ground as I walked the walk, the macro-block output is about 87,500 pixels per meter height per meter distance or
horizontal distance = height * macro-block output / 87500
For the moment, that’s more than accurate enough.
A few things left to do before this code can be used in Hermione:
- Currently, the video and macro-block processing run in separate threads, connected by a shared memory FIFO; ultimately this needs splitting into two processes which could then run on separate CPUs should a multi-core A3 appear.
- There’s some work to integrate the select.select() waiting on the FIFO into the main quadcopter scheduling loop – it replaces the current time.sleep() with listening on the FIFO with a timeout of the time.sleep() value.
- At the gory details level, I need to make sure the video and IMU X and Y axes are aligned when the new PCBs arrive.
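The select.select() change in the second point might look something like the sketch below. It is a guess at the shape rather than the actual scheduling loop: the function name, the 4096 byte read size and the use of a plain file descriptor are assumptions, and select() on pipes only works on Unix-like systems such as Raspbian.

```python
import os
import select

def wait_for_motion(fifo_fd, timeout_s):
    """Wait up to timeout_s for macro-block data on the FIFO.

    Stands in for time.sleep(timeout_s): returns the bytes read if data
    arrives in time, or None if the timeout expires first.
    """
    readable, _, _ = select.select([fifo_fd], [], [], timeout_s)
    if readable:
        return os.read(fifo_fd, 4096)
    return None

if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print(wait_for_motion(read_fd, 0.01))   # nothing written yet: None
    os.write(write_fd, b"vectors")
    print(wait_for_motion(read_fd, 0.01))   # data is waiting, so it is returned
```

The appeal of this design is that the loop wakes as soon as video data is ready, rather than always sleeping for the full period.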
Shouldn’t take too long, except for the minor detail that I’m off to Disneyland Paris next week 🙁
With a more careful test this morning, here’s what I got.
Video macro-block tracking
This is at least as good as the PX4FLOW, so that’s now shelved, and testing will continue with the Raspberry Pi Camera. I upgraded the camera to the new version for this test, as that’s what I’ll be installing on Hermione.
The cross of the diagonal and vertical should have happened lower so that the diagonals to the right of the vertical overlapped – these are the exit from and reentry to the house from the garden. There are multiple possible reasons for this, and because this is now my code, I can play with it to resolve the offset; something I simply couldn’t do with PX4FLOW and its offsets.
Next step is to sort out the units, including which direction the camera X and Y are facing in comparison with how Hermione’s X and Y from the accelerometer and gyro are facing.
The Raspberry Pi camera of course:
RaspiCam Video motion blocks
This was a very similar walk around the garden as before, but this time running the Raspberry Pi camera, ground facing, videoing at 10 frames per second at 320 x 320 resolution, producing 16 x 16 pixel macro-blocks per frame, which are averaged per frame and logged.
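A minimal sketch of the per-frame averaging, plus one way the averages might be summed into a track. It assumes each frame is simply a list of (x, y) shifts in raw macro-block units; the function names are hypothetical, and no conversion to meters is attempted.

```python
def frame_average(vectors):
    """Mean (x, y) shift of all macro-block vectors in one frame."""
    n = len(vectors)
    return (sum(x for x, _ in vectors) / n,
            sum(y for _, y in vectors) / n)

def accumulate_track(frames):
    """Running sum of the per-frame averages, one point per frame."""
    x_total = y_total = 0.0
    track = []
    for vectors in frames:
        dx, dy = frame_average(vectors)
        x_total += dx
        y_total += dy
        track.append((x_total, y_total))
    return track
```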
The macro blocks give the pixel shift between one frame and the next to help with the frame compression; I’m not sure whether the units are in pixels or macro blocks, but that’s simple to resolve. Combined with the height from the LEDDAR, and the focal length of the lens, it’ll be trivial to convert these readings to a distance in meters.
The results here are at least as good as the PX4FLOW, if not better, and the processing of the macro-blocks to distance is very lightweight.
This is definitely worth pursuing as it’s much more in keeping with how I want this to work. The PX4FLOW has served its purpose well: once I understood how it worked, it opened up how it could be replaced with the RPi Camera.
There are further bonuses too: because of the video fixed frame rate, the macro blocks are producing distance increments, whereas the PX4FLOW only produced velocities, and that means I can add in horizontal distance PIDs to kill drift and ensure the quad always hovers over the same spot. And even better, I’m no longer gated on the arrival of the new PCBs: these were required for X8 and I2C for the PX4FLOW; I’ll need them eventually for X8 but for now, the current PCB provides everything I need.
We are most definitely on a roll again!