Stats analysis

From the same run as the last two posts, I’ve been analysing the stats.

First the integrated gyro showing the yaw it detected:



This shows the unintentional clockwise (negative) yaw of about -10° shortly after take-off followed by the intentional anticlockwise yaw of +90° as Hermione headed off left.  I still need to have a play with the yaw rate PID to try to kill that initial -10° yaw.

More interesting, to me at least, is the motion processing:

FIFO stats

The graph shows how often the FIFO was emptied.  In an unhindered world, the FIFO is emptied every 10ms (0.01s) and the PWM to the ESCs updated.  Peaks above that mean something else was taking up the spare time.  The FIFO overflows at just over 85ms (512 bytes total FIFO size / 12 bytes per IMU sample / 500Hz IMU sampling rate = 85.333ms), and the highest shown here is 38ms, well within safety limits.  I’m particularly delighted that the majority of the spikes are within the 10 to 20ms range – that strongly suggests the split phases of macro-block processing are working like a dream.
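The overflow margin is quick to sanity-check; this sketch just redoes the arithmetic above with the figures quoted in the text:

```python
# Sanity check of the FIFO overflow margin quoted above.
FIFO_BYTES = 512.0      # total FIFO size
SAMPLE_BYTES = 12.0     # one IMU sample: ax, ay, az, gx, gy, gz as 2-byte shorts
SAMPLE_RATE = 500.0     # IMU sampling rate in Hz

samples_to_fill = FIFO_BYTES / SAMPLE_BYTES          # ~42.7 samples fit in the FIFO
overflow_ms = samples_to_fill / SAMPLE_RATE * 1000   # ~85.3ms before overflow
```

So as long as the FIFO is emptied well inside 85ms, no samples are lost; the worst spike of 38ms leaves a comfortable margin.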

The 20ms norm means the PWM is updated at 50Hz.  Were the PWM consistently updated at less than 50Hz, it would really start to show in the stability of the flight.  But luckily it isn’t, meaning there’s probably just enough room to finally squeeze in compass and GPS processing.

In passing, it’s worth saying that such levels of stats would be impossible if I were using a microcontroller (Arduino etc) – this 11s flight logged 1.46MB of data to shared memory, and ultimately to SD card.  It logs both the initial hard-coded constants and every dynamic variable for every cycle of motion processing – nothing is missing, so it’s possible to diagnose any problem as long as the reader knows the code intimately.  Without these logs, it would have been nigh-on impossible for the ignorant me of 4+ years ago to achieve what I have now.

* I rate myself as experienced having spent over 4 years on this.

Chicken poo tracking

If you look at yesterday’s video full screen, from top left to right, you can see a muddy patch and two chicken poos, the second poo of which is close to Hermione’s front left prop on take-off.  I was back out in the dark last night, tracking them down.  Here’s why:

Lateral tracking

Although the graph of camera lateral tracking and the Mavic video are almost facsimiles in direction, the scale is out; the graph shows the distance from take-off to landing to be about 1.7m whereas a tape measure from chicken poo #2 to the cotoneaster shrubbery landing point measures about 4.2m.  Given how accurate the direction is, I don’t think there’s any improvement needed for the macro-block processing – simply a scale factor change of ≈ 2.5.  I wish I knew more about the video compression method for generating macro-blocks to understand what this 2.5 represents – I don’t like the idea of adding an arbitrary scale of 2.5.
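The scale factor is just the ratio of the tape-measured distance to the tracked one:

```python
# Scale error between camera lateral tracking and the tape measure.
tracked_m = 1.7   # take-off to landing distance from the tracking graph
actual_m = 4.2    # tape-measured: chicken poo #2 to the cotoneaster shrubbery
scale = actual_m / tracked_m   # ~2.47, i.e. the ~2.5 quoted above
```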

One further point from yesterday’s video: you can see she yaws clockwise by a few degrees on take-off – Hermione’s always done this, and I think the problem is her yaw rate PID needing more P and less I gain.  Something else for me to try next.

I had tried Zoe first as she’s more indestructible. However, her Pi0W can only cope with 400 x 400 pixel video, whereas Hermione’s Pi B 2+ can cope with 680 x 680 pixel video (and perhaps higher with the 5-phase motion processing), which seems to work well with the chicken-trashed lawn.

Weird yaw behaviour

I’ve implemented the yaw code such that Hermione points in the direction that she should be travelling based upon the flight plan velocity vector.  She should take-off, then move left at 0.25 m/s for 6 seconds, while also rotating anti-clockwise by 90° to face the way she’s supposed to be travelling.  However, here’s what my Mavic & I saw:

My best guess is the camera lateral tracking, which simply looks for peaks in macro-block frequency after stashing them all in a dictionary indexed by the vectors.  This ignores yaw, which was fine up to now as I’d set the yaw target to zero.  I think I need to add an extra stage which un-yaws each macro-block vector before adding it to the dictionary and looking for peaks.  That’s relatively easy code, involving tracking yaw between video frames, but costly as it adds an extra phase to un-yaw each MB vector before dictionarying them and checking for peaks.  Time will tell.
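The extra un-yaw stage could be sketched along these lines – a hypothetical helper, assuming the integrated gyro yaw (in radians) is available per video frame, and that the dictionary simply counts identical vectors:

```python
import math
from collections import defaultdict

def unyaw_peak(macro_blocks, yaw):
    # Rotate each macro-block vector back through the current yaw so every
    # frame's vectors share the earth frame, then pick the most common one.
    # (The rotation sign convention is a guess - it may need flipping when
    # checked against real flight data.)
    counts = defaultdict(int)
    cos_yaw = math.cos(yaw)
    sin_yaw = math.sin(yaw)
    for mb_x, mb_y in macro_blocks:
        ex = mb_x * cos_yaw + mb_y * sin_yaw
        ey = -mb_x * sin_yaw + mb_y * cos_yaw
        # Round so near-identical vectors land in the same dictionary slot
        counts[(round(ex), round(ey))] += 1
    # The peak is the vector most macro-blocks agree on
    return max(counts, key=counts.get)
```

With yaw at zero this degenerates to the existing peak-finding, so the extra phase only costs anything once the yaw target is non-zero.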

Camera motion tracking code

I’ve reworked the code so that the video is collected in a daemonized process and fed into a shared memory FIFO. The main code for processing the camera output is now organised to be easily integrated into the quadcopter code. This’ll happen for Hermione once I’ve built her with a new PCB and 512MB memory A+. I suspect the processing overhead of this is very light, given that most of the video processing happens in the GPU, and the motion processing is some very simple averaging.

from __future__ import division
import os
import sys
import picamera
import select
import struct
import subprocess
import signal
import time

# - motion tracker based upon video-frame macro-blocks.  Note this happens per flight -
#   nothing is carried between flights when this is merged with the quadcopter code.

# Video at 10fps. Each frame is 320 x 320 pixels.  Each macro-block is 16 x 16 pixels.  Due to an
# extra column of macro-blocks (dunno why), that means each frame breaks down into 21 columns by
# 20 rows = 420 macro-blocks, each of which is 4 bytes - 1 signed byte X, 1 signed byte Y and 2 unsigned
# bytes SAD (sum of absolute differences).
def RecordVideo():
    print "Video process: started"
    with picamera.PiCamera() as camera:
        camera.resolution = (320, 320)
        camera.framerate = 10

        camera.start_recording('/dev/null', format='h264', motion_output="/dev/shm/motion_stream", quality=23)

        try:
            while True:
                camera.wait_recording(1.0)
        except KeyboardInterrupt:
            pass
        finally:
            try:
                camera.stop_recording()
            except IOError:
                pass
    print "Video process: stopped"

# Check if I am the video process
if len(sys.argv) > 1 and sys.argv[1] == "video":
    RecordVideo()
    sys.exit()

# Setup a shared memory based data stream for the PiCamera video motion output
try:
    os.mkfifo("/dev/shm/motion_stream")
except OSError:
    pass

# Start up the video camera as a new process. Run it in its own process group so that Ctrl-C doesn't
# get through.
def Daemonize():
    os.setpgrp()

video = subprocess.Popen(["python", sys.argv[0], "video"], preexec_fn = Daemonize)

# Off we go; set up the format for parsing a frame of macro blocks
format = '=' + 'bbH' * int(1680 / 4)

# Wait until we can open the FIFO from the video process
camera_installed = True
read_list = []
write_list = []
exception_list = []

if camera_installed:
    while True:
        try:
            py_fifo = open("/dev/shm/motion_stream", "rb")
            break
        except IOError:
            continue
    print "Main process: fifo opened"
    read_list = [py_fifo]

motion_data = open("motion_data.csv", "w")
motion_data.write("time, idx, idy, adx, ady, sad\n")

total_bytes = 0
frame_rate = 10
scale = 10000

# Per frame distance and velocity increments
ivx = 0.0
ivy = 0.0
idx = 0.0
idy = 0.0

# Per flight absolute distance and velocity integrals
avx = 0.0
avy = 0.0
adx = 0.0
ady = 0.0

start_time = time.time()

try:
    while True:
        # The sleep time for select is defined by how many batches of data are sitting in the FIFO
        # compared to how many we want to process per motion processing (samples per motion)
        # timeout = (samples_per_motion - mpu6050.numFIFOSamles()) / sampling_rate
        # check for negative timeout.

        # Wait for the next whole frame
        read_out, write_out, exception_out = select.select(read_list, write_list, exception_list)

        if camera_installed and len(read_out) != 0:
            # We have new data on the video FIFO; get it, and make sure it really is a whole frame of
            # 21 columns x 20 rows x 4 bytes for macro block.
            frame = py_fifo.read(1680)
            if len(frame) != 1680:
                print "ERROR: incomplete frame received"
                break

            # Convert to byte, byte, ushort of x, y, sad
            iframe = struct.unpack(format, frame)

            # Iterate through the 21 x 20 macro blocks averaging the X and Y vectors of the frame based
            # upon the SAD (sum of absolute differences, lower is better).
            ivx = 0.0
            ivy = 0.0
            sad = 0

            for ii in range(0, 420 * 3, 3):
                ivy += iframe[ii]
                ivx += iframe[ii + 1]
                sad += iframe[ii + 2]

            # Scale the macro block values to the speed increment in meters per second
            ivx /= scale
            ivy /= scale

            # Use the frame rate to convert velocity increment to distance increment
            idt = 1 / frame_rate
            idx = ivx * idt
            idy = ivy * idt

            # Integrate (sum due to fixed frame rate) the increments to produce total distance and velocity.
            # Note that when producing the diagnostic earth frame distance, it's the increment in distance
            # that's rotated and added to the total ed*
            avx += ivx
            avy += ivy
            adx += idx
            ady += idy

            time_stamp = time.time() - start_time

            motion_data.write("%f, %f, %f, %f, %f, %d\n" % (time_stamp, idx, idy, adx, ady, sad))

except KeyboardInterrupt:
    pass

# Stop the video process
video.send_signal(signal.SIGINT)
video.wait()

motion_data.close()
py_fifo.close()
os.unlink("/dev/shm/motion_stream")




Motion sensors

Other than a few test flights, I’ve now put Phoebe on hold so as not to break her before the CotswoldJam at the end of September.  I’ve bought her a new case so I can carry her around safely:

Phoebe’s Peli 1600 case

So this morning, I picked up doing motion processing using the RaspiCam YUV macro-block output, partly triggered by a pingback yesterday.  The motion processing (in a very simple form) is working fine, but it only produces a CSV file as output, allowing me to see the data as a graph in Excel:

Zig-zagging for 10 seconds

Ideally for testing, I’d want a small screen showing the direction / speed the motion processing is detecting, and since she’s headless, a button I can press to run various tests on demand.  In one of those twists of fate, the post brought my new E-paper HAT.  Here it is installed on Kitty:

Kitty’s E-paper screen

The camera is stuck underneath in one of the super-slim cases I found.

I now need to install the drivers, and update the code to draw variable length arrows for the orientation / speed vector.

After that, I need to add the ultrasonic range finder to get the distance from the ground.  I’ve got a few of these – they’ll do for Kitty, but with their 100kbps I2C baudrate, they’re not good for Phoebe who needs 400kbps I2C baudrate to capture all the sensor data.

Should keep me out of trouble for a while!

epoll and IMU interrupt interaction

epoll doesn’t differentiate between rising and falling edges – an edge is just an edge, so the RPi.GPIO option to specify which edge to trigger on is pointless.  RPi.GPIO works around this by calling epoll_wait() twice, thus reading both the rising and falling edge when a button is pushed by a human.  A perfectly fine solution for “wait for button, then flash LED” type problems.

But for the IMU, the interrupts and the epoll code need to work together in sync, so I changed both the HoG Python code and my GPIO ‘C’ code:

  • EPOLLONESHOT detects an edge and then stops watching meaning there’s no backlog of interrupts building up while the python code is processing the corresponding sensor data.
  • Don’t call epoll_wait() twice to capture both rising and falling edges of a button – with EPOLLONESHOT it will block permanently the second time round.
  • The MPU6050 is started prior to enabling epoll, otherwise epoll blocks waiting for interrupts that the IMU has not been configured to send yet.
  • Probably better to ask the IMU to clear the interrupt only once the data registers have been read – this then means epoll will not be watching for interrupts at the point there is a falling edge.
  • Set pull down on the interrupt GPIO pin.
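The EPOLLONESHOT behaviour in the first two bullets can be demonstrated without any GPIO hardware; this Linux-only sketch uses a plain pipe as a stand-in for the interrupt line (it shows the semantics, not the HoG code itself):

```python
import os
import select

# A pipe stands in for the GPIO interrupt pin - no hardware needed.
r, w = os.pipe()
ep = select.epoll()
ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)

os.write(w, b"edge")            # first "interrupt"
os.write(w, b"edge")            # second "interrupt" follows immediately

first = len(ep.poll(0.1))       # reports the first edge...
second = len(ep.poll(0.1))      # ...then ONESHOT has disarmed the fd, so this times out

os.read(r, 8)                   # consume the pending data (cf. reading the IMU data registers)
ep.modify(r, select.EPOLLIN | select.EPOLLONESHOT)   # re-arm only after reading

os.write(w, b"edge")
third = len(ep.poll(0.1))       # armed again, so this edge is seen

ep.close()
```

The key point is the re-arm via modify() after the data has been read: no backlog of events can build up while the Python code is busy with the sensor data.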

This is what the ‘scope showed as a result:

Latching ONESHOT

The rising edge is triggered by the IMU when new data is ready to be read.  The falling edge is when the python code reads that data over I2C causing the IMU to drop the interrupt pin.

The screen spans 20ms, with a pulse every 2ms.  Hence there should be 10 rising edges, but if you count, there are only 9.  The wide pulse in the middle took more than 2ms between raising the interrupt and the data being read: a sample was lost.  I didn’t have to take lots of screen shots to capture this; this was the first screen shot I took.  The code is set to do motion processing every 5 reads, and I presume that’s the cause of the longer pulse; capturing a sample and doing motion processing takes more than 2ms.  Any screen shot will contain at least one wider pulse like this.

Overall, that’s pretty good news: the IMU interrupt, and the GPIO and HoG code are working well together.  I clearly need to reduce the time motion processing takes – and it looks like the reduction needed is relatively small.  Also, that explains the difference between flight times measured by counting interrupts and by time.time(): the HoG code reads only 5 out of every 6 samples, so code relying on interrupt timing appears to take less time than it actually does (5 x 2ms < 12ms).

P.S. I’m assuming the mid-width pulses are due to Linux scheduling of my code.  That’s no problem as it’s not causing loss of samples – only the motion processing pulse takes more than 2ms.

Last but not least…

The code is finally up and running: the standard GPIO ISR callback function puts each sample into a Python Queue (FIFO), and the motion processing empties the queue periodically and processes the contents.  The code now allows for easy setting of both the sampling rate and the number of samples per motion processing run.  Testing reveals some interesting results.  With the IMU sampling rate set to 1kHz, a nominally 9 second flight (in the lab, so no props spinning) actually takes 15s.  So although the IMU was sampling at 1kHz, the GPIO code was missing some of the 50us data-ready interrupt pulses from the IMU, and the resulting sampling rate was only 608Hz – i.e. 40% of samples were lost.  Dropping the IMU sampling rate to 500Hz resulted in an equivalent sampling rate of 441Hz, so only 11% of samples were lost.  Dropping down to 333Hz IMU sampling led to 319Hz, a 4% loss of samples.
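The ISR-to-Queue pattern described above can be sketched minimally as follows (names hypothetical; the real callback is driven by the GPIO data-ready interrupt rather than a loop):

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

SAMPLES_PER_MOTION = 10
data_queue = queue.Queue()

def data_ready_callback(sample):
    # In the real code this is the RPi.GPIO ISR callback, one call per 2ms
    # data-ready interrupt; it does as little as possible - just a put().
    data_queue.put(sample)

def motion_processing(num_batches):
    # Empty the queue in batches of ten, averaging each batch - the
    # averaging stands in for the real motion processing here.
    results = []
    for _ in range(num_batches):
        batch = [data_queue.get() for _ in range(SAMPLES_PER_MOTION)]
        results.append(sum(batch) / float(SAMPLES_PER_MOTION))
    return results

# Simulate 20 interrupts arriving from a separate sampling thread.
sampler = threading.Thread(target=lambda: [data_ready_callback(float(i)) for i in range(20)])
sampler.start()
averages = motion_processing(2)
sampler.join()
```

Because the Queue is thread-safe and unbounded, the sampling side never blocks on the motion side, and samples queue up rather than being dropped if motion processing runs long.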

For each I had motion processing happening every 10 samples, so at approximately 60Hz, 44Hz and 32Hz.   I think 32Hz motion processing is just about enough to maintain good stability, so I gave it a go.  Wobbly to say the least, and clearly in need of some PID tuning, but I think there is a chance this may prove to be the new way forwards.  Once I’ve done a bit more tuning I’ll try to get a video done, hopefully tomorrow.

Threads, ISRs and FIFOs

I have 4 strands of development on the go at the moment, all using different ways to collect data samples and run motion processing on them.  The aim in all cases is to capture every batch of samples available as efficiently as possible and run motion processing over them.

The first version runs everything serially, blocking while waiting for each set of samples; when a batch of ten has been collected, motion processing runs using the average of that batch.  This is the fastest version, capturing the most valid samples despite being single-threaded; some of this is due to the optimised GPIO library required for the blocking wait on the data-ready interrupt.  However, the serial operation means that motion processing needs to take less than one millisecond to ensure all samples are captured before the next batch of data is ready at 1kHz.  Currently motion processing takes about 3ms and it’s hard to see how to trim it further.

The second version runs sampling and motion processing in separate threads, and only requires the standard RPi.GPIO library.  The motion processing is the main thread; the data sampling thread is an OS thread, not a Python one, used by the GPIO code to call into the sampling code each time a data-ready interrupt occurs.  The expectation was that because the sampling thread is always waiting for the next batch of data, none would ever be missed.  However, it seems the threading causes sufficient delays that this code actually runs 10% slower.  It currently uses the same batching / averaging model as above.  The advantage (if the threading didn’t have such an impact) is that motion processing has ten milliseconds to run its course while the next batch of ten samples is collected on a separate thread.

The third version runs sampling and motion processing in separate Python threads, setting up an OS FIFO to pass data from sampling to motion processing.  The motion thread sits waiting for the next batch on a read call.  Currently, although data is being written, the read never unblocks – possibly because the FIFO is intended as an IPC mechanism, but there is only a single process here.  I’ll need to move sampling to a child process to take this further.

The fourth version uses the IMU FIFO, with the motion processing code periodically emptying the FIFO and processing the data.  This is single-threaded, but no samples should be lost as they are all queued up in the IMU FIFO.  The data pushed into the FIFO are batches of (ax, ay, az, gx, gy, gz), each value taking 2 bytes every sampling period.  The code reads these 12 bytes and breaks them down into their individual components.  This could be the perfect solution, were it not for the fact that I2C errors cause loss of data.  That makes the boundaries between ax, ay, az, gx, gy and gz slip, and from then on none of the samples can be trusted.  This seems to happen at least once per second, and once it does, the result is a violent crash.
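The 12-byte batches unpack with struct; the tail of this sketch shows how a single lost byte slips every later boundary (big-endian shorts assumed, matching the MPU-6050 register layout):

```python
import struct

# One FIFO batch: ax, ay, az, gx, gy, gz as 16-bit big-endian signed shorts.
batch = struct.pack(">hhhhhh", 100, 200, 300, 10, 20, 30)
ax, ay, az, gx, gy, gz = struct.unpack(">hhhhhh", batch)

# Simulate an I2C error dropping a single byte: two batches back to back,
# minus the first byte of the second, padded to keep a 12-byte multiple.
stream = batch + batch[1:] + b"\x00"
garbled = struct.unpack(">hhhhhh", stream[12:24])
# The axis boundaries have slipped: garbled[0] pairs the low byte of ax with
# the high byte of ay, and every subsequent "sample" is equally untrustworthy.
```

Once the slip happens, every subsequent read stays misaligned, which is why one error per second is enough to crash the flight.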

For the moment, the first version produces the best results; the second has the potential to be better on a multi-core system using threads; the third would be a much better solution but requires splitting into two processes on a multi-core CPU; finally, the fourth is by far and away the best solution – single-threaded operation with no possible data loss and reliable timing based on the IMU sampling rate – if it weren’t for the fact that either the FIFO or the reads thereof get corrupted.

There’s one variant I haven’t tried yet, based upon the second version – currently there’s just some shared data between the sampling and motion processing threads; if the motion processing takes longer than a batch of ten samples, the data gets overwritten.  I presume this is what’s happening due to the overhead of using threads.  But if I use a Python queue between the sampling and motion threads (effectively a thread-safe FIFO), then data from sampling doesn’t get lost; the motion thread waits on the queue for the next batch(es) of data, then empties and processes the averaged batches it has read.  This minimizes the processing done by the sampling thread (it doesn’t do the batching up), and IFF the sampling thread is outside the Python GIL, then I may be able to capture all samples.  This is where I’m off to next.

I’ve included links to zipped-up code for each method in case you’d like to compare and contrast them.