Last but not least…

The QCISRFIFO.py code is finally up and running; this is where the standard GPIO ISR callback function puts each sample into a Python Queue (FIFO) and the motion processing periodically empties the queue and processes what it finds.  The code now allows for easy setting of both the sampling rate and the number of samples per motion-processing pass.  Testing reveals some interesting results.  With the IMU sampling rate set to 1kHz, a nominally 9 second flight (in the lab, so no props spinning) actually takes 15s.  So although the IMU was sampling at 1kHz, the GPIO code was missing some of the 50us data-ready interrupt pulses from the IMU, and the resulting sampling rate was only 608Hz, i.e. roughly 40% of samples were lost.  Dropping the IMU sampling rate to 500Hz resulted in an effective sampling rate of 441Hz, or only 11% of samples lost.  Dropping down to 333Hz IMU sampling led to 319Hz, or a 4% loss of samples.

For each of these sampling rates I had motion processing happening every 10 samples, so at approximately 60Hz, 44Hz and 32Hz.   I think 32Hz motion processing is just about enough to maintain good stability, so I gave it a go.  Wobbly to say the least, and clearly in need of some PID tuning, but I think there is a chance this may prove to be the new way forwards.  Once I’ve done a bit more tuning I’ll try to get a video done, hopefully tomorrow.
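For the curious, the Queue handover described above looks roughly like this in outline; the GPIO pin, read_sample() and the averaging are placeholders rather than the real QCISRFIFO.py code:

import queue              # the Python 3 name; it's called Queue on Python 2
import RPi.GPIO as GPIO

DATA_READY_PIN = 25       # placeholder GPIO wired to the IMU data-ready line
SAMPLES_PER_MOTION = 10   # samples per motion-processing pass

samples = queue.Queue()   # thread-safe FIFO between the GPIO callback and motion

def read_sample():
    # Placeholder for the I2C read of ax, ay, az, gx, gy, gz from the IMU.
    return (0, 0, 0, 0, 0, 0)

def data_ready_isr(pin):
    # Runs on the RPi.GPIO callback thread for every data-ready interrupt;
    # do as little as possible here and just queue the sample.
    samples.put(read_sample())

GPIO.setmode(GPIO.BCM)
GPIO.setup(DATA_READY_PIN, GPIO.IN)
GPIO.add_event_detect(DATA_READY_PIN, GPIO.RISING, callback=data_ready_isr)

try:
    while True:           # main (motion-processing) thread
        # Block until a full batch has been queued, then average and process it.
        batch = [samples.get() for _ in range(SAMPLES_PER_MOTION)]
        averaged = [sum(axis) / float(SAMPLES_PER_MOTION) for axis in zip(*batch)]
        # motion processing on 'averaged' goes here
finally:
    GPIO.cleanup()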

Threads, ISRs and FIFOs

I have 4 strands of development on the go at the moment, all using different ways to collect data samples and run motion processing on them.  The aim in all cases is to capture every available batch of samples as efficiently as possible and run motion processing over them.

The Quadcopter.py code runs everything serially, blocked waiting for each set of samples; when a batch of ten has been collected, motion processing runs using an average of that batch.  This is the fastest version, capturing the most valid samples despite being single-threaded; some of this is due to the optimised GPIO library required for the blocking wait on the data-ready interrupt.  However, the serial operation means that motion processing needs to take less than one millisecond to ensure no samples are missed before the next one arrives at 1kHz.  Currently motion processing takes about 3ms, and it’s hard to see how to trim it further.
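In outline, that serial loop looks something like this; the pin number and read_sample() are placeholders, and the real code relies on my optimised GPIO library for the blocking wait:

import RPi.GPIO as GPIO

DATA_READY_PIN = 25       # placeholder GPIO wired to the IMU data-ready line

def read_sample():
    # Placeholder for the I2C read of ax, ay, az, gx, gy, gz from the IMU.
    return (0, 0, 0, 0, 0, 0)

GPIO.setmode(GPIO.BCM)
GPIO.setup(DATA_READY_PIN, GPIO.IN)

try:
    while True:
        batch = []
        for _ in range(10):
            # Block here until the IMU raises its 50us data-ready pulse.
            GPIO.wait_for_edge(DATA_READY_PIN, GPIO.RISING)
            batch.append(read_sample())

        # Average the batch of ten and run motion processing on it; at 1kHz
        # there's under 1ms before the next sample's interrupt fires.
        averaged = [sum(axis) / 10.0 for axis in zip(*batch)]
        # motion processing on 'averaged' goes here
finally:
    GPIO.cleanup()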

The QCISR.py code runs sampling and motion processing in separate threads, and only requires the standard RPi.GPIO library.  The motion processing is the main thread; the data sampling thread is an OS (not Python) thread, used by the GPIO code to call into the sampling code each time a data-ready interrupt occurs.  The expectation here was that because the sampling thread is always waiting for the next batch of data, none would ever be missed.  However, it seems that the threading causes sufficient delays that in fact this code runs 10% slower.  It currently uses the same batching / averaging data model as above.  The advantage here (if the threading didn’t have such an impact) is that the motion processing has ten milliseconds to run its course while the next batch of ten samples is being sampled on a separate thread.
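Much simplified and with placeholder names, the shared-data handover between the two threads has this shape:

import threading
import RPi.GPIO as GPIO

DATA_READY_PIN = 25       # placeholder GPIO wired to the IMU data-ready line

batch = []                # filled on the GPIO callback thread
latest_batch = None       # the shared data handed to the motion thread
batch_ready = threading.Event()

def read_sample():
    # Placeholder for the I2C read of ax, ay, az, gx, gy, gz from the IMU.
    return (0, 0, 0, 0, 0, 0)

def data_ready_isr(pin):
    # Runs on the GPIO callback thread for each data-ready interrupt.
    global latest_batch
    batch.append(read_sample())
    if len(batch) == 10:
        # Hand the batch over; if the motion thread hasn't consumed the
        # previous one yet, it simply gets overwritten (hence the Queue
        # variant discussed below).
        latest_batch = batch[:]
        del batch[:]
        batch_ready.set()

GPIO.setmode(GPIO.BCM)
GPIO.setup(DATA_READY_PIN, GPIO.IN)
GPIO.add_event_detect(DATA_READY_PIN, GPIO.RISING, callback=data_ready_isr)

while True:               # main (motion-processing) thread
    batch_ready.wait()
    batch_ready.clear()
    averaged = [sum(axis) / 10.0 for axis in zip(*latest_batch)]
    # motion processing on 'averaged' goes here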

The QCOSFIFO.py code runs sampling and motion processing in separate Python threads, setting up an OS FIFO to pass data from sampling to motion processing.  The motion thread sits waiting for the next batch on a select.select() call.  Currently, although data is being written, the select.select() never unblocks – possibly because the FIFO is intended as an IPC mechanism, but there is only a single process here.  I’ll need to move sampling to a child process to take this any further.
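The pattern I’m aiming for on the motion-processing side is roughly the following (the FIFO path and the packing format are placeholders); the sampling side would simply open the same FIFO for writing and write packed samples into it:

import os
import select
import struct

FIFO_PATH = "/tmp/qc_samples"     # placeholder path for the OS FIFO
SAMPLE_FORMAT = "=6h"             # assumed packing: six signed 16-bit values
SAMPLE_SIZE = struct.calcsize(SAMPLE_FORMAT)

if not os.path.exists(FIFO_PATH):
    os.mkfifo(FIFO_PATH)

# Open non-blocking so the open() itself doesn't stall waiting for a writer,
# then sit on select() until data arrives (or the timeout expires).
fifo = os.open(FIFO_PATH, os.O_RDONLY | os.O_NONBLOCK)

while True:
    readable, _, _ = select.select([fifo], [], [], 1.0)   # 1 second timeout
    if fifo not in readable:
        continue                  # timed out: nothing written yet
    data = os.read(fifo, SAMPLE_SIZE * 10)
    if not data:
        continue                  # EOF: no writer currently connected
    for offset in range(0, len(data) - SAMPLE_SIZE + 1, SAMPLE_SIZE):
        ax, ay, az, gx, gy, gz = struct.unpack_from(SAMPLE_FORMAT, data, offset)
        # batch the sample up for motion processing...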

The QCIMUFIFO.py code tries to use the IMU’s own FIFO: the motion processing code periodically empties the FIFO and processes the data.  This is single-threaded, but no samples should be lost as they are all queued up in the IMU FIFO.  The data pushed into the FIFO are batches of (ax, ay, az, gx, gy, gz), each value taking 2 bytes, written every sampling period.  The code reads these 12 bytes and breaks them down into their individual components.  This could be the perfect solution, were it not for the fact that I2C errors cause the loss of data.  That makes the boundaries between ax, ay, az, gx, gy and gz slip, and from then on none of the samples can be trusted.  This seems to happen at least once per second, and once it does, the result is a violent crash.
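Splitting a batch out of the FIFO amounts to something like this, assuming big-endian signed 16-bit values (typical for InvenSense IMUs), with read_fifo_block() standing in for the real I2C read:

import struct

def read_fifo_block():
    # Placeholder for the I2C read of one 12-byte batch from the IMU FIFO;
    # here it just returns a dummy block of zeros.
    return b"\x00" * 12

# One batch: six big-endian signed 16-bit values, 12 bytes in total.
ax, ay, az, gx, gy, gz = struct.unpack(">hhhhhh", read_fifo_block())

# If an I2C error swallows even one byte, every later batch is unpacked one
# byte out of step: exactly the boundary slip described above.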

For the moment, Quadcopter.py produces the best results; QCISR.py has the potential to be better on a multi-core system using threads; QCOSFIFO.py would be a much better solution, but requires splitting into two processes on a multi-core CPU; finally, QCIMUFIFO.py would be by far and away the best solution (single-threaded operation, no possible data loss, and reliable timing based on the IMU sampling rate), were it not for the fact that either the FIFO or the reads thereof are getting corrupted.

There’s one variant I haven’t tried yet, based upon QCISR.py – currently there’s just some shared data between the sampling and motion processing threads, so if the motion processing takes longer than the batch of ten samples, the data gets overwritten.  I presume this is what’s happening due to the overhead of using threads.  But if I use a Python Queue between the sampling and motion threads (effectively a thread-safe FIFO), then data from sampling doesn’t get lost: the motion thread waits on the queue for the next batch(es) of data, then empties it and processes the averaged batches it has read.  This minimizes the processing done by the sampling thread (it doesn’t do the batching up), and IFF the sampling thread is outside of the Python GIL, then I may be able to get all the samples.  This is where I’m off to next.

I’ve included links to zipped-up code for each method in case you’d like to compare and contrast them.

Not quite golden but tasty nonetheless

I finally got the ISR callback code up and running, whereby a ‘C’ thread collects sensor data and periodically kicks the main Python thread to do the motion processing on it, the advantage being that the ISR can run in the background regardless of whether the motion processing is also running.

I had a slight hope that this might run faster despite the multiple threads on a single CPU, because the ‘sampling’ thread doesn’t use the Python GIL and so should not (b)lock the motion processing.  But it seems the threading overhead outweighs this, and the code is marginally slower – 690 samples per second versus 788 samples per second for the single-threaded blocking wait_for_edge() code.  I’ll hang onto the ISR code though, to see what the four cores of an A2 make of it when it’s released – hopefully earlier than the end of year, as is currently the best guess.

Empirical evidence (or wishful thinking?) suggests that the ISR method is less prone to I2C errors, so I took both Phoebe and Chloe out for a flight to see if this had any real-world effect. They both flew the same regardless of whether I was using the blocking hardware interrupts or the ISR callback. There is the advantage (as expected) that the standard RPi.GPIO library can be used; my customized version is only necessary for the blocking hardware interrupt support.

I’ll probably stick with the blocking hardware interrupt version for the moment, but if you’d like to try the ISR code then click here to download the zip file.