OK, so this is weird…

When I first added the autopilot process, it updated the main process at 100Hz with the current distance vector target; the main process couldn’t quite keep up with what it was being fed, but it was close.  The downside was that the video processing rate dropped through the floor, building up a big backlog, which meant a very late reaction to lateral drift.

So I changed the autopilot process to send only velocity vector targets; that meant the autopilot sent an update to the main process every few seconds (i.e. ascent, hover, descent and stop updates) rather than 100 times a second for the distance increments; as a result, video processing was running at full speed again.

But when I turned on diagnostics, the main process couldn’t keep up with the autopilot, despite the fact the updates are only sent once every few seconds.  Printing the messages to screen showed they were being sent correctly, but the main process’ select() didn’t pick them up: in a passive flight, it stayed at a fixed ascent velocity for ages – way beyond the point the autopilot prints indicated the hover, descent and stop messages had been sent.  Without diagnostics, the sending and receipt of the messages were absolutely in sync.  Throughout all this, the GPS and video processes’ data rates to the main process were low and worked perfectly.

The common factor between autopilot, GPS, video and diagnostics is that they all use shared memory files to store / send their data to the main process; having more than one with high demand (autopilot at 100Hz distance targets, or diagnostics at 100Hz) seemed to cause one of the lower frequency shared memory sources simply not to be spotted by the main process’ select().  I have no idea why this happens and that troubles me.
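To make the moving parts concrete, here’s a minimal sketch of the main process’ multiplexing, assuming (as a simplification) that the shared memory files are OS FIFOs created under /dev/shm; the path, message handling and the single source shown are illustrative only, not the real flight code.

    # Minimal sketch of select()-based multiplexing over a FIFO in /dev/shm.
    # The path and message handling are hypothetical; the real code watches
    # autopilot, GPS, video and diagnostics sources.
    import os
    import select

    FIFO_PATH = "/dev/shm/qc_autopilot"             # hypothetical FIFO name

    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)

    # Open non-blocking so open() doesn't stall waiting for a writer.
    autopilot_fd = os.open(FIFO_PATH, os.O_RDONLY | os.O_NONBLOCK)

    while True:
        # Block until at least one source has data; further fds would be
        # added to the read list for the GPS, video and diagnostics streams.
        readable, _, _ = select.select([autopilot_fd], [], [])
        for fd in readable:
            data = os.read(fd, 4096)
            if data:
                print("autopilot update: %s" % data.decode())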

This useful link shows the tools to query shared memory usage stats.

df -k /dev/shm shows only 1% of shared memory is used during a flight:

Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 441384 4 441380 1% /dev/shm

ipcs -pm shows the processes owning the shared memory:

------ Shared Memory Creator/Last-op PIDs --------
shmid owner cpid lpid 
0 root 625 625 
32769 root 625 625 
65538 root 625 625 
98307 root 625 625 
131076 root 625 625

ps -eaf | grep python shows the processes in use by Hermione. Note that none of their process IDs appears in the list of shared memory owners above:

root 609 599 0 15:43 pts/0 00:00:00 sudo python ./qc.py
root 613 609 12 15:43 pts/0 00:02:21 python ./qc.py
root 624 613 0 15:43 pts/0 00:00:03 python /home/pi/QCAPPlus.pyc GPS
root 717 613 1 16:00 pts/0 00:00:01 python /home/pi/QCAPPlus.pyc MOTION 800 800 10
root 730 613 14 16:01 pts/0 00:00:00 python /home/pi/QCAPPlus.pyc AUTOPILOT fp.csv 100

Oddly, it’s the gps daemon whose process ID matches the shared memory creator PID above:

gpsd 625 1 4 15:43 ? 00:01:00 /usr/sbin/gpsd -N /dev/ttyGPS

I’m not quite sure yet whether there’s anything wrong here.

I could just go ahead with object avoidance; the main process would then have diagnostics as its only high speed shared memory usage.  Autopilot can stick with the revised version of only sending low frequency velocity vector target changes.  Autopilot would get high frequency input from the Sweep, but convert that to low frequency velocity target changes sent to the main process.  This way, main has only diagnostics as a fast input, and autopilot has only the Sweep.  This is a speculative solution though, and I don’t like the idea of moving forward with an undiagnosed weird problem.
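To make the speculative plan concrete, here’s a rough sketch of that conversion; read_sweep(), evasive_velocity(), send_velocity_target() and the threshold are all hypothetical stand-ins for the real code.

    # Speculative sketch: consume Sweep ranges at high frequency, but only
    # send a new velocity vector target to the main process when the evasive
    # velocity actually changes.  All helpers here are hypothetical.
    THRESHOLD = 0.1                      # m/s change needed before an update

    last_target = (0.0, 0.0)

    while True:
        distance, direction = read_sweep()               # high frequency in
        target = evasive_velocity(distance, direction)   # assumed conversion
        if max(abs(target[0] - last_target[0]),
               abs(target[1] - last_target[1])) > THRESHOLD:
            send_velocity_target(target)                 # low frequency out
            last_target = target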

Zoe the Zero – Appendix – What next?

If you’ve reached this stage, you’ve caught up with me, and I can’t help you any more with details, but here’s a list of things I’d like to try.

  • video your flights – there’s support for the PiCam built in to the code (use the -v option), but I’ve not run it for a long time now – this requires an A+ rather than a Zero to get access to the CSI connector – so Phoebe or Chloe rather than Zoe.
  • tweak the flight plan to add horizontal motion steps between a couple of hovers – I’ve done a single test on this, and it worked reasonably with no code changes.
  • add a simple remote control by using a select.select() call with a 0 timeout on stdin inside the FlightPlan() code – use the arrow keys to direct the flight laterally during a long ‘hover’ phase – see the sketch after this list.
  • The DroTek board has a barometer and compass too, so you could add height and orientation control; the compass also provides, to some extent, a further input for calculating angles, based upon the position of the earth’s magnetic north pole relative to the quadcopter frame – similar to the integrated gyro data, but with an immovable reference point and no integration drift.
  • A true remote control requires some more significant work to the code:
    • remove the X and Y velocity PIDs
    • replace them with angle PIDs
    • the angles are derived from GetAbsoluteAngles() and integrating the gyro – they are both quadcopter frame values.
    • you probably want to discard the GPIO edge_detect_wait() code, and instead use select.select() to listen on both the data ready interrupt rising edge and the incoming RC messages – look at GPIO/source/event_gpio.c for details
  • If you’re sticking with autonomous flight, then a USB GPS is a good next step once you have the compass working; it allows the flight plan to be specified as absolute directions at fixed speeds, with the GPS providing long term, low resolution backup to the less accurate but high resolution compass readings.
  • With a GPS in place, you can do something awesome like the Hexo+, which controls the quad through a smart phone, using the differences between the phone’s and quad’s GPSs, combined with a predefined set of flight plans, to make the quad follow and video you autonomously while you do cool things like skiing or mountain biking!
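As a taster for the remote control bullet above, here’s a minimal sketch of the zero-timeout stdin poll; it assumes the terminal has been put into non-canonical (cbreak) mode so single keypresses arrive immediately.

    # Minimal sketch of a non-blocking keypress poll for the FlightPlan()
    # loop; assumes the terminal is in cbreak mode so single keypresses
    # are readable without waiting for a newline.
    import select
    import sys
    import termios
    import tty

    def check_keypress():
        # Zero timeout: returns immediately whether or not a key is
        # pending, so the flight plan loop is never blocked.
        readable, _, _ = select.select([sys.stdin], [], [], 0)
        if readable:
            return sys.stdin.read(1)
        return None

    # Usage: put the terminal into cbreak mode around the flight loop.
    old_settings = termios.tcgetattr(sys.stdin)
    try:
        tty.setcbreak(sys.stdin.fileno())
        key = check_keypress()       # e.g. once per motion processing loop
    finally:
        termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_settings)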

There’s one big problem with all of these however: a single CPU Raspberry Pi running interpreted CPython is near its performance limit already.  This means adding more sensors risks missing readings from the IMU.  The autonomy relies critically on catching every sample to ensure acceleration integrates accurately to velocity.

There are two solutions to this performance problem, as long term followers will know: move over to PyPy, or wait patiently for the A2; both would provide the spare CPU cycles required for further processing, but both have significant problems:

  • The GPIO and RPIO libraries use the CPython API to produce modules that CPython can import; PyPy doesn’t support this API, instead using CFFI to provide direct access to ‘C’ function calls from Python.  I’d need to pull the ‘C’ code out of GPIO and RPIO and put it into a CFFI framework – see the sketch after this list.  Trouble is, I’m fairly ignorant of how this works, and the fact the GPIO and RPIO code isn’t mine makes this more complicated.  And my confidence is not that high that it will produce the performance enhancements I’m looking for.
  • The alternative is to swap Phoebe over to (the yet to be announced) 4-core 1GHz 512MB RAM A2.  This would allow me to split motion processing and sensor reading across cores, leaving a couple spare for other sensors.  The problem here though is that the release of the Pi Zero has at best deferred creation of an A2, and at worst cancelled its development altogether.
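For the curious, the CFFI route would look roughly like this; the shared library name, function signature and constant are hypothetical stand-ins for whatever gets extracted from the GPIO / RPIO ‘C’ sources.

    # Rough sketch of the CFFI approach: declare the 'C' functions pulled
    # out of GPIO / RPIO, load them from a shared library, and call them
    # directly.  Library name, signature and constant are hypothetical.
    from cffi import FFI

    ffi = FFI()
    ffi.cdef("""
        int wait_for_edge(int gpio, int edge);  /* hypothetical declaration */
    """)
    gpio_lib = ffi.dlopen("./libqcgpio.so")     # hypothetical shared library

    RISING_EDGE = 1                             # hypothetical constant
    result = gpio_lib.wait_for_edge(22, RISING_EDGE)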

So for the mo, I’m stuck again.


Last but not least…

The QCISRFIFO.py code is finally up and running; this is where the standard GPIO ISR callback function puts each sample into a Python Queue (FIFO), and the motion processing empties the queue periodically and processes its contents.  The code now allows for easy setting of both the sampling rate and the number of samples per motion processing run.  Testing reveals some interesting results.  With the IMU sampling rate set to 1kHz, a nominally 9 second flight (in the lab, so no props spinning) actually takes 15s.  So although the IMU was sampling at 1kHz, the GPIO code was missing some of the 50us pulse data ready interrupts from the IMU, and the resulting sampling rate was only 608Hz, i.e. 40% of samples were lost.  Dropping the IMU sampling rate to 500Hz resulted in an equivalent sampling rate of 441Hz, i.e. only 11% of samples lost.  Dropping down to 333Hz IMU sampling led to 319Hz, a 4% loss of samples.
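Here’s a minimal sketch of the scheme, with BCM pin numbering and hypothetical read_imu_registers() / process_motion() / average() helpers standing in for the real flight code.

    # Sketch of the ISR + FIFO scheme: the RPi.GPIO callback (which runs
    # on a non-Python OS thread) pushes each sample onto a thread-safe
    # queue, and the main thread drains it in batches of ten for motion
    # processing.  Pin number and helpers are hypothetical.
    try:
        import queue                  # Python 3
    except ImportError:
        import Queue as queue         # Python 2, as on the Pi at the time
    import RPi.GPIO as GPIO

    DATA_READY_PIN = 22               # hypothetical data ready GPIO pin
    SAMPLES_PER_MOTION = 10           # motion processing per ten samples

    samples = queue.Queue()

    def data_ready_isr(channel):
        # Called by the GPIO library on each data ready rising edge.
        samples.put(read_imu_registers())         # assumed helper

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(DATA_READY_PIN, GPIO.IN)
    GPIO.add_event_detect(DATA_READY_PIN, GPIO.RISING, callback=data_ready_isr)

    while flying:                                 # assumed flight loop flag
        batch = [samples.get() for _ in range(SAMPLES_PER_MOTION)]
        process_motion(average(batch))            # assumed helpers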

For each test I had motion processing happening every 10 samples, so at approximately 61Hz, 44Hz and 32Hz.   I think 32Hz motion processing is just about enough to maintain good stability, so I gave it a go.  Wobbly to say the least, and clearly in need of some PID tuning, but I think there is a chance this may prove to be the new way forwards.  Once I’ve done a bit more tuning, I’ll try to get a video done, hopefully tomorrow.

Threads, ISRs and FIFOs

I have 4 strands of development on the go at the moment, all using different ways to collect data samples and run motion processing on them.  The aim in all cases is to capture every batch of samples available as efficiently as possible and run motion processing over them.

The Quadcopter.py code runs everything serially, blocking while waiting for each set of samples; when a batch of ten has been collected, motion processing runs using the average of that batch of ten.  This is the fastest version, capturing the most valid samples despite being single threaded; some of this is due to the optimised GPIO library required for the blocking wait on the data ready interrupt.  However, the serial operation means that motion processing needs to take less than one millisecond to ensure all samples are captured before the next batch of data is ready @ 1kHz.  Currently motion processing takes about 3ms, and it’s hard to see how to trim it further.
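A minimal sketch of this serial scheme, assuming a hypothetical data ready pin and helper functions, shows why the 1ms budget matters:

    # Sketch of the serial Quadcopter.py scheme: block on each data ready
    # interrupt, collect ten samples, then run motion processing on their
    # average.  Pin number and helpers are hypothetical stand-ins.
    import RPi.GPIO as GPIO

    DATA_READY_PIN = 22
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(DATA_READY_PIN, GPIO.IN)

    while flying:                                    # assumed flight flag
        batch = []
        for _ in range(10):
            GPIO.wait_for_edge(DATA_READY_PIN, GPIO.RISING)  # blocking wait
            batch.append(read_imu_registers())       # assumed helper
        # Must complete in < 1ms or the next 1kHz sample is missed.
        process_motion(average(batch))               # assumed helpers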

The QCISR.py code runs sampling and motion processing in separate threads, and only requires the standard RPi.GPIO library.  The motion processing is the main thread; the data sampling thread is an OS thread rather than a Python thread, used by the GPIO code to call into the sampling code each time a data ready interrupt occurs.  The expectation here was that because the sampling thread is always waiting for the next batch of data, none would ever be missed.  However, it seems the threading causes sufficient delays that this code in fact runs 10% slower.  It currently uses the same batching / averaging data model as above.  The advantage here (if the threading didn’t have such an impact) is that the motion processing has ten milliseconds to run its course while the next batch of ten samples is collected on a separate thread.

The QCOSFIFO.py code runs sampling and motion processing in separate Python threads, setting up an OS FIFO to pass data from sampling to motion processing.  The motion thread sits waiting for the next batch on a select.select() call.  Currently, although data is being written, the select.select() never unblocks – possibly because the FIFO is intended as an IPC mechanism, but there is only a single process here.  I’ll need to move sampling to a child process to take this further.

The QCIMUFIFO.py code tries to use the IMU FIFO; the motion processing code periodically empties the FIFO and processes the data.  This is single threaded, but no samples should be lost as they are all queued up in the IMU FIFO.  The data pushed into the FIFO are batches of (ax, ay, az, gx, gy, gz), each value taking 2 bytes, every sampling period.  The code reads these 12 bytes and breaks them down into their individual components.  This could be the perfect solution, were it not for the fact that I2C errors cause the loss of data.  This makes the boundaries between ax, ay, az, gx, gy and gz slip, and from then on, none of the samples can be trusted.  This seems to happen at least once per second, and once it does, the result is a violent crash.
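For reference, breaking down one batch looks like this; a minimal sketch, assuming the FIFO bytes arrive in the MPU-6050’s big-endian register order.

    # Unpack one 12-byte FIFO batch into its six sensor values.  The
    # MPU-6050 stores each axis as a big-endian signed 16-bit integer, so
    # '>hhhhhh' recovers ax, ay, az, gx, gy, gz - provided the batch
    # boundaries haven't slipped as described above.
    import struct

    def unpack_fifo_batch(raw12):
        ax, ay, az, gx, gy, gz = struct.unpack(">hhhhhh", bytes(raw12))
        return ax, ay, az, gx, gy, gz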

For the moment, Quadcopter.py produces the best results; QCISR.py has the potential to be better on a multicore system using threads; QCOSFIFO.py would be a much better solution but requires splitting into two processes on a multi-core CPU; finally, QCIMUFIFO.py would be by far and away the best solution – single threaded operation with no possible data loss and reliable timing based on the IMU sampling rate – if it weren’t for the fact that either the FIFO or the reads thereof are corrupted.

There’s one variant I haven’t tried yet, based upon QCISR.py – currently there’s just some shared data between the sampling and motion processing threads; if the motion processing takes longer than the batch of ten samples, then the data gets overwritten.  I presume this is what’s happening due to the overhead of using threads.  But if I use a Python queue between the sampling and motion threads (effectively a thread-safe FIFO), then data from sampling doesn’t get lost; the motion thread waits on the queue for the next batch(es) of data, then empties and processes the averaged batches it has read.  This minimizes the processing done by the sampling thread (it doesn’t do the batching up), and IFF the sampling thread is outside of the Python GIL, then I may be able to get all samples.  This is where I’m off to next.

I’ve included links to zipped-up code for each method in case you’d like to compare and contrast them.

Not quite golden but tasty nonetheless

I finally got the ISR callback code up and running whereby a ‘C’ thread collects sensor data and periodically kicks the main Python thread to do the motion processing on it, the advantage being the ISR can run in the background regardless of whether the motion processing was also running.

I had a slight hope that this might run faster despite the multiple threads on a single CPU, because the ‘sampling’ thread doesn’t use the Python GIL and so should not (b)lock the motion processing.  But it seems the threading overhead outweighs this, and the code is marginally slower – 690 samples per second versus 788 samples per second for the single-threaded blocking wait_for_edge() code.  I’ll hang onto the ISR code though to see what the four cores of an A2 make of it when it’s released – hopefully earlier than the end of year, as is currently the best guess.

Empirical evidence (or wishful thinking?) suggests that the ISR method is less prone to I2C errors so I took both Phoebe and Chloe out for a flight to see if this had any real-world effect. They both flew the same regardless of whether I was using the blocking hardware interrupts or the ISR callback. There is the advantage (as expected) that the standard RPi.GPIO library can be used. My customized version is only necessary for the blocking hardware interrupt support.

I’ll probably stick with the blocking hardware interrupt version for the moment, but if you’d like to try the ISR code then click here to download the zip file.

Out of low hanging fruit

This is a summary of the state of play and what options are available for where to go next.

There are two possible causes for the drift shown in Phoebe’s and Chloe’s videos.

  1. occasional I2C errors / data misreads – protective code shows this affects about 0.01% of attempted reads, and the code interpolates to remove significant errors, making this irrelevant for such small flights
  2. the 3 data samples missed during each motion processing run – 23% of data is interpolated (3 missed reads in an elapsed 13ms) – that’s a lot of data missed!

Clearly, doing something about motion processing is necessary.

  1. Use the MPU-6050 FIFO: the perfect solution, if only the number of I2C read errors weren’t so huge that the data read from the FIFO can’t be assumed to be bundles of (ax, ay, az, gx, gy, gz) every 1ms
  2. Speed up motion processing code to < 1ms
  3. Run motion processing in parallel with sensor sampling – 10 batches of samples, averaged = 10ms – much greater than the 3ms required for motion processing before the next batch becomes available

But…

  1. requires an I2C fix
  2. requires moving to PyPy and changing GPIO / RPIO to CFFI
  3. requires multiple threads / processes, but CPython + GIL means it has to be PyPy again, or CPython with multiple processes on a multiprocessor machine – i.e. the quad-core A2

There isn’t an easy option here until the A2 turns up.  I am prototyping option 3 on a spare B2 I have, but if this works, it cannot be deployed onto Phoebe or Chloe as a B2 is too big to fit between their top and bottom plates.

P.S. Sorry about the colo(u)r in both the videos – I’d switched the white balance to ‘shade’ and forgotten to switch it back to auto.  Normal service has now resumed.


Minions

Here’s Phoebe showing that anything Chloe can do, she can do better.

Phoebe flies from Andy Baker on Vimeo.

I’ve upgraded her to T-motor MN3110 750kV motors, and I’ve upgraded them both to the Chinese equivalent of the T-motor props – £10 versus £80 for a set of 4 – they’re stronger and much cheaper, so they’ll pay for Phoebe’s new motors in no time.

Once more, I think Phoebe and Chloe are both as good as they can be without a big performance boost (Raspberry Pi A2  or GPIO / RPIO changed to use CFFI for PyPy) to allow

  • separation of sensor sampling and motion processing into separate processes on separate processors
  • kitty(++) to provide laser or picamera motion tracking.

I know I said I’d be implementing kitty(++) on my A+ HoGs, but with a single CPU, anything kitty(++) does is likely to steal CPU cycles from HoG, and so more samples would get lost. I’ll keep working on kitty++ motion from the picamera codec macro-blocks as this has the lesser CPU impact due to its use of the GPU, but I’m not convinced it’ll work on just an A+.

But first, I’ll try to understand CFFI to see how hard it would be to recompile the CPython RPIO / GPIO libraries and get better PyPy performance as a result.


Crass assumption

The new code I posted yesterday is based on the assumption that the motion processing code takes less than 1ms, so that no samples will be missed.  I was aware that the assumption was probably not true, but thought I really ought to check this morning; the net is that it takes about 3.14ms, and so 3 samples are missed.  That’s more than enough to skew the accelerometer integration, resulting in duff velocities and drift.
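For what it’s worth, the check itself is trivial – something along these lines, with process_motion() and batch standing in for the real code:

    # Quick-and-dirty timing of the motion processing step; the helper and
    # its input are hypothetical stand-ins.
    import time

    start = time.time()
    process_motion(batch)                            # assumed helper
    print("motion processing took %.2fms" % ((time.time() - start) * 1000.0))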

I couldn’t spot any areas of the code that could be trimmed significantly, so it’s back to working out why PyPy runs so much slower than CPython – about 2.5 times, based on the same test.

I will be sticking with this latest code as I believe its timing is still better than the time.time() version; I just need to speed it up a little.  FYI the PyPy performance data from their site suggests there should be more than a 6-fold performance improvement over the PyPy version (2.2.1) used in the standard Raspbian distribution; that’s more than enough.

I am still tinkering with kitty++ in the background, and have the macro-block data, but need to work out the best / correct way to interpret it.  It’s blocked though, because for some reason kitty’s Raspberry Pi can’t be seen by other computers on my home network other than by IP address.  Just some DHCP faff I can’t be bothered to deal with at the moment.

No go FIFO

The FIFO solution, which should have addressed all the problems of corrupted and lost data along with providing accurate timings, has failed.  There are multiple factors:

  • The MPU-6050 FIFO isn’t a real FIFO as I know it – the amount of data in the FIFO doesn’t decrement as you read from it.  That means the FIFO needs to be emptied and then reset, and that means the data read needs to be stored in another FIFO in my code.
  • The enforced FIFO reset means potential loss of data if new data arrives in the small gap between emptying the FIFO and resetting it – and in testing, I’ve seen this happen.  This is a real problem: each batch of sensor data is 12 bytes long (3 axes * 2 bytes * (accelerometer + gyro)).  Lose a byte or two of accelerometer data in the FIFO read / reset gap, and all of a sudden, data read from the FIFO as accelerometer readings actually contains gyro readings.
  • The FIFO is read byte by byte.  That means 12 one-byte reads for a full batch of data, compared to one 14-byte read directly from the data registers.  The latter is a lot more efficient – in fact the 12 x 1 byte reads are so slow that the FIFO fills up faster than the data can be read, and starts overflowing.  Yes, I can reduce the sampling rate (and I did, to 500Hz), and that improved matters, but…
  • To cap it all, there are still I2C bus errors, which means FIFO data can still get lost, again shifting the data so that gyro data slips into what ought to be accelerometer data readings.

Put together, the FIFO doesn’t stand a feline in the fire’s chance of working.

Which means I need to drop back to plan A – PyPy.  I think the problem here is the PyPy I2C code, which is out of date in the Raspberry Pi Raspbian distribution.  I’m hoping someone reading this blog can encourage the RPF to update the distribution to include the latest copy of PyPy and its I2C / smbus library – please 🙂

Until / unless that happens, I’m stuck again with Phoebe and Chloe all dressed up and nowhere to go.

FIFO

Thanks to a Raspberry Pi forum post about increasing performance and not losing data, I’m starting to seriously consider swapping to using the inbuilt FIFO for gathering data.

The plus side is that no data is lost or corrupted, and as a result, there is no need for strict timing – that’s defined by the configured sampling rate.

The down side is that it means discarding all the performance code I wrote for catching the data ready interrupt and reading the data registers – I’ve invested a lot of blood, sweat and tears to optimize that code 🙁

The FIFO would work roughly like this:

  • the data sampling rate would be set to 1kHz like now.
  • every 10ms, read the FIFO byte count register – the 10ms isn’t time critical, it’s simply to allow the FIFO to fill with roughly 10ms of data, which would be batches of ten or more sets of data
  • read the FIFO in whole 12-byte batches (i.e. FIFO byte count minus byte count % 12) – 12 bytes being one set of gyro and accelerometer readings fed into the FIFO
  • run motion processing over whatever size that batch of data is – timing is inferred from the sampling rate and the number of 12-byte batches read from the FIFO – the FIFO continues to fill in the background – see the sketch after this list.
  • repeat
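As a rough sketch of that loop (0x72 / 0x74 are the MPU-6050’s FIFO count and read registers; the bus number and process_batch() handler are assumptions):

    # Rough sketch of the FIFO polling loop described above.  0x72/0x73
    # hold the MPU-6050's FIFO byte count and 0x74 is the FIFO read
    # register; process_batch() is a hypothetical motion processing hook.
    import time
    import smbus

    ADDRESS = 0x68              # default MPU-6050 I2C address
    FIFO_COUNT_H = 0x72         # high byte of the FIFO byte count
    FIFO_R_W = 0x74             # FIFO read / write register
    BATCH = 12                  # ax, ay, az, gx, gy, gz @ 2 bytes each

    bus = smbus.SMBus(1)

    while True:
        time.sleep(0.01)        # let roughly 10ms of data accumulate
        high, low = bus.read_i2c_block_data(ADDRESS, FIFO_COUNT_H, 2)
        count = (high << 8) + low
        for _ in range(count // BATCH):          # whole batches only
            raw = [bus.read_byte_data(ADDRESS, FIFO_R_W) for _ in range(BATCH)]
            process_batch(raw)  # timing inferred from the sampling rate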

There will be some faff flushing the FIFO at the right points but otherwise, I don’t think the code changes will be hard, though they will be extensive.

This avoids all the major flaws I’m currently handling, providing RTOS-quality timing for the data with zero risk of sample misses or data corruption.  I’m certain it will all just work.  But I am mourning the fact that the data ready interrupt code I spent so long enhancing is no longer needed 🙁