LEDDAR testing results

I’ve got the LEDDAR One prototype working well using the minimalmodbus python library and fantastic customer support from LEDDAR themselves.

I’ve had to put in a nasty hack to kill off the Raspberry Pi console tied to the UART pins; for some reason I couldn’t stop it across reboot.

The pin layout is nice and tidy: RPi pins 4,6,8,10,12 (5v, GND, TX, RX, GPIO 18) connect to Leddar Pins 3,1,4,5,2 respectively – that’s a convenient 5 consecutive pins on RPi side to attach a standard connector.  Unfortunately I do need to run another batch of PCBs as there’s no access to these pins on my current PCB.

Once the PCBs are done, I’ll be adding the LEDDAR to Phoebe; her motion processing happens at ≈100Hz; the default for LEDDAR is 8.8Hz (it can be increased if necessary).  Each LEDDAR distance measurement also comes with an accurate timestamp.  This means I can differentiate the LEDDAR readings to produce a Z-axis velocity which can then be fused with the integrated accelerometer readings to provide long term vertical stability.  A simple complementary filter (like the one I already have for angles) will do the job nicely.

The LEDDAR also provides a data ready interrupt which I’ve connected to GPIO 18.  This means I don’t have to poll the serial port every motion loop; instead I can use the GPIO edge detection code (like the one used for FIFO overflow on the IMU) to let the motion code know when there’s new data on the serial port.

Together, this means adding the new function to the HoG code is trivial – I’ve done all the tricky bits before in different contexts.  The only gating factor is the new PCB requirement, and (as always) my overdraft limit!

My one warning if you are in the UK and interested in using LEDDAR: its price is very reasonable given the technology and its support is excellent; sadly the UK import duties and UPS fees added another 40% to the costs!  Ouch!  My advice is to get a friend in Canada to buy it, and send it to you by normal international post!

Here’s the code in case you’re interested:

#!/usr/bin/env python

from __future__ import division
import serial
import minimalmodbus
import os
import time
import RPi.GPIO as GPIO

def main():
    minimalmodbus.BAUDRATE=115200
    mmb = minimalmodbus.Instrument("/dev/ttyAMA0", 1, 'rtu')
    mmb.BAUDRATE=115200

    temp = mmb.read_register(22, 0, 4, True)
    print "temp: %f" % (temp / 256)

    time.sleep(0.1) # ARBITRARY SLEEP REQUIRED WHY?

    num_detections = mmb.read_register(23, 0, 4)
    print "num_detections: %d" % num_detections

    time_lss = mmb.read_register(20, 0, 4)
    time_mss = mmb.read_register(21, 0, 4)
    start_time = (time_mss << 16) + time_lss
 
    #----------------------------------------------------------------------------------
    # Create GPIO 18 as input, pull down
    #----------------------------------------------------------------------------------
    GPIO_INT = 18
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(GPIO_INT, GPIO.IN, GPIO.PUD_UP)

    #----------------------------------------------------------------------------------
    # Clear the already existing interrupt by reading data from the modbus link
    #----------------------------------------------------------------------------------
    dist_leddar = mmb.read_register(24, 0, 4)

    for ii in range(10):

        #------------------------------------------------------------------------------
        # Add GPIO pin 18 rising edge interrupt and then get new data
        #------------------------------------------------------------------------------
        GPIO.wait_for_edge(GPIO_INT, GPIO.RISING)

        time_lss = mmb.read_register(20, 0, 4)
        time_mss = mmb.read_register(21, 0, 4)
        current_time = (time_mss << 16) + time_lss

        dist_leddar = mmb.read_register(24, 0, 4)

        print "Proximity @ %f: LEDDAR %f;" % ((current_time - start_time) / 1000, dist_leddar / 1000)

if __name__ == "__main__":
    os.system("systemctl stop serial-getty@ttyAMA0.service")
    main()

P.S. As expected above, I’ve pretty much coded the LEDDAR into Phoebe’s HoG in about half an hour. There is also a way to bodge the current PCB to get the LEDDAR wires connected to the 5V, TX, RX, GND and interrupt pins, so I could be installing the LEDDAR soon and start testing. I’ll let you know when the job’s done.

P.P.S. If you want to buy a LEDDAR One, it’s in stock at robotshop.

FIFO first flight

In a 10 minute slot between 40mph winds and torrential rain, the sun came out and I quickly grabbed a couple of outdoor flights with Zoe running the IMU FIFO code.  She passed the primary test by not drifting significantly which means further flights can move indoors.  She needs some gyro PID tuning – she’s flying with Phoebe’s defaults – she was see-sawing around the X (roll) axis.  She was running at 500Hz sampling with ESC updates at a nominal 100Hz, and she just about kept up with herself.

My hope is that once the Christmas chaos has died down, I can get that tuning done, and show her of on video to the world.

You’re ‘aving a laugh

I separated the data ready interrupt frequency (666Hz) from the IMU ADC sampling rate (500Hz) with the intention that this might prove better timing accuracy and therefore better angles and velocities.

In flight, timing was better, angle were probably better, but unexpectedly, vertical velocity was completely shot – both Phoebe and Chloe rose to about 4m, and their height increased during hover, and descended slowing during descent before dropping out of the sky from 3m on landing.  Phoebe partially snapped off her USB port, and after an hour trying to desolder and replace it, I gave up and ordered a new A+ arriving today.

In addition, it’s seems now like the 666Hz was just someone having a laugh* at my expense – the one flight I got yesterday was capturing data ready interrupts at 677Hz. This could be

  • 500Hz plus noise or
  • 1kHz with missed samples
  • a code bug

I checked the data ready interrupt frequency with my ‘scope when the new A+ arrived with the ADC sampling frequency set to 500Hz:

500Hz 50us data ready interrupt

500Hz 50us data ready interrupt

Given the ‘scope is saying the hardware interrupt is running at 500Hz, why does the code think it’s running at > 660Hz – clearly something in my customized GPIO library is causing double counting – probably not flushing the epoll fd after a read?


*Unrelated to the Quadcopter, but related to someone having a laugh at my expense, I ordered 2 new monitors yesterday to replace my work and home computers – more screen space for the large number of open apps at work and better full screen viewing of files, and full sRGB colour space rendering for editing photos at home.  I ordered them as a huge motivation to stop smoking.  I can’t afford them unless I give up smoking which costs me £300 a month.  Shortly after ordering, I was sorting out my work PC in preparation for the new monitor, and while the existing monitor was off its vesa wall-mount it smashed to the floor and would only show a white screen.  So stopping smoking is now mandatory, not optional!

Weird timing

Since I reconfigured the sampling rate to 500Hz from 1kHz, everything has been working much better.  I’ve increased flight times to 11.5s (1.5s warm-up, 2s take-off, 6 seconds hover and 2 seconds landing), and also been able to move to within 2m of both Chloe and Phoebe without risk of them slashing one of my arteries.  All great.  And all because there’s now enough time between samples for motion processing to take place, and hence now samples are lost.

The only thing that’s odd though is timing: all of HoG’s timing is based on the sampling rate: elapsed time = number of samples / sampling rate.

In the current flights I get 5760 samples = 11.52 seconds elapsed time as expected.

But if I wrap the flight code with a couple of time.time()’s to measure the actually flight time, it comes out at 8.64 seconds, suggesting that the sampling rate isn’t 500Hz as configured, but more like (11.52 / 8.64) * 500Hz = 666.666666666666… to 32 digit accuracy!  And that’s weird as it’s not possible to configure the IMU to sample at this rate – sampling rates are (1kHz / an integer).

The nearest thing I can find in the MPU-9250 data sheet section “4.4 Register 25 – Sample Rate Divider” is this:

Data should be sampled at or above sample rate; SMPLRT_DIV is only used for 1kHz internal sampling.

Perhaps by setting the ADC sample rate to 500Hz, the MPU-9250 also sets the data ready interrupt frequency to 666Hz to ensure the above rule is maintained.  There is no mention of 666Hz throughout the documentation; if my supposition is correct, then it’s a little poor to hide this fact to be inferred from a single sentence embedded in the depths of the very large register map document!


P.S. As well as affecting the length of flights, it also has an effect on velocities and angles which both use integration over time of the accelerometer and gyro respectively. The former isn’t significant in the grand scheme of things, but the latter probably is: merging angles from the accelerometer and integrated gyro will mean short term, the gyro angles are over-estimated. The ‘fix’ is simple, though again a bit of a hack as it needs to know through testing the difference in ADC sampling- and data ready interrupt frequencies.

Wot no drift?

No quite true, but nearly so.  Nothing to do with tuning as I’d previously suspected.  Everything to do with reading the data registers fast enough after the hardware data ready interrupt is detected.

You may remember that sometimes (due to the code running on Linux probably rather than an RTOS), I got bad data – readings of -1 or 0xFFFF.  I put in some protective code to spot, ignore, and retry reading the sensors when the gyros, temperature and z axis acceleration were all 0xFFFF.  This was on the basis that occasional -1’s for the gyros weren’t going to cause problems.  As a result, during the course of a 10s flight, a few accelerometer data errors got caught and flight quality got a lot better.

With the recent updates, the gyros have taken a much more significant role in angle calculation, and that then affects gravity.   I woke this morning realising I’d better see if I needed to put more rigorous checks in place.  Here’s the screen dump from a flight this morning with those checks in place:

[WARNING] (MainThread) go 1436, Chloe is flying.
[CRITICAL] (MainThread) CheckCLI 1156, Pre-flight checks passed, enjoy your flight, sir!
[WARNING] (MainThread) go 1443, calibrate_gravity = False, fly = True, hover_target = 450, shoot_video = False, vvp_gain = 360.000000, vvi_gain = 180.000000, vvd_gain= 0.000000, hvp_gain = 1.250000, hvi_gain = 0.010000, hvd_gain = 0.000000, prp_gain = 88.000000, pri_gain = 44.000000, prd_gain = 0.000000, rrp_gain = 84.000000, rri_gain = 42.000000, rrd_gain = 0.000000, yrp_gain = 50.000000, yri_gain = 25.000000, yrd_gain = 0.000000, test_case = 0, alpf = 2, glpf = 2, rtf_period = 1.000000, tau = 0.500000, diagnostics = False
[CRITICAL] (MainThread) go 1539, Just warming up and chilling out. Gimme 20s or so...
[CRITICAL] (MainThread) go 1626, 20...
[CRITICAL] (MainThread) go 1626, 19...
[CRITICAL] (MainThread) go 1626, 18...
[CRITICAL] (MainThread) go 1626, 17...
[CRITICAL] (MainThread) go 1626, 16...
[CRITICAL] (MainThread) go 1626, 15...
[CRITICAL] (MainThread) go 1626, 14...
[CRITICAL] (MainThread) go 1626, 13...
[CRITICAL] (MainThread) go 1626, 12...
[CRITICAL] (MainThread) go 1626, 11...
[CRITICAL] (MainThread) go 1626, 10...
[CRITICAL] (MainThread) go 1626, 9...
[CRITICAL] (MainThread) go 1626, 8...
[CRITICAL] (MainThread) go 1626, 7...
[CRITICAL] (MainThread) go 1626, 6...
[CRITICAL] (MainThread) go 1626, 5...
[CRITICAL] (MainThread) go 1626, 4...
[CRITICAL] (MainThread) go 1626, 3...
[CRITICAL] (MainThread) go 1626, 2...
[CRITICAL] (MainThread) go 1626, 1...
[CRITICAL] (MainThread) go 1633, pitch 0.232744, roll 0.372264
[CRITICAL] (MainThread) go 1634, egx -0.000000, egy -0.000000, egz 1.019816
[CRITICAL] (MainThread) go 1636, data errors: 540; i2c errors: 2
[CRITICAL] (MainThread) go 1712, Thunderbirds are go!
[CRITICAL] (MainThread) getTargets 1312, ASCENT
[CRITICAL] (MainThread) getTargets 1312, HOVER
[CRITICAL] (MainThread) getTargets 1312, DESCENT
[CRITICAL] (MainThread) getTargets 1312, STOP
[CRITICAL] (MainThread) CleanShutdown 1208, lps: 960.246782
[CRITICAL] (MainThread) CleanShutdown 1216, data errors: 557; i2c errors: 2

The red pieces show how many times there is a probably duff data read which needs a retry.  The first batch come from the time when I’m warming up the Butterworth filter and the rolling average of the angles.  This code only uses accelerometer values, and the existing code caught and retried these so no problem there even though 50 times more errors had been detected with my updated error catcher code.

However the second batch comes at the end of the flight and is the total; there had been an additional 17 data read errors!  Most of these would have had only a tiny effect, but we’re now in a zone where we’re looking for net acceleration error < 1cms-2 or 0.001g.  Tiny errors matter a lot!

Now I can get back outside to do that tuning.

 

Fee FiFo Fum, Feck

After a lot of timing experimentation today*, I’m pretty certain my I2C errors (and duff data) are due to the MPU-9250 sensor registers being updated while I’m still reading them.  If so, I need to move to using the FIFO provided.

I opted to read the data registers directly originally for simplicity, and then (when I found a single I2C ready could return multiple bytes of data) for performance.

Swapping to the FIFO is going to be a right pain.  It’s essentially a pipeline of bytes with data from the sensor fed in at the bottom, and read at the top.  But the bytes can only be read byte-by-byte (still over an I2C interface), so care needs to be taken to build a full set of bytes before processing that next batch from the sensors.  It does have a latch to hold-back updates while reads are happening which is good.

However, I think it also makes the hardware interrupt meaningless – the count of data in the FIFO is just polled over I2C and if greater than a complete batch, read the next full batch of data.  That’s messy as it’s a CPU hog, and is going to rule out having ‘spare CPU’.

Or putting it another way, it’d be better left to an Arduino (penny in the swear box) to hard-loop reading the FIFO as new data is queued up.


*I tried

  • different model A+’s
  • different MPU-9250 (Chloe’s and Phoebe’s)
  • a kernel update (sudo rpi-update)
  • various I2C baudrates (100, 200 and 400kHz)
  • various overclocking from 750MHz to 1GHz
  • setting combined=1 in /etc/modprobe.d/i2c_bcm2708.conf
  • various SMPLRATE_DIV settings for the sensor sampling

Blowing raspberries at the RTOS dogma

After some code tweaks, I’m now getting every single piece of data – yes, that’s right, I’m reading data at 1kHz!

1kHz sampling

1kHz sampling

Primary problem was integrating samples rather than averaging them.  I’d found before that calling time.time() was actually a very time consuming call.  By just adding samples instead of integrating them, I only need to call time.time() in each motion processing loop rather than per sample.  It’s that which has taken my sampling rate to 1kHz – or to put it another way, the python code can now read the sensors faster than the sensors can churn out the data, so my code sits there twiddling its thumbs waiting for the data.

I’ve not tracked down the cause for the occassional I2C miss, so I’ve also moved the try / except wrapper from around the I2C read to around the (interrupt + I2C read).  That forces the code to wait for the next sample before trying to read it again.  That’s the cause for the spike in the middle.

Combining the above with my recent super-duper fast GPIO edge_detect_wait() update listening for a single 50us pulse per call, once more, I can climb back up on my trusty steed and blow raspberries at the RTOS dogma.

Code’s up on GitHub.

Yet more data sampling diagnostics

In this run, I’d swapped to using 50us interrupt pin pulses, driven by the MPU-9250.  I’ve also dropped the data sample rate to 500Hz and the motion processing down to 37Hz, so that would suggest about 13 – 14 samples batching into the motion processing code. Here’s the equivalent graph to yesterday.

500Hz 50us sample trigger

500Hz 50us sample trigger

It didn’t do what I was expecting. Here’s a graph that makes it clearer what’s odd:

500Hz sampling, 37Hz motion

500Hz sampling, 37Hz motion

This is showing batches of 18-19 and 21-23 samples per 37Hz (27ms) processing cycles.  So regardless of the sampling frequency set to 500Hz, I’m still getting data at a 1kHz sample rate.

The periodic spikes of only 12 samples per processing cycle are increasing likely to be due to python underworld garbage collection.  The original single threaded code didn’t generate garbage, but the single-threaded version of the multi-thread code might do – it’s the block of data that’s filled in the sampling code, and read by the motion processing code.  Its lifetime is only milliseconds.  Perhaps I’ll have a closer look about how to skip those data copies in the single threaded version.  Easy in ‘C’ because I can pass pointers around for the sampling thread to copy data direct to the motion processing thread, but I’m not sure how / if that’s possible in python.

Finally, I still get an I2C missing or two in the warm up code, which I still have no idea how this could happen.  Perhaps garbage collection too?

More data sampling diagnostics

I merged the code back to being single threaded to stop the GIL messing up the gap between the data interrupt and the reading of the sensors, and took HoG out for a couple of flights.  I still get I2C errors during warmup, but the flights were good to the extent I could get decent diagnostics – flight plan was 1s to wind up the props, 2 seconds climbing, 5 seconds hover and 2 seconds descent.

This graph shows an approximation of the sensor sample rate achieved by the code over the course of a 4.5s flight before I aborted it.

Each time the sensors are read, a counter is incremented.

Each time the data motion processing is finished (approximately 71Hz or 14.1ms), a timestamp is taken.

What’s shown below is the motion period divided by the count increment during that motion period plotted against elapsed time in other words, a crude approximation for the rate samples are read, where each “sample read” time is extended by a fraction of the motion processing time depending on how many samples were taken between the end of one motion processing session and the start of the next 14.1ms later.

Data sampling

Data sampling

~50% of the data shows sampling around 1.5ms; during this phase, a batch of 9 – 10 samples are passed into the next motion processing run.

~50% of the data shows sampling around 2ms; during this phase a batch of 7 – 8 samples are passed into the next motion processing run.

Assuming motion processing time is constant, then there seem to be periods when every 1kHz sample is caught (the 9 – 10 samples periods), and other periods where a sample or two are missed.

Both sets of data are good, with no suggestions of data being read after it had expired causing sensor errors, though the clustering is odd – I would have hoped for a noisier zig-zaggy line between 1.5 and 2ms.

And then there are the spikes where only one sample gets taken in that 14.1ms period.  There’s a 2s interval between the 2 peaks, and it is inexplicable within my Python code world – is that the frequency of the Python underworld garbage collection?

More digging anon.

Oh silly scope!

Here’s a capture from my ‘scope of the data ready interrupt pin.  I’ve configured it so that it goes high when there is data ready, and drops back low when the data is read by the code:

Data ready interrupt

Data ready interrupt

The display is 20ms wide, so there should be 20 pulses shown if everything was perfect.  The narrow pulses are perfect.  But there’s a couple interesting bits:

  1. There are several missing pulses most obviously between 14 and 18ms – that suggests the sensor is not consistently setting the interrupts pin high at the configured 1kHz sampling rate.
  2. There is one wide pulse between 6 and 9ms – that indicates an extended period between the sensor raising the interrupt pin high and the code reading the data – that could be because Linux / Raspian / Python is not real time, or because the motion processing thread was running, taking longer that 1ms – that prevents the sensors being read due to the Python GIL blocking threads.  That’s to be expected and there’s not a lot I can do without switching to another Python implementation – Cython or PyPy rather than CPython..

There’s nothing I can do about the gaps – that’s a sensor problem.  The code already attempts to compensate though by timing gaps between reads and uses that for integration,  which is then averaged on passing to the motion processing code periodically.  Still, it’s a worry gaps this large exist.

It’s reassuring that all the pulses shows are at 1ms intervals, so there’s no bogus spikes causing the I2C errors during flight.  The flight logs show only 1 I2C error which happened during warm up; to some extent that has lesser impact as the period is used to load up the Butterworth filters and at the end, extract take-off platform slope and resultant gravity distribution and calibration across the quad axes.

After all the tinkering I’ve been up to, I think it’s worth a flight just to see if anything is better as a result.