Back to the mundane

After a great @CotswoldJam 6th #PiParty yesterday, it’s back to bug fixing; trouble is I only have symptoms and no idea of the cure.

Here’s the various processes running, all connected by shared-memory FIFO streams:

+—————+     (1)
|Sweep|——————>——————+
+—————+             |
              +—————+———+     (3)     +——————+
              |Autopilot|——————>——————|Motion|
              +—————+———+             +———+——+
+———+               |                     |
|GPS|———————>———————+                     |
+———+      (2)                            |
                                          |
                      +—————+     (4)     |
                      |Video|——————>——————+
                      +—————+

The problem is this: when using the GPS and Sweep processes individually, the system works beautifully, but together, the Motion process poll.poll() doesn’t pick up what the Autopilot process sends it.  From the logs for each process, GPS, Sweep, Video and Autopilot are working, sending their data to their neighbour; it’s just Motion that picks up just Video not Autopilot data from the FIFOs.

I can’t help but suspect that is hardware in some way:  The Raspberry Pi 3B has 4 CPUs; I have 5 processes and 4 FIFOs between them.

While I could continue working on the pond or object avoidance enhancements without GPS and Sweep working together, I’d very much like not to.  There’s will now be a short delay until I work out what to do next.

P.S. Ideas gratfully received!


Bug fixed, code updated on GitHub

Curiouser and curiouser

After more chipping away, I’ve got it down to this:

  • If I define _GLL_ACQ_COMMAND in class scope, the I2C command had no effect.
  • If I define _GLL_ACQ_COMMAND in instance scope, the I2C command is effective.

I use per-class scope for all my other I2C sensor classes, I wonder why it’s different here?  The code below does not work as is.  Remove the # from line 38 and all works perfectly.  Utterly bizarre.

from __future__ import division
import smbus
import time

class Lidar_Lite:

    __GLL_ACQ_COMMAND       = 0x00
    __GLL_STATUS            = 0x01
    __GLL_SIG_COUNT_VAL     = 0x02
    __GLL_ACQ_CONFIG_REG    = 0x04
    __GLL_VELOCITY          = 0x09
    __GLL_PEAK_CORR         = 0x0C
    __GLL_NOISE_PEAK        = 0x0D
    __GLL_SIGNAL_STRENGTH   = 0x0E
    __GLL_FULL_DELAY_HIGH   = 0x8F
    __GLL_FULL_DELAY_LOW    = 0x10
    __GLL_OUTER_LOOP_COUNT  = 0x11
    __GLL_REF_COUNT_VAL     = 0x12
    __GLL_LAST_DELAY_HIGH   = 0x14
    __GLL_LAST_DELAY_LOW    = 0x15
    __GLL_UNIT_ID_HIGH      = 0x16
    __GLL_UNIT_ID_LOW       = 0x17
    __GLL_I2C_ID_HIGHT      = 0x18
    __GLL_I2C_ID_LOW        = 0x19
    __GLL_I2C_SEC_ADDR      = 0x1A
    __GLL_THRESHOLD_BYPASS  = 0x1C
    __GLL_I2C_CONFIG        = 0x1E
    __GLL_ACQ_COMMAND       = 0x40
    __GLL_MEASURE_DELAY     = 0x45
    __GLL_PEAK_BCK          = 0x4C
    __GLL_CORR_DATA         = 0x52
    __GLL_CORR_DATA_SIGN    = 0x53
    __GLL_ACQ_SETTINGS      = 0x5D
    __GLL_POWER_CONTROL     = 0x65

    def __init__(self):
        self.address = 0x62
#        self.__GLL_ACQ_COMMAND = 0x00

        self.bus = smbus.SMBus(1)
        self.bus.write_byte_data(self.address, self.__GLL_OUTER_LOOP_COUNT, 0xFF)
        self.bus.write_byte_data(self.address, self.__GLL_ACQ_COMMAND, 0x04)
     
    def getDistance(self):
        dist1 = self.bus.read_byte_data(self.address, self.__GLL_FULL_DELAY_HIGH)
        dist2 = self.bus.read_byte_data(self.address, self.__GLL_FULL_DELAY_LOW)
        return ((dist1 << 8) + dist2) / 100

    def getVelocity(self):
        vel = self.bus.read_byte_data(self.address, self.__GLL_VELOCITY)
        return (self.signedInt(vel)) / 100

    def signedInt(self, value):
        return value if value < 128 else (value - 256)


ll = Lidar_Lite()

start_time = time.time()
ii = 0
while ii < 100:
    ii += 1
    velocity = ll.getVelocity()
    distance = ll.getDistance()
    print "Velocity %f; Distance %f" % (velocity, distance)
    time.sleep(0.1)
print "%d reads took %f seconds" % (ii, time.time() - start_time)

P.S. For the moment, I’m just going to accept I must use the per-instance scope rather than wasting more time working out this python oddity. This version has now incorporated into my Quadcopter code.

Spot the difference.

I’m in the process of migrating the sample code for using the LiDAR-Lite to the format I normally use.  It’s not been as simple as I thought.  I’ve narrowed the problem down to this.  Look for the SPOT THE DIFFERENCE comments.

from __future__ import division
import smbus
import time

class Lidar_Lite:

    __GLL_ACQ_COMMAND       = 0x00
    __GLL_STATUS            = 0x01
    __GLL_SIG_COUNT_VAL     = 0x02
    __GLL_ACQ_CONFIG_REG    = 0x04
    __GLL_VELOCITY          = 0x09
    __GLL_PEAK_CORR         = 0x0C
    __GLL_NOISE_PEAK        = 0x0D
    __GLL_SIGNAL_STRENGTH   = 0x0E
    __GLL_FULL_DELAY_HIGH   = 0x8F
    __GLL_FULL_DELAY_LOW    = 0x10
    __GLL_OUTER_LOOP_COUNT  = 0x11
    __GLL_REF_COUNT_VAL     = 0x12
    __GLL_LAST_DELAY_HIGH   = 0x14
    __GLL_LAST_DELAY_LOW    = 0x15
    __GLL_UNIT_ID_HIGH      = 0x16
    __GLL_UNIT_ID_LOW       = 0x17
    __GLL_I2C_ID_HIGHT      = 0x18
    __GLL_I2C_ID_LOW        = 0x19
    __GLL_I2C_SEC_ADDR      = 0x1A
    __GLL_THRESHOLD_BYPASS  = 0x1C
    __GLL_I2C_CONFIG        = 0x1E
    __GLL_ACQ_COMMAND       = 0x40
    __GLL_MEASURE_DELAY     = 0x45
    __GLL_PEAK_BCK          = 0x4C
    __GLL_CORR_DATA         = 0x52
    __GLL_CORR_DATA_SIGN    = 0x53
    __GLL_ACQ_SETTINGS      = 0x5D
    __GLL_POWER_CONTROL     = 0x65


    def __init__(self):
        self.address = 0x62
        self.distWriteReg = 0x00
        self.distWriteVal = 0x04

        try:
            self.bus = smbus.SMBus(1)
        except:
            return

        self.bus.write_byte_data(self.address, self.__GLL_OUTER_LOOP_COUNT, 0xFF)


        ############################ SPOT THE DIFFERENCE ###########################
#        self.bus.write_byte_data(self.address, self.distWriteReg, self.distWriteVal)
        self.bus.write_byte_data(self.address, self.__GLL_ACQ_COMMAND, 0x04)
        ############################ SPOT THE DIFFERENCE ###########################
     
    def getDistance(self):
        dist1 = self.bus.read_byte_data(self.address, self.__GLL_FULL_DELAY_HIGH)
        dist2 = self.bus.read_byte_data(self.address, self.__GLL_FULL_DELAY_LOW)
        return ((dist1 << 8) + dist2) / 100

    def getVelocity(self):
        vel = self.bus.read_byte_data(self.address, self.__GLL_VELOCITY)
        return (self.signedInt(vel)) / 100

    def signedInt(self, value):
        return value if value < 128 else (value - 256)


ll = Lidar_Lite()

start_time = time.time()
ii = 0
while ii < 100:
    ii += 1
    velocity = ll.getVelocity()
    distance = ll.getDistance()
    print "Velocity %f; Distance %f" % (velocity, distance)
    time.sleep(0.1)
print "%d reads took %f seconds" % (ii, time.time() - start_time)

Using the commented out line, and all works.
Using the uncommented line, and it fails.

The register sets whether data is updated automatically rather than when a command is set to get a new reading.

There’s no python error; neither raise Python syntax errors.

Anyone got any ideas as to what the problem could be?

The logic behind the protective fix

It first glance, the protective fix I put in looks as though it only deals with the exact error I showed in the graph two posts ago just looking for -1 in az, temp, gx, gy, and gz:

if gz == -1 and gy == -1 and gx == -1 and temp == -1 and az == -1:
    self.num_data_errs += 1
    continue
break

But actually, it’s a lots more subtle than that.

The MPU data registers are as follows:

  • registers 59 – 64 represent 2 bytes each of ax, ay, and az
  • registers 65 and 66 represent 2 bytes of temperature
  • registers 67 – 72 represents 2 bytes each of gx, gy and gz.

The duff data showed that ax and ay had been filled in, but the rest had not at the point of reading the data over I2C.  So I speculate that when the registers are being filled with new values, it starts at register 59 and finishes at 72.

The value for az is critical; if the value is -1 instead of somewhere around 8000 (1g), it permanently affects the velocity values controlling the rest of the flight, leading to permanent fixed speed drift.  If az is incorrectly -1, then whether the values of ax and ay or correct or -1 doesn’t matter; the flight is doomed to drift regardless.

In contrast, gyro values are used only short term, and the influence of a duff value gets overridden by the complementary filter in a fraction of a second.  The error lives for just a few milliseconds until it’s lost forever; the flight just has a few millisecond glitch, but the next motion processing loop will correct that glitch.

So actually, the test needs only check if az == -1.  az of -1 maps to freefall with no air resistance; it simply won’t happen unless you fly the quad upside down!  But there’s a very slight chance you might see freefall values (quadcopter in ISS?), but that does mean the test can still be refined.

In ‘C’, an “if” like the above does not necessarily process all of the tests. As soon as one fails the test (for example if gz is not -1), then code code stops testing, and drops out to the “break” statement.

So the tests can be ordered so that the most important fact is checked first: az.  If az is duff, then the values for ax and ay are irrelevant, so there’s no point in checking whether they are -1 whatsoever.  If az is not -1, then even though temperature and gyro data may still be duff, if doesn’t matter, and it’s worth taking the accelerometer data.  Only if az is -1, and the temperature and gyro values are all -1 does it mean the data cannot be reused, and so must be re-acquired.

The difference is subtle, but this is python code running at 1kHz, and every little bit counts (pun intended).

The revised “if” test at the top could actually look like this:

if az == -1 and temp == -1 and gx == -1 and qy == -1 and gz == -1:
    self.num_data_errs += 1
    continue
break

To be honest, this really doesn’t matter, but I did a few test flights today which were a lot better, but there was a blustery wind which meant the flights weren’t visually pretty so no video, so I had time to kill, and my mind wandered.

Compare and Kontrast…

There’s a passing resemblance between the complementary and the Kalman filters, so to start understanding Kalman, I changed variable names to ones I recognised.

Complementry:

Xn = gain * (Xn-1 + (gnow - gprev) * δt) + (1 - gain) * an

Kalman:

Xn+1 =  gain * Xn-1 + (1 - gain) * Xn

The main differences are that the complementary filter is merging two sources of the same information based on a gain (< 1) whereas the Kalman is merging the previous processed and current sensor readings to predict the next value where the gain is dynamic but still less than one – and I didn’t get much further than that, because in writing this down, I spotted a crass (krass?) error in my complementary filter which could more than account for the flight problems.

Kalman will have to wait another day to shine!

I’ll update the code on GitHub as soon as it’s tested – the weather tomorrow looks to be in my favour.

Forward roley-poley

I’ve had at least one report of someone’s quadcopter based on Phoebe’s code just flipping onto it’s head at takeoff.  I couldn’t really help as I’d not seen such symptoms – until the other day that it.

I have no idea why the flip doesn’t happen each and every time on Phoebe, and does for others; my apologies to those whose quad suffered the problem and I was unable to help.

Anyway, the fix is to replace your version of calibrateGyros() with this one:

 def calibrateGyros(self):
   gx_offset = 0.0
   gy_offset = 0.0
   gz_offset = 0.0

   for iteration in range(0, self.__CALIBRATION_ITERATIONS):
     [ax, ay, az, gx, gy, gz] = self.readSensors()

     gx_offset += gx
     gy_offset += gy
     gz_offset += gz

   self.gx_offset = gx_offset / self.__CALIBRATION_ITERATIONS
   self.gy_offset = gy_offset / self.__CALIBRATION_ITERATIONS
   self.gz_offset = gz_offset / self.__CALIBRATION_ITERATIONS

Let me know via the comments if this solves your problem too.

Die another day

After a frenetic frenzy of fixing up Chlöe including adding her thermostat to her Beret, I did a quick passive run just to check all was well.  The thermostat worked like a dream, as did the gyroscope Y axis bug that Phoebe had shown and that I’d assumed was the death of her sensor.

So there’s nothing wrong with Phoebe or Chlöe, but there’s a bug in the code that I’ve recently introduced (probably a typo) that’s playing hide and seek with me better than my kids do.

Idle raises no issues with the code, and no exceptions are raised, which I think leaves me walking the code blindly looking for the needle in a haystack.  I hope the needle is near the bottom of the stack so I stand on it!