Diagnosing I2C problems (work in progress)

The symptoms of the I2C problems I’m seeing are Python exceptions from the smbus library; when one is caught, as well as incrementing the ‘misses’ count, I have code that re-reads the sensors.  The trouble is, the I2C exceptions I’m catching are actually symptomatic of a bigger problem: duff data from reads which don’t trigger exceptions.  Just one set of duff data results in long lasting errors in angles, acceleration, gravity measurement etc etc etc.  Put simply, you can forget any chance of stability in a flight.
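A minimal sketch of the catch, count and re-read pattern described above.  The `read_fn` callable and the retry limit are hypothetical stand-ins for whatever wraps the real smbus block read; note this only covers the reads that raise an exception, not the silent duff data.

```python
def read_with_retry(read_fn, max_retries=3):
    """Call read_fn(); on an I2C error, count a miss and read again.

    read_fn is a hypothetical zero-argument callable that performs one
    sensor read (e.g. a wrapper around an smbus block read).
    Returns a (data, misses) tuple.
    """
    misses = 0
    while True:
        try:
            return read_fn(), misses
        except IOError:
            # smbus reports a failed transfer as IOError (OSError on Python 3)
            misses += 1
            if misses > max_retries:
                raise
```

A read that never throws but returns rubbish sails straight through this, which is exactly the bigger problem.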

I’d solved the problem with Phoebe using the hardware interrupt to trigger the I2C read.  And it worked like a dream, and Chloë and Zoë inherited that solution.

But the problem came back with the birth of HoG (using the latest Raspbian distribution) and the swap to the MPU-9250.  So I’ve been trying things today to work out what’s gone wrong.

First step was to solder together a couple of pads on the MPU-9250 breakout to connect the pull-up resistor – this should be unnecessary as the Raspberry Pi I2C bus already has a pull-up and adding another could be detrimental.  Anyway, there was no change in behaviour – still I2C missing.

Next step was to track down whether it’s the sensor, the code, the A+ or the kernel causing the problem.  Zoë still lives in a half-alive state so I could swap SD cards around and see what happened:

  • HoG’s SD card @ kernel 3.18.7+ sees I2C read misses with MPU-9250 (HoG’s hardware)
  • Zoë’s SD card @ kernel 3.12.28+ sees I2C read misses with MPU-9250 (HoG’s hardware)
  • HoG’s SD card @ kernel 3.18.7+ sees zero I2C read misses with MPU-6050 (Zoë’s hardware)
  • Zoë’s SD card @ kernel 3.12.28+ sees zero I2C read misses with MPU-6050 (Zoë’s hardware)

That rules out the kernel version, the A+ hardware, and my software.  It suggests a problem with the MPU-9250 breakout board, the PCB it’s connected to, the wiring or a long term I2C driver problem that only shows its ugly face with the MPU-9250.

My best guess is the I2C barometer on the same breakout; it uses the same bus, but as yet, I’ve not configured it in any way.  Perhaps as a result, it’s being noisy and annoying all the other passengers on the bus?


Update: on a whim, I reduced the data rate from 1kHz to 500Hz to allow more time for the sensor data to be read.  What I saw was the data rate go up from just under 700Hz to over 700Hz.  That suggests the data ready interrupt isn’t working; data is only being made ready at 500Hz, so where are those other pulses coming from?

I had a look at my custom GPIO code, and a slightly controversial change I made; I backed this out and tried again.  This time, no edges were detected at all!

Something in that area is very dodgy.  More tomorrow, no doubt.

Maiden voyage

HoG lost her flight virginity today, and she lost it with enthusiasm – a little too much to be honest.  Three second flight plan – one second takeoff to 0.5m, one second hover and one second descent.  All maiden flights are a huge gamble: in HoG’s case, she had

  • new arms
  • new props
  • new motors
  • new frame
  • new calibration method
  • new butterworth filter parameters.

Given that, I’d say her performance was surprisingly good!

She took off vertically from sloping ground.  That alone is nearly an unqualified success.  For it to have been a complete success though, she would have stopped at 0.5m off the ground and hovered.  Instead, she whizzed up to 3m and then I hit the kill switch even before she had the chance to try to hover.

A few lessons learnt even from such a short flight though:

  • zero g calibration seems to work, but it needs doing for the X, Y and Z axes
  • having dlpf set to 180Hz rather than the normal 20Hz probably wasn’t a smart move regardless of how good the Butterworth might be
  • aluminium arms bend and don’t straighten when they hit the ground at over 7.5ms⁻¹!

New arms are on the way and will arrive tomorrow, allowing me to do the zero g calibration of the Z axis also!

But what’s Zero-G calibration, and how do you do it without going into space?

Historically, I’ve been jumping through hoops trying to get sensor calibration stable, controlling the temperature to 40°C while rotating her in the calibration cube to measure ±g in all three axes to get gains and offsets.  Yet despite all that effort, the sensors, and hence Zoë, still drifted, even if only modestly over time, still enough that she couldn’t fly in the back garden for more than a few seconds without hitting a wall.

The move to the MPU-9250 for HoG from Zoë’s MPU-6050 IMU initially seemed a retrograde step – it didn’t seem to be able to measure absolute temperature, only the difference from when the chip was powered up.  And that meant the 40°C calibration could no longer work.  Lots and lots of reading the specs yielded nothing initially.

But in passing I’d spotted some new registers for storing accelerometer offsets to allow them to be included in the IMU motion processing.  That suggested there was a way to get valid offsets.  Additionally, again in passing, I’d spotted a couple of Zero-G specifications: critically that the Zero-G level change against temperature was only ±1.5mg / ºC.  That means an offset measured in a Zero-G environment hardly drifts against temperature.   And a Zero-G environment doesn’t mean going up to space – it simply means reading the X and Y axis values when the Z-axis is aligned with gravity.  So with HoG sat on the floor, X and Y offsets are read, and then holding her against a wall gives the Z offset.  So calibration and updating the code takes only 5 minutes and requires no special equipment.
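The two-pose procedure above boils down to averaging a batch of readings in each pose.  Here is a rough sketch, assuming readings arrive as (x, y, z) tuples in units of g; the function names and the sample format are illustrative rather than taken from the flight code.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


def zero_g_offsets(floor_samples, wall_samples):
    """Estimate accelerometer offsets from two static poses.

    floor_samples: (x, y, z) readings taken sat flat on the floor, so the
                   Z axis is aligned with gravity and X and Y should read zero.
    wall_samples:  (x, y, z) readings taken held against a wall, so the
                   Z axis is horizontal and should read zero.
    Returns (x_offset, y_offset, z_offset) in the same units as the input.
    """
    x_offset = mean([sample[0] for sample in floor_samples])
    y_offset = mean([sample[1] for sample in floor_samples])
    z_offset = mean([sample[2] for sample in wall_samples])
    return x_offset, y_offset, z_offset
```

The offsets would then be subtracted from each raw reading, or written to the offset registers if the IMU is to do the work.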

Delight and despair at the same time: delight that I now had a way forwards with the MPU-9250 (and it would work with the MPU-6050 also), but despair at the time and money I’d spent trying to sort out calibration against temperature.

RTFM!

My initial tests of HoG* have revealed an unexpected surprise; the MPU-9250 is not backward compatible with the MPU-6050 / MPU-9150 as far as registers are concerned.  I should have read the specs beforehand!

Don’t get me wrong, it’s working in the briefest of tests, but there has been a radical change in temperature scale (now configurable), dlpf (settable separately for gyro and accelerometer), sampling frequencies (way up to 8kHz!), plus new registers to do really useful things like storing calibration data so the IMU can do the work.  And that’s only from the briefest of scans from the spec.

So BYOAQ-BAT articles have stalled again until I’ve found all these changes and added a new class to the code for the MPU-9250, and probably a generic IMU wrapper that uses the common WHO_AM_I register to determine which MPU class code should be used.
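One way such a wrapper could look, as a sketch only.  The class names and `make_imu` are hypothetical placeholders; the WHO_AM_I address (0x75) and the ID values (0x68 for the MPU-6050, 0x71 for the MPU-9250) are as I understand the register maps and worth checking against the data sheets.  `bus` is assumed to be an smbus-style object with `read_byte_data(address, register)`.

```python
WHO_AM_I = 0x75       # register address, assumed common to both chips
MPU6050_ID = 0x68     # assumed WHO_AM_I value for the MPU-6050
MPU9250_ID = 0x71     # assumed WHO_AM_I value for the MPU-9250


class MPU6050(object):
    """Placeholder for the existing MPU-6050 class."""
    def __init__(self, bus, address):
        self.bus = bus
        self.address = address


class MPU9250(object):
    """Placeholder for the new MPU-9250 class."""
    def __init__(self, bus, address):
        self.bus = bus
        self.address = address


def make_imu(bus, address=0x68):
    """Read WHO_AM_I and return an instance of the matching class."""
    chip_id = bus.read_byte_data(address, WHO_AM_I)
    if chip_id == MPU6050_ID:
        return MPU6050(bus, address)
    if chip_id == MPU9250_ID:
        return MPU9250(bus, address)
    raise ValueError("Unrecognised WHO_AM_I value: 0x%02x" % chip_id)
```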

To make matters worse, my kids have used all the printer paper in the house for drawing on.  I need printouts of these data sheets to stand a chance to find the changes.  Off to buy some printer paper tomorrow!

*HoG = Heart of Gold – the superclone of Phoebe, Chloë and Zoë upon which the BYOAQ-BAT articles are based.

Zoë, meet the metaphorical brick wall.

Zoë’s been out several times today; all pretty much identical: she took-off, climbed for a second as programmed, halted at hover but slowly accelerated upwards from the hover to an ever increasing vertical climb; throughout there was nigh on zero horizontal drift.  Then I killed the flight before she went too high.  And that’s despite the complementary filter I’ve added to try to extract slow gravity drift from the peaks of real acceleration.  The complementary filter is just too crude a method – it lags too much for the frequency of filtering I need.

Ideally what I’m looking for is a way to get two accelerometer outputs from the MPU6050: one with a dlpf set to 0.1Hz to get real acceleration filtered out from (gravity + acceleration) and the second one (as now) running 45 or 90Hz to ensure nothing is missed.  Realistically that means I need to develop my own dlpf at 0.1Hz with a high cut-off rate.
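As an illustration of what a software low-pass filter at 0.1Hz might look like, here is a second-order Butterworth biquad using the standard bilinear-transform coefficients.  This is a sketch of one candidate, not a filter that has flown: the class name is made up, the sample rate is an assumption, and a steeper cut-off would need a higher order (several of these in cascade with adjusted Q values).

```python
import math


class ButterworthLowPass(object):
    """Second-order low-pass biquad with Q = 1/sqrt(2) (Butterworth)."""

    def __init__(self, cutoff_hz, sample_hz):
        w0 = 2.0 * math.pi * cutoff_hz / sample_hz
        q = 1.0 / math.sqrt(2.0)
        alpha = math.sin(w0) / (2.0 * q)
        cos_w0 = math.cos(w0)
        a0 = 1.0 + alpha
        # Coefficients normalised by a0
        self.b0 = (1.0 - cos_w0) / 2.0 / a0
        self.b1 = (1.0 - cos_w0) / a0
        self.b2 = self.b0
        self.a1 = -2.0 * cos_w0 / a0
        self.a2 = (1.0 - alpha) / a0
        # Previous two inputs and outputs
        self.x1 = self.x2 = 0.0
        self.y1 = self.y2 = 0.0

    def update(self, x):
        """Feed one sample in, get one filtered sample out."""
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y
```

For example, `ButterworthLowPass(0.1, 100.0)` fed with raw accelerometer samples at an assumed 100Hz would pass slowly drifting gravity and strongly attenuate flight accelerations, at the cost of taking several seconds to settle.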

I’ve found a couple of links already, here and here for me to start reading.

In the meantime, Zoë’s grounded until I can solve the sensor drift.  And before anyone else comments about barometer / altimeter / ultrasonic range finder, yes, I know those will solve the problems easily but within restrictions I’m not willing to accept for the moment.

As a result, this seems like a good point to update the code on GitHub – the main change is the threading model.

Toasty

After a bit of faff*, the thermostat is running well.  While that doesn’t improve the calibrated gravity beyond the current 0.1%, it does remove the need for interpolated values across temperature, which means that non-linearity of the temperature interpolation is removed, and calibration now takes 5 minutes rather than an hour or two and a beer fridge.

I’ve also added some speculative improvements to the calibration rig.  The current rig is a 10mm x 50cm x 50cm acrylic sheet, with bolt holes at each corner, which when combined with a spirit level can be levelled to 0.5º on a flat, hard surface.

It’s now enhanced with a carbon fibre sheet** (3mm) sitting loosely on top to compensate for any slight bowing in the heavy acrylic base.  On top of that lies a silicone sheet** (1mm) to buffer out some of the noise.  Finally, I decided that for gravity calibration, I could reduce the dlpf from its flight value of 20Hz to a calibration value of 5Hz which should help filter out the noise from the herd of elephants*** which constantly rampages through my house.

So I took her out to see the net results.  The best description of what she did was a double forward roly poly.  After another run with diagnostics enabled, it seems the gyro Y axis is fried – it’s returning the rotation rate consistently as 205863755304.136**** degrees per second regardless of her real rotation rate when roly polying.

Without a decent replacement for Phoebe’s sensor, I now have no choice but to finally decommission Phoebe and bring Chloë into active service.


*Faff list

  • SMD resistors that aren’t flat don’t attach firmly to the top of the MPU6050, so their thermal ‘resistance’ is higher than needed, and any slight nudge knocks them off – luckily I found another one of the same size which was flat and stuck well
  • Thermal epoxy glue that didn’t set for up to 48 hours, and even when set, became soft when leads were soldered to the resistor – I found a better one on eBay used by PC overclockers called Arctic Silver which both sets in 4 hours and is a much better thermal conductor
  • I added some adhesive foam to the MPU6050 underside to provide insulation to reduce the rate heat seeps away through the PCB wires (or more correctly, to increase the thermal capacity so that more heat energy is stored, and so for a fixed rate of thermal conductivity, the temperature changes more slowly)

** Both net zero cost as they were in my stockpile of “potentially useful, but currently redundant” stuff bought to solve earlier problems.

*** My 6 year old son and 3 year old daughter

**** 205863755304.136 is a very weird value to get – it’s a 64 bit precision floating point number.  The gyro output is a signed 2 byte integer: its 65536 possible values (−32768 to 32767) represent ±250° per second rotation rate.  Even after conversion to radians per second (n / 65536 * 500 * π / 180 which would produce a floating point number), the maximum value should only be ≈±4.3633.. radians per second.  As a result, I do still have a niggle that there is a bug I’ve recently introduced into the code for just the gyro Y axis, but I’ve scoured the source and can’t see any handling that’s specific to just the gyro Y axis.
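The conversion in that footnote can be sketched as follows.  The function names are illustrative, and the two bytes are assumed to have already been read from the sensor as high byte then low byte.

```python
import math


def gyro_raw_to_signed(high_byte, low_byte):
    """Combine two register bytes into a signed 16 bit integer."""
    value = (high_byte << 8) | low_byte
    if value >= 0x8000:
        value -= 0x10000  # two's complement: top bit set means negative
    return value


def gyro_to_radians_per_second(raw, full_scale_dps=250.0):
    """Scale a signed raw reading to radians per second.

    With the +/-250 degrees per second range this is the footnote's
    n / 65536 * 500 * pi / 180.
    """
    return raw / 65536.0 * (2.0 * full_scale_dps) * math.pi / 180.0
```

Whatever the two bytes contain, the result cannot exceed about ±4.3633, so a value like 205863755304.136 has to come from somewhere downstream of this conversion.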

It’ll do for now

New 10Ω resistor and thermal bonding cement means the chip acquires a stable 30°C in about 30s, with an ambient of about 18°C in the house.

40ºC took much more – 3 – 4 minutes, to be expected really, given the difference between ambient and target is double, and the heatsinking is twice as efficient.

Still 30° will do nicely now until next summer, so the next job is to merge the code and move it and the hardware to Phoebe.

The general plan is that on startup, as soon as the MPU6050 is started, Phoebe loops until the sensor reads 30º using a rolling average to ensure it is really stable.  At that point the code can be set free to run, and the temperature monitored at the same rate the ESCs are updated by PWM – about 40Hz.

Similar processing happens with calibration – sensors only take calibration readings at the point temperature has converged to 30ºC ±0.25%.
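The "loop until the rolling average is stable" check could be sketched like this.  The class name and window length are made up for illustration, and the ±0.25% band is interpreted here as a percentage of the 30ºC target, which is an assumption.

```python
from collections import deque


class TemperatureGate(object):
    """Reports True once the rolling average temperature sits on target."""

    def __init__(self, target=30.0, tolerance_pct=0.25, window=20):
        self.target = target
        self.band = target * tolerance_pct / 100.0
        self.samples = deque(maxlen=window)  # oldest samples drop off

    def update(self, temperature):
        """Add one reading; return True if the full window averages on target."""
        self.samples.append(temperature)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        average = sum(self.samples) / len(self.samples)
        return abs(average - self.target) <= self.band
```

The startup loop would call `update()` with each temperature reading and only release the flight (or calibration) code once it returns True.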

The underside of the breakout board is covered with 1mm foam tape to insulate it against heat dissipation, so about all I can do to make it better is to do the top also.  A bit ugly so I’ll only do that if absolutely necessary.

Thermal regulator update

I got my new MOSFET today, and it’s definitely switching well; the rate the heat of the MPU6050 rises is down to a minute but that’s still way too slow.

I also got some thermal tape to fix the 50Ω resistor to the MPU6050, but to be honest, it really isn’t sticky enough.

So now I’ve had to put in another order with Farnell for some epoxy thermal glue and an SMD 10Ω resistor.  The resistor will generate 2.5W heating at full power, and hopefully the thermal glue will make sure the resistor is tightly fixed thermally and physically to the MPU6050.  Fingers crossed as this is as good as I can get it.
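The 2.5W figure follows from P = V² / R, assuming the resistor sees the full 5V rail (the supply voltage isn’t stated here, but 5V is what makes 10Ω give 2.5W) and ignoring any drop across the MOSFET.  A quick check, with an illustrative function name:

```python
def heater_power_watts(supply_volts, resistance_ohms):
    """Power dissipated by a resistor across a supply: P = V^2 / R."""
    return supply_volts ** 2 / float(resistance_ohms)


# Assumed 5V supply: compare the new 10 ohm part with the old 50 ohm one
new_power = heater_power_watts(5.0, 10.0)   # 2.5 W
old_power = heater_power_watts(5.0, 50.0)   # 0.5 W
```

So on the same assumption the 10Ω resistor offers five times the heating power of the 50Ω one.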

I won’t be able to test it until Thursday, but fingers crossed for then.

Thermostat testing

I was disappointed by the slow heating provided by the BS170 MOSFET so I did some experimentation earlier:

First I covered the whole of the MPU-6050 with foam tape to insulate it.  That has virtually no effect – it still took minutes for the temperature to rise to 30 degrees and stabilize.

I was going to see if I could add two of the 50Ω resistors in parallel to see if that helped.

But then I remembered a post on the Raspberry Pi forum suggesting my MOSFET Vds may be more than the 240mV I’d calculated and that turned out to be true.  The MOSFET Vds had about 4.6V dropped across it suggesting Rds was > 50Ω.  So no wonder the resistor wasn’t heating up.
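To put rough numbers on that, here is the voltage divider arithmetic, assuming a 5V supply with the 50Ω resistor in series with the MOSFET (the supply voltage is my assumption; the 4.6V and 50Ω come from the measurements above).  The function name is illustrative.

```python
def mosfet_on_resistance(supply_volts, vds_volts, load_ohms):
    """Infer MOSFET resistance from the voltage dropped across it.

    Returns (rds_ohms, current_amps, load_power_watts) for a load
    resistor in series with the MOSFET across the supply.
    """
    load_volts = supply_volts - vds_volts      # what is left for the resistor
    current = load_volts / float(load_ohms)    # same current through both
    rds = vds_volts / current
    load_power = current ** 2 * load_ohms
    return rds, current, load_power


rds, current, load_power = mosfet_on_resistance(5.0, 4.6, 50.0)
# rds is roughly 575 ohms, current roughly 8mA, load power roughly 3mW
```

On those assumptions only a few milliwatts were reaching the resistor, rather than the near 0.5W it would get with the MOSFET fully on.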

So I’ve got a new MOSFET on order, along with new thermal tape for sticking the resistor onto the chip and hopefully we’ll see a vast improvement as a result.

thermostat.py

I’ve got temperature management of the MPU6050 working using my protopi.

Thermostat plot

Circuitry is identical to that shown previously:

Mosfet switch

Load is a 50Ω SMD resistor.  The MOSFET is a BS170, and the 100k is just one I had knocking around from a project 15 years ago (more on that in a few weeks).

Here’s the physical build with the 50Ω resistor attached to the MPU6050 with thermal tape and the red and black wires connecting it back to MOSFET.

Protopi with thermostat

Here’s the code.

This has worked pretty well but with a few compromises:

  • I had to use a 50Ω resistor as the next lowest in the series was 10Ω; that limits the current, and hence heating effect of the resistor
  • Probably as a result, the system couldn’t attain the desired temperature of 40°C – lowering the target to 30°C worked as the graph shows.
  • It took five minutes for the temperature to stabilize at 30°C, partly because of the lower heating power of the 50Ω resistor, but also I suspect because the breakout and breadboard were sinking heat away from the MPU-6050.

Still, proof of concept was successful.  There were various changes I had to make to the existing PID code that I’ll have to merge in with qc.py carefully, but hopefully that means I can calibrate her at a constant temperature.
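For readers without the linked code to hand, the general shape of a PID-driven heater is sketched below.  This is not the thermostat.py code: the class name and the 0 to 100% duty cycle output are assumptions made for illustration, and any gains passed in would need tuning.

```python
class HeaterPID(object):
    """PID controller that outputs a heater PWM duty cycle (0 to 100%)."""

    def __init__(self, kp, ki, kd, target):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = target
        self.integral = 0.0
        self.last_error = None

    def update(self, temperature, dt):
        """Return the duty cycle for the latest temperature reading."""
        error = self.target - temperature
        self.integral += error * dt
        if self.last_error is None:
            derivative = 0.0  # no history on the first call
        else:
            derivative = (error - self.last_error) / dt
        self.last_error = error
        output = (self.kp * error + self.ki * self.integral
                  + self.kd * derivative)
        # A heater can only heat: clamp to a valid duty cycle
        return max(0.0, min(100.0, output))
```

The duty cycle would then drive the MOSFET gate via PWM, with the clamp at zero reflecting that the resistor can warm the chip but never cool it.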

Oh, FFS, gimme a break.

After some minor code changes yesterday, there was a radical change in Phoebe’s behaviour – either she wouldn’t even start up her props, or she’d leap into the air as fast as she could.

After much racking of brains (not much there to rack), and pulling out of hair (none whatsoever of that to pull out), I found the problem and for once it wasn’t my fault!  The accelerometer was putting out 1.999999 as the measure of gravity.  Somehow the scale was wrong, but I’d not been anywhere near that code.

I swapped the sensor for my only spare this morning and all is well.  But that means I now only have one trustworthy sensor, and no way to get another in time for the CamJam.  I hope it holds out until then.