Garmin LiDAR-Lite interrupts

Here’s what I’ve captured from my scope.

LiDAR interrupts

It’s mostly at 2.5V, with 3.3V peaks every 50ms (20Hz), corresponding to the LiDAR sampling frequency.

So the interrupt is kind of working, except that the GPIO pull-down isn’t ‘hard’ enough to drag the 2.5V down to 0V. Because of that, the GPIO rising-edge code isn’t catching these tiny interrupt peaks.

This suggests the LiDAR output isn’t floating as the spec implies, but has a pull-up with a lower resistance than the GPIO pull-down resistor. It seems I need to add an explicit pull-down resistor to help the GPIO’s internal pull-down. Time to find out what value that should be.

P.S. The internal GPIO pull-up/down resistor is apparently 50k, and a quick calculation suggests that the Garmin pull-up is about 17k, which results in the 2.5V base level shown above. So I added a 10k between the interrupt and ground, as this should bring the ‘floating’ voltage below 50% and hopefully low enough for the GPIO pull-up/down to work. But it didn’t: it dragged everything down to near zero (including the spikes). So I removed it, but the spikes shown above had vanished, suggesting the 10k may have broken something. Luckily, the Garmin is still producing height values. Annoyingly, it had no effect on the I2C errors. I’m running out of ideas about what’s causing these. 🙁
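The pull-up estimate comes from treating the interrupt line as a voltage divider: the Garmin’s pull-up to 3.3V on top and the Pi’s internal pull-down (about 50k) to ground. A quick sketch of that arithmetic, assuming those are the only two resistors loading the line (the helper names are just for illustration):

```python
def pullup_from_divider(v_supply, v_measured, r_down):
    """Solve V = Vs * Rdown / (Rup + Rdown) for the unknown pull-up Rup."""
    return r_down * (v_supply / v_measured - 1.0)


def divider_voltage(v_supply, r_up, r_down):
    """Voltage at the junction of a pull-up r_up and a pull-down r_down."""
    return v_supply * r_down / (r_up + r_down)


def parallel(*resistors):
    """Combined resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors)


r_up = pullup_from_divider(3.3, 2.5, 50e3)    # Garmin pull-up estimate
r_down_new = parallel(50e3, 10e3)             # internal 50k plus the added 10k
v_idle = divider_voltage(3.3, r_up, r_down_new)
print(f"pull-up ~{r_up / 1e3:.0f}k, idle level with 10k added ~{v_idle:.2f}V")
```

This lands at about 16k, in the same ballpark as the ~17k above. With the 10k in parallel with the internal 50k, the same divider predicts an idle level of roughly 1.1V, about a third of the supply.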

HoG’s apple?

I may have spotted one last low-hanging fruit in the orchard, and it may be gold but there’s a chance it’s just brass.

Currently, motion and sampling processing run in series on a single thread. Critically, the sampling code waits for the next data ready interrupt. That’s wasted time the motion processor could put to good use, but it can’t because the code is blocked waiting for that interrupt. When 10 sets of samples have been gathered, the sampling code hands back to the motion code; the motion code takes 3ms, and during that time nobody is waiting for the data ready interrupt, so valid samples are lost.

Running sampling and motion processing on separate Python threads doesn’t work because they run on a single CPU with the Python interpreter’s scheduler running the show. Throw in the GIL, and they’re effectively still running serially.

Now many many years ago, I wrote a kernel driver for an SDLC device.  It had an ISR – an interrupt service routine – which got called at higher priority when a new packet arrived; it did a tiny bit of processing and exited, allowing the main thread to process the new data the ISR had caught and cached.

That was in the kernel, but I think something very similar is possible in user-land using the GPIO library. It’s possible to register a callback that is made when the data ready interrupt occurs; in the meantime, motion processing runs freely. The interrupt handler / callback runs on a separate OS (not Python) thread, so hopefully the GIL won’t spoil things.

So here’s how it should work:

  • A data_ready callback gets registered to read the sensors – it’s triggered by the rising edge of the hardware data ready interrupt.
  • Once installed, the callback is made for each new batch of data every 1ms and a batch of sensor data is read.
  • The callback normally just caches the data but every 10 samples, it copies the batch into some memory the motion thread is watching, and kicks it into life by sending a unix signal (SIGUSR1).
  • The motion thread sits waiting for that signal (signal.pause()) – just like the “threading” code does now.
  • Once received it has 10ms to process the data before the next batch comes through from the callback – that’s plenty of time.
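A minimal sketch of that hand-over in Python, under some assumptions. read_sensors() is a hypothetical placeholder for the real I2C read. On the Pi the callback would be registered with GPIO.add_event_detect(pin, GPIO.RISING, callback=data_ready), whereas here a plain thread fakes the 1kHz edge so the sketch runs on any Linux box. It also swaps signal.pause() for signal.sigtimedwait() with SIGUSR1 blocked, because a signal landing just before pause() is called would otherwise leave the motion thread asleep. Batches travel through a queue rather than a bare shared buffer, so a late wake-up can’t overwrite one:

```python
import os
import queue
import signal
import threading
import time

BATCH_SIZE = 10                   # samples per hand-over, as described above
handover = queue.SimpleQueue()    # thread-safe mailbox: callback -> motion thread
_cache = []                       # samples cached by the callback between hand-overs


def read_sensors():
    """Hypothetical stand-in for the real I2C read of one batch of sensor data."""
    return time.monotonic()


def data_ready(channel):
    """Data-ready callback: cache one reading, hand over every BATCH_SIZE readings."""
    _cache.append(read_sensors())
    if len(_cache) == BATCH_SIZE:
        handover.put(list(_cache))
        _cache.clear()
        os.kill(os.getpid(), signal.SIGUSR1)   # kick the motion thread into life


def fake_interrupts(count, period=0.001):
    """Simulates the 1kHz data-ready edge so the sketch runs without a Pi."""
    for _ in range(count):
        time.sleep(period)
        data_ready(channel=None)


def motion_loop(expected_samples, timeout=5.0):
    """Sleeps in the kernel until SIGUSR1 arrives, then processes every queued batch."""
    batches = []
    received = 0
    deadline = time.monotonic() + timeout
    while received < expected_samples and time.monotonic() < deadline:
        # Standard signals don't queue, so two hand-overs can share one SIGUSR1;
        # the short timeout plus draining the whole queue covers that case.
        signal.sigtimedwait([signal.SIGUSR1], 0.1)
        while True:
            try:
                batch = handover.get_nowait()
            except queue.Empty:
                break
            batches.append(batch)              # motion processing would go here
            received += len(batch)
    return batches


def run_demo(samples=30):
    # Block SIGUSR1 *before* starting any thread: the new thread inherits the mask,
    # so the signal stays pending until sigtimedwait() collects it.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    producer = threading.Thread(target=fake_interrupts, args=(samples,))
    producer.start()
    batches = motion_loop(samples)
    producer.join()
    return batches
```

Calling run_demo(30) should return three batches of ten readings each.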

The subtle difference is that the waiting is happening in the kernel scheduler rather than in the python scheduler, meaning the python motion code can run in between each new data ready interrupt callback.

From looking at the GPIO code, the callback thread is not a Python thread: it waits for the edge in C, without holding the GIL or carrying the overhead of Python threading, and only takes the GIL while the Python callback itself runs. That means the motion processing can happen at Python level while the wait for sensor data happens on a ‘C’ level thread.

It’s definitely worth a go.

If it works, it also lets me climb nearer to the moral high ground: I’ll be able to revert to the standard RPi.GPIO library rather than my version with the performance-tuned wait_for_edge() call. The callback function doesn’t have the same inefficiencies. One of the guiding principles of this project was to keep it pure, so it feels good to return to the one true path to purity and enlightenment. Fingers crossed!



Blowing raspberries at the RTOS dogma

After some code tweaks, I’m now getting every single piece of data – yes, that’s right, I’m reading data at 1kHz!

1kHz sampling

The primary problem was integrating samples rather than averaging them. I’d found before that calling time.time() was actually very time-consuming. By just adding samples instead of integrating them, I only need to call time.time() once per motion processing loop rather than once per sample. It’s that which has taken my sampling rate to 1kHz; or to put it another way, the Python code can now read the sensors faster than the sensors can churn out the data, so my code sits there twiddling its thumbs waiting for it.
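The difference between the two approaches, as a minimal sketch. The function names are hypothetical and the sample values are made up for illustration. The point is that averaging only needs timestamps at the edges of each motion loop, not around every sample:

```python
import time


def integrate_per_sample(read_sample, count):
    """Old style: timestamp every sample and accumulate value * dt as it arrives."""
    total = 0.0
    last = time.time()
    for _ in range(count):
        value = read_sample()
        now = time.time()                 # one time.time() call per sample
        total += value * (now - last)
        last = now
    return total


def sum_then_scale(samples, elapsed):
    """New style: just add samples up, then scale once per motion loop.

    For evenly spaced samples, average * elapsed matches the per-sample integral.
    """
    return sum(samples) / len(samples) * elapsed


# Ten readings at 1kHz span 10ms, so one loop contributes average * 0.010s.
delta_v = sum_then_scale([0.5] * 10, 0.010)
```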

I’ve not tracked down the cause of the occasional I2C miss, so I’ve also moved the try / except wrapper from around the I2C read to around the (interrupt + I2C read). That forces the code to wait for the next sample before trying to read again. That’s the cause of the spike in the middle.
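A minimal sketch of that restructuring. The names read_next_sample(), wait_for_data_ready and read_i2c are hypothetical. Both actions are passed in so the sketch doesn’t depend on any particular GPIO or I2C library, and the retry limit is an added assumption:

```python
def read_next_sample(wait_for_data_ready, read_i2c, max_attempts=3):
    """Wait for a data-ready interrupt, then read; on an I2C error, skip to the next one.

    The try/except wraps both the wait and the read, so a failed read is never
    retried against the same sample; the next attempt waits for fresh data.
    """
    for _ in range(max_attempts):
        try:
            wait_for_data_ready()      # block until the next data-ready edge
            return read_i2c()          # e.g. a burst read of the sensor registers
        except OSError:                # smbus reports I2C failures as IOError/OSError
            continue
    raise OSError(f"no good sample after {max_attempts} attempts")
```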

Combining the above with my recent super-duper fast GPIO edge_detect_wait() update, which listens for a single 50us pulse per call, I can once more climb back up on my trusty steed and blow raspberries at the RTOS dogma.

Code’s up on GitHub.

The need for speed!

My code has been running at about 450 loops per second, and whatever I tried to speed it up seemed to have little effect. The data from the MPU6050 was being updated 1000 times per second, so surely I could do better than 450?

Eventually, I started to suspect my customized GPIO python library was the cause – it waits for a hardware interrupt signalling fresh data is available – it calls epoll_wait() twice – could this explain why?  Is it only catching every other interrupt and hence reducing the speed to a maximum of 500 loops per second? It seemed plausible so I changed the code, and sure enough, processing speed has gone up to 760 loops per second.  The missing 240 loops are due to python motion processing, so now I can fine tune these and expect to get even better results.

Why does this matter? By nearly doubling the amount of data received in a fixed period, I get better averaging over that period, which means I can raise the dlpf (digital low pass filter) cutoff frequency, and so reduce the risk of filtering out good data.

I’ve updated the code on GitHub. You’ll need to remove the current GPIO code first, before unpacking GPIO.tgz and running the install like this:

sudo apt-get remove python-rpi.gpio
tar xvf GPIO.tgz
sudo python install

The next step was to see what refinements I could make to the Python code to speed up the sensor data processing further. I moved the calibration and units checking from the main loop to the motion processing, and that upped the speed to 812 loops per second.

Now all I need to do is test it live!

GitHub update

I’ve just pushed a couple of things up to the GitHub repository:

  • the LibreOffice presentation I gave at yesterday’s CamJam
  • the RPi.GPIO python library with enhanced hardware interrupt performance.

To install the RPi.GPIO changes, type:

cd ~
tar xvf GPIO.tgz
sudo python INSTALL

Then, to get the improved performance, instead of calling GPIO.wait_for_edge(), call:

  • GPIO.edge_detect_init(pin, edge) once at startup
  • GPIO.edge_detect_wait(pin) whenever you want to wait for a GPIO pin event
  • GPIO.edge_detect_term(pin) once at shutdown

GPIO.wait_for_edge() still exists as a wrapper for these 3 functions. The pin and edge parameters are the same as for the standard GPIO.wait_for_edge() call.
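A sketch of the intended calling pattern. edge_detect_init/_wait/_term exist only in the patched library described here, not in stock RPi.GPIO, so the module is passed in as a parameter. The wait_for_edge() function below only illustrates in Python what the wrapper does; the real one lives in the library’s C code. sample_loop() and read_sample are hypothetical names:

```python
def sample_loop(gpio, pin, edge, read_sample, count):
    """Set up once, wait per loop, tear down once.

    `gpio` is the patched RPi.GPIO module (e.g. `import RPi.GPIO as GPIO`).
    """
    gpio.edge_detect_init(pin, edge)          # once at startup: epoll / fd setup
    try:
        samples = []
        for _ in range(count):
            gpio.edge_detect_wait(pin)        # per loop: only wait for the edge
            samples.append(read_sample())
        return samples
    finally:
        gpio.edge_detect_term(pin)            # once at shutdown: release the fd


def wait_for_edge(gpio, pin, edge):
    """Illustration of the back-compatible wrapper: all three steps in one call."""
    gpio.edge_detect_init(pin, edge)
    try:
        gpio.edge_detect_wait(pin)
    finally:
        gpio.edge_detect_term(pin)
```

On the Pi this would be called as something like sample_loop(GPIO, pin, GPIO.RISING, read_sensors, 1000), with pin and read_sensors supplied by the application.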

Propeller size and GPIO.wait_for_edge()

So I tried both of these today.

Dropping back to 10 x 3.3 T-motor propellers had no obvious effect so I’ll go back to the 11 x 3.7’s I’ve been using recently.

GPIO.wait_for_edge() is looking promising. In the official code, the Python call combines initialization, waiting, and cleanup into a single call to catch the hardware data-ready interrupt on the GPIO pin. This was 50% slower than my Python hard-loop reading the data ready register instead.

But by splitting this into three, edge_detect_init(), edge_detect_wait() and edge_detect_term(), with a wrapper to keep back-compatible support for wait_for_edge(), my code now only calls edge_detect_wait() in the main processing loop; the other two are only called at start-up / shutdown. This means the various bits of epoll() fd setup now happen only once, rather than on every processing loop.
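The same split, sketched with Python’s own select.epoll so the shape is visible outside the C library. EdgeWaiter is a hypothetical class, not part of RPi.GPIO. For a real sysfs GPIO value file the event mask would be EPOLLPRI | EPOLLERR, and each event has to be followed by a seek back to 0 and a read. Here a pipe with EPOLLIN stands in so the sketch runs on any Linux box:

```python
import os
import select


class EdgeWaiter:
    """init/wait/term split: epoll setup happens once, each wait is one poll()."""

    def __init__(self, fd, events=select.EPOLLIN):
        self.fd = fd
        self.ep = select.epoll()              # init: create the epoll instance once...
        self.ep.register(fd, events)          # ...and register the fd once

    def wait(self, timeout=None):
        """The only call left in the processing loop; True if the fd fired."""
        return bool(self.ep.poll(timeout))

    def close(self):
        self.ep.unregister(self.fd)           # term: undo the setup at shutdown
        self.ep.close()


if __name__ == "__main__":
    r, w = os.pipe()                          # stand-in for a GPIO value fd
    waiter = EdgeWaiter(r)
    os.write(w, b"1")                         # simulate an edge
    print(waiter.wait(1.0))
    waiter.close()
```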

The overall processing speed has only increased by perhaps a percent or two (as expected), but critically, it means marginally prompter notification that a new batch of sensor data is available, and so, hopefully, a significantly lower risk of dodgy sensor reads which mess up the velocity integration.

In my rewrite of the GPIO event handling code, I have broken some of the other event detection functions I don’t use, but once I’ve reinstated those, I’ll see if Croston will accept these changes into the mainline GPIO Python library.

No Hesitation, Repetition or Deviation

Although neither the drone nor the alarm pi projects are even phase 1 complete (phase 1 for the drone is safe autonomous takeoff, hover and land, and for the alarm is independent alarm control), I’m strongly considering moving from the RPi.GPIO library to the RPIO library for a number of reasons:

  • RPIO interrupts can be used to wake calls, whereas RPi.GPIO runs a separate thread which can only wake the main thread by sending a SIGINT – functional but ugly
  • RPIO supports DMA-driven PWM on any GPIO pin, whereas currently I use I2C to connect to a PWM breakout chipset

While using RPi.GPIO works fine, RPIO just feels better. So once phase 1 of both projects is complete, I’m going to add a phase 1.5: the switchover to RPIO (or perhaps merging the best of both).

In passing, I’m also considering moving over to PyPy once it’s available in the Wheezy distro; the drone currently runs at full speed with no sleeps. Moving to PyPy would mean I can start introducing time.sleep() as the precursor to TCP socket inputs. Currently, the spare CPU cycles left over from using interpreted Python feel a bit too few to introduce remote control stably.

Offloading the burdened drone

My drone code is a hard loop (no sleeps or other time blocking commands). Due to being written in interpreted Python, currently each loop around the code, checking the sensors, running the PIDs and updating the PWM takes about 0.015s. That seems pretty fast but I’m starting to wonder if it’s fast enough – the integrations, particularly of the accelerometer, are drifting. So what to do?

  • One step is to not run any irrelevant daemons – trouble is I have no idea how to find them
  • Another is to move from interpreted Python to compiled Python – for this I have to wait until PyPy comes out of alpha release
  • Yet another would be to move the PWM driving the motor ESCs away from the I2C PWM breakout board, and instead use RPIO’s DMA (direct memory access) PWM – but again, that’s in beta.

So for the moment, I think I’ll stick with what I’m doing, and assume there’s a bug in my integrals or PIDs – fingers crossed (again!).

Reinventing the wheel…

The first draft code for the AlarmPi was flawed due to restrictions in the GPIO library, as previously explained. The main one was that the switch debounce had to be set longer than the processing time in the switch event callback. But that meant the switch was disabled for a long period (in fact until the PIR was active), so it was impossible to turn the alarm off until it was fully activated, meaning the alarm went off as you were turning it off.

I’ve coded around this now by putting all the processing in the main thread, with an internal FSM (finite state machine). All the callbacks do now is set the next input for the state machine, and wake the main thread with an os.kill(os.getpid(), signal.SIGINT). It’s still not how I’d like it, but it works, and means I can get on with the drone rather than having to rework the RPi.GPIO library.
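A minimal sketch of that shape, with hypothetical state and input names (the real ones aren’t listed here). The callbacks would only record an input and call os.kill(os.getpid(), signal.SIGINT). The main thread then feeds each input through a transition table. Because the 30s exit delay is a state (ARMING) rather than work done inside a callback, the switch stays live throughout it:

```python
# Hypothetical states and inputs; the real AlarmPi code may name them differently.
TRANSITIONS = {
    ("OFF", "switch_on"): "ARMING",       # start the 30s beep / LED exit delay
    ("ARMING", "switch_off"): "OFF",      # the switch stays live during the delay
    ("ARMING", "arming_done"): "ARMED",   # delay over (e.g. a timer input): PIR enabled
    ("ARMED", "pir"): "ALARM",
    ("ARMED", "switch_off"): "OFF",
    ("ALARM", "switch_off"): "OFF",
}


def next_state(state, event):
    """One FSM step; inputs that mean nothing in the current state are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="OFF"):
    """Feed inputs through the FSM, as the main thread would after each wake-up."""
    for event in events:
        state = next_state(state, event)
    return state
```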

There’s nothing wrong with this semantically; it’s just the pedantic side of me that would like either single-threaded interrupts, or the ability to wait on the GPIO file descriptors and socket fds in a single main-thread call. I have enough pieces of the jigsaw to test whether that’s possible, but that has now been deferred because this new AlarmPi code avoids the problem altogether.

What’s troubling me with the RPi.GPIO python library?

AlarmPi needs to monitor 2 inputs currently: the switch and the PIR.  Once the AlarmHub is finished, it needs to track a TCP socket as well so that when one Alarm goes off, the hub can tell all the others to go off too.

There are several ways for the RPi.GPIO library to monitor and report inputs:

  • RPi.GPIO.input(channel) but you’d need to poll frequently and would probably miss a change
  • RPi.GPIO.wait_for_edge() blocks, so you wouldn’t miss the event, but it can only track one channel; the Alarm + Hub needs at least 3
  • RPi.GPIO.event_detected() doesn’t block, and although it still only covers one channel per call, each channel could be checked in turn; in addition, the input is latched until read, meaning it can’t be missed. The downside is that you’d need to keep checking, say once a second, to detect switch and PIR changes
  • RPi.GPIO.add_event_detect() allows a callback to be made when an edge is detected; unfortunately this happens on a separate thread, and does not wake a sleeping main thread. The only workaround is for the callback thread to send os.kill(os.getpid(), signal.SIGINT) to wake the sleeping main thread via a signal handler, but that then makes it harder to use ctrl-C to stop the code.

AlarmPi currently uses this last option, as it’s the only one that can be made to work efficiently, but the code shows the messy interactions between the callbacks, the main thread and the signal handler. It’s also forced to use a super-extended debounce (30s) on the switch callback: once the switch is turned on, it needs to beep / light the LED for 30s to allow the user to leave the room before the PIR is enabled. Because the switch callback doesn’t wake the main thread, this 30s processing takes place in the callback itself. To allow this to work, the callback bounce delay must be longer than 30s. If it isn’t, then when the alarm is turned on, any bounce in the switch is queued until the 30s callback has finished and is then processed, immediately toggling the switch off again and disabling the PIR as though the user had turned it off. With this hacky 30s debounce delay, once the hub exists, if you accidentally turn on the alarm, you can’t turn it off until the PIR is active, at which point attempting to turn off the alarm will trigger the PIR, and most likely deploy the CS gas, ring the police etc. Yet without the hacky fix, any switch bounce (likely) will automatically turn the alarm off immediately every time you try to turn it on; catch 22.

When the Hub comes along, the situation gets worse as the main thread would need to sleep until an intruder is detected, and then use a call for receiving data, meaning yet more messy interactions with callbacks and signal handlers.

So I’m looking at modifying the RPi.GPIO library.  Here’s my current plan:

  • Make the RPi.GPIO library more object oriented:
  • GPIO.setup() returns a gpio object representing a single channel / GPIO pin. Errors are reported as exceptions, caught with try / except
  • Currently, in the C library, each GPIO pin is accessed via a file descriptor (fd) passed to epoll() for input and write() for output – these would now be stored inside the GPIO class object, one per channel
  • A new Python class method gpio.socket() returns the fd in a socket object.
  • In turn, this fd can (I hope!) be used just as other TCP sockets are; the advantage here is that select can watch many fds at a time, sleep when nothing is happening, and wake when one or more sockets have something to report.
  • The current blocking functions RPi.GPIO.input and output would still be supported.
  • The current callbacks would become unnecessary, as would wait_for_edge, event_detected and add_event_detect; the sockets solution covers these and more, although support for them should be retained if at all possible for back-compatibility reasons.
  • In passing, I’ll also fix another restriction I hit in the Turtle project, where a GPIO output is always set up with default value 0; instead RPi.GPIO.setup() will carry an extra parameter, defaulted to 0, but allowing the user to specify 1 if needed.
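Python’s select.select() accepts any object with a fileno() method, so the single main-thread wait described above can be sketched without a real socket wrapper. GpioChannel and wait_for_inputs() are hypothetical and not part of RPi.GPIO, and pipes stand in for the GPIO fds so the sketch runs without a Pi. One caveat: a real sysfs GPIO value fd reports edges as an exceptional condition (POLLPRI), so it would go in select()’s third list rather than the readable one:

```python
import os
import select
import socket


class GpioChannel:
    """Hypothetical per-pin object, as the proposed GPIO.setup() might return."""

    def __init__(self, name):
        self.name = name
        self._read_fd, self._write_fd = os.pipe()   # stand-in for the value fd

    def fileno(self):
        return self._read_fd                         # all select() needs

    def simulate_edge(self):
        os.write(self._write_fd, b"1")               # pretend the pin changed

    def acknowledge(self):
        os.read(self._read_fd, 1)                    # consume the event


def wait_for_inputs(channels, sockets, timeout=None):
    """One blocking call covering every GPIO channel and TCP socket."""
    readable, _, _ = select.select(list(channels) + list(sockets), [], [], timeout)
    return readable


if __name__ == "__main__":
    switch, pir = GpioChannel("switch"), GpioChannel("pir")
    hub, peer = socket.socketpair()                  # stand-in for the AlarmHub link
    pir.simulate_edge()
    for ready in wait_for_inputs([switch, pir], [hub], timeout=1.0):
        print("woken by", getattr(ready, "name", "hub socket"))
```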

The only problem is I have no idea how Python libraries really work. I have the ‘C’ code for the RPi.GPIO library, and 22 years’ experience of writing C code. I’m just not 100% confident that the GPIO fds can be used that way (although I think they should, since select() and epoll() sit on the same kernel polling mechanism), nor am I experienced in writing the C code to support the new Python class required.

Sounds like an interesting challenge. Not sure whether I’m up to it, but I’ll give it a try in the background.