2026-06-09 - research - GPSD part 2
Fuzzing GPSD, Part 2: Lessons Learned
Part one was about building the fuzzer. Part two is about running it long enough to find out which parts of the plan were smart, which parts were cope, and which parts needed a rewrite. The code did not magically become a different project. My understanding of the target did.
The biggest change was admitting that packet_parse() is not the whole attack surface. It is a juicy parser, absolutely, but it is not where attacker data first touches GPSD. There is socket handling, device I/O, driver detection, client state, NTRIP-ish network sources, and a whole pile of glue code that can matter before the lexer sees a byte.
Fuzzing downstream was a mistake, kinda
When you fuzz a downstream function, you inherit all of its preconditions. In this case, packet_parse() expects a sane gps_lexer_t: valid pointers, a believable inbuflen, state-machine fields in range, and buffers that look like something GPSD would actually build.
- inbufptr has to point inside inbuffer.
- inbuflen has to describe bytes that really exist.
- The parser state cannot be nonsense unless nonsense is the thing being tested.
- The upstream code that fills those fields is not being tested at all.
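To make those preconditions concrete, here is a minimal sketch of the kind of sanity check a generator can run before handing a lexer to the parser. It is not the real gps_lexer_t: LexerModel is a simplified stand-in, inbufptr is modeled as an offset instead of a pointer, and MAX_STATE is a made-up bound for illustration.

```python
from dataclasses import dataclass

MAX_STATE = 64  # hypothetical upper bound on the state-machine value


@dataclass
class LexerModel:
    """Simplified stand-in for gps_lexer_t; not the real struct layout."""
    inbuffer: bytes  # backing storage for raw input
    inbufptr: int    # read cursor, modeled as an offset into inbuffer
    inbuflen: int    # number of bytes in inbuffer that are claimed valid
    state: int       # state-machine value


def preconditions_hold(lx: LexerModel) -> bool:
    """Return True when the model looks like something upstream code could build."""
    # inbuflen must describe bytes that really exist in the buffer.
    if not 0 <= lx.inbuflen <= len(lx.inbuffer):
        return False
    # The cursor must stay inside the buffer (one past the end is allowed here,
    # which is an assumption of this sketch).
    if not 0 <= lx.inbufptr <= len(lx.inbuffer):
        return False
    # The parser state must be in range.
    return 0 <= lx.state < MAX_STATE
```

Every check in that function is a job the upstream code normally does, which is exactly the code this harness never runs.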
My generator handled those preconditions, which is why the fuzzer reached deep parser states quickly. But that also means I was manually doing the upstream code's job. Bugs in socket framing, buffer management, initial protocol detection, and client command handling were outside the harness. That is the trade: better parser depth, less entry-point realism.
Mutation scheduling mattered more than expected
My first pass picked a random mutation strategy on every call. That sounds fine until you watch a campaign run and realize the fuzzer never commits to a strategy long enough to learn whether it is useful. I got better results rotating strategies on a fixed interval, roughly every 25k executions.
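A rough sketch of fixed-interval rotation, assuming strategies are just named entries in a list. RotatingScheduler and the strategy names in the usage note are illustrative, not the names used in the real fuzzer; only the 25k interval comes from the campaign.

```python
class RotatingScheduler:
    """Stick with one mutation strategy for `interval` executions, then rotate."""

    def __init__(self, strategies, interval=25_000):
        if not strategies:
            raise ValueError("need at least one strategy")
        self.strategies = list(strategies)
        self.interval = interval
        self.executions = 0

    def next_strategy(self):
        # Integer division groups executions into windows of `interval`;
        # the modulo wraps back to the first strategy after the last one.
        index = (self.executions // self.interval) % len(self.strategies)
        self.executions += 1
        return self.strategies[index]
```

With strategies like "byte_flip" and "splice", the first 25k calls all return "byte_flip", the next 25k return "splice", and then the cycle repeats. Compare that with calling random.choice on every execution, where no strategy ever gets a long enough window to judge.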
Early on, byte-level edits are cheap and productive. Later, once coverage flattens, structural mutations and splices matter more. Checksum repair is useful, but only rarely; if you repair everything, you spend too much time making happy-path packets instead of boundary-condition packets.
| Phase | Mutation bias |
|---|---|
| Early campaign | Byte flips, arithmetic, delimiter changes, cheap edits that quickly find shallow edges. |
| Coverage plateau | Block swaps, corpus splices, length/count corruption, protocol-aware structure edits. |
| Checksum-gated paths | Low-probability checksum repair so binary protocols do not reject every interesting mutation. |
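The table can be read as a set of weights that shift once coverage stops moving. The sketch below shows one way to encode that. Every number in it is a placeholder I picked for illustration, including the plateau threshold; none of them are tuned values from the campaign.

```python
import random

# Hypothetical weights per campaign phase; checksum repair stays rare in both.
PHASE_WEIGHTS = {
    "early":   {"byte": 0.80, "structural": 0.15, "checksum_repair": 0.05},
    "plateau": {"byte": 0.30, "structural": 0.60, "checksum_repair": 0.10},
}


def current_phase(execs_since_new_coverage, plateau_after=100_000):
    """Call it a plateau once nothing new has shown up for a while."""
    return "plateau" if execs_since_new_coverage >= plateau_after else "early"


def pick_family(phase, rng):
    """Weighted choice of a mutation family for the given phase."""
    weights = PHASE_WEIGHTS[phase]
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
```

Passing in a seeded random.Random keeps campaigns reproducible, which matters later when you are trying to explain why one run found something and another did not.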
Valid enough beats valid
I started out caring too much about "valid" inputs. That is not really the point. A perfectly valid NMEA sentence usually exercises the happy path and leaves. What you want is an input that keeps the parser interested while lying about the parts bugs tend to trust.
Keep the protocol leader. Keep enough framing to reach the right parser. Maybe keep a checksum sometimes. Then break field contents, counts, payload lengths, and delimiters. Structure-aware fuzzing is not about being polite. It is about being rude in a way the target cannot immediately ignore.
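Here is what "valid enough" can look like for NMEA, as a minimal sketch. The checksum rule is the standard one (XOR of every byte between the $ and the *). The function names, the list of hostile field values, and the 10% repair probability are my own illustrative choices.

```python
import random


def nmea_checksum(body: bytes) -> bytes:
    """XOR of every byte between '$' and '*', as two uppercase hex digits."""
    value = 0
    for b in body:
        value ^= b
    return b"%02X" % value


def mostly_valid(sentence: bytes, rng: random.Random, repair_prob: float = 0.1) -> bytes:
    """Keep the leader and framing, corrupt one field, repair the checksum only sometimes."""
    body, _, old_sum = sentence.strip()[1:].partition(b"*")
    fields = body.split(b",")
    if len(fields) > 1:
        # Never touch field 0: the talker/type tag is what routes the sentence.
        victim = rng.randrange(1, len(fields))
        # Hypothetical hostile values: empty, negative, oversized, non-ASCII, huge float.
        fields[victim] = rng.choice([b"", b"-1", b"9" * 40, b"\xff\xfe", b"1e999"])
    body = b",".join(fields)
    # Usually leave the stale checksum in place; repair it with low probability.
    checksum = nmea_checksum(body) if rng.random() < repair_prob else old_sum[:2]
    return b"$" + body + b"*" + checksum + b"\r\n"
```

The output always starts with the original leader and always ends with a checksum field and CRLF, so it reaches the NMEA path. What it says in between is the rude part.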
Corpus quality beats corpus quantity
I started with a big seed corpus pulled from real captures, protocol docs, and edge cases. It felt responsible. It was also noisy. Thousands of entries do not help if most of them hit the same edges.
The better approach was aggressive minimization. Keep samples that contribute unique coverage. Throw away near-duplicates. A smaller corpus means the scheduler spends mutation budget on inputs that actually have a chance to move the campaign.
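The idea fits in a few lines if you assume each sample already has a set of covered edge IDs. This is a single-pass greedy filter, not an optimal set cover, and the dict-of-sets input shape is an assumption of the sketch, not the format the real tooling uses.

```python
def minimize_corpus(coverage: dict[str, set[int]]) -> list[str]:
    """Keep a sample only if it contributes at least one edge nothing kept so far covers."""
    kept: list[str] = []
    covered: set[int] = set()
    # Visit the broadest samples first; the name is a tie-breaker so output is deterministic.
    for name in sorted(coverage, key=lambda n: (-len(coverage[n]), n)):
        new_edges = coverage[name] - covered
        if new_edges:
            kept.append(name)
            covered |= new_edges
    return kept
```

Given samples a={1,2,3}, b={1,2}, c={3,4}, d={2,3}, this keeps a and c. The other two are the near-duplicates: every edge they hit is already covered.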
The refined plan
The big architectural lesson was that GPSD wants more than one harness. packet_parse() deserves a structure-aware harness because it is dense parser code. But the daemon also needs entry-point fuzzing and driver-level fuzzing if you want the bug classes that live outside the lexer.
| Layer | What it tests |
|---|---|
| Entry points | Socket handlers, control commands, device I/O, framing, buffering, protocol discovery. |
| Parser | packet_parse(), mixed streams, protocol leaders, length fields, checksums, resync. |
| Drivers | NMEA, UBX, RTCM, Skytraq, AllyStar, and the semantic decode code behind classified packets. |
The nice version of this is corpus sharing between layers. If the parser harness finds an interesting RTCM3 packet, feed a byte-stream version back to an entry-point harness. Let the outside-in fuzzer learn from the inside-out fuzzer instead of forcing it to rediscover every gate from scratch.
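One possible shape for that hand-off, sketched under the assumption that the parser harness input carries its raw bytes in inbuffer with a valid length in inbuflen. The function name, the directory layout, and the hash-based file naming are all hypothetical.

```python
import hashlib
from pathlib import Path


def export_to_entry_corpus(inbuffer: bytes, inbuflen: int, corpus_dir: Path) -> Path:
    """Write the valid prefix of a parser-harness input as a raw byte-stream seed."""
    # Only the first inbuflen bytes are real input; the rest is buffer padding.
    stream = inbuffer[:inbuflen]
    corpus_dir.mkdir(parents=True, exist_ok=True)
    # Content-addressed file name, so exporting the same packet twice is a no-op.
    path = corpus_dir / hashlib.sha256(stream).hexdigest()[:16]
    path.write_bytes(stream)
    return path
```

The entry-point harness then picks the file up as an ordinary seed. It still has to get those bytes through framing and buffering on its own, which is the part worth testing.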
Performance notes
Some practical numbers from the original campaigns: ASan builds sat around 8k-12k exec/s sustained on my desktop. That is below the peak figures in part one, because AddressSanitizer and cloning the large gps_lexer_t input both cost real time once a campaign is actually running. Builds without ASan ran faster, but I do not care about speed that hides memory bugs. The first day gave the best coverage gains; after that, returns slowed hard.
- Initial seeds covered a small slice of packet_parse().
- After a day, coverage was much better but already slowing down.
- Byte mutations dominated early corpus additions.
- Structural mutations mattered more after the easy edges were gone.
- The gps_lexer_t input is large enough that cloning it becomes real overhead.
Crash triage had to grow up
The first version of triage was basically "open the crash file and stare at it." That does not scale. GPSD can produce a lot of duplicate-looking crashes with different inputs, and a 10KB reproducer is annoying when a 50-byte reproducer would explain the bug faster.
The triage pipeline needed four boring tools: minimization, stack-hash deduplication, ASan classification, and automatic replay under GDB. Finding a crash is the fun part. Reducing it to something reportable is where most of the time goes.
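Two of those tools, stack-hash deduplication and ASan classification, are small enough to sketch. This assumes symbolized ASan reports with frames of the form "#0 0x... in function file:line". The three-frame depth and the function names are illustrative choices, not what the real triage tooling does.

```python
import hashlib
import re

# Matches symbolized frames such as "    #0 0x4f5e8a in some_function file.c:12:3".
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)", re.MULTILINE)
# Matches the bug class on the "ERROR: AddressSanitizer: <kind>" line.
KIND_RE = re.compile(r"ERROR: AddressSanitizer: (\S+)")


def classify(report: str) -> str:
    """Return the ASan bug class, e.g. heap-buffer-overflow, or 'unknown'."""
    match = KIND_RE.search(report)
    return match.group(1) if match else "unknown"


def stack_hash(report: str, depth: int = 3) -> str:
    """Hash the first few function names so addresses and line numbers do not matter."""
    frames = FRAME_RE.findall(report)[:depth]
    return hashlib.sha1("|".join(frames).encode()).hexdigest()[:12]


def dedupe(reports: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Group crash names by (bug class, stack hash)."""
    buckets: dict[tuple[str, str], list[str]] = {}
    for name, text in sorted(reports.items()):
        buckets.setdefault((classify(text), stack_hash(text)), []).append(name)
    return buckets
```

Hashing function names instead of raw addresses is the design choice that matters: two crashes in the same call chain land in the same bucket even when the inputs, offsets, and line numbers differ. A shallow depth over-merges and a deep one under-merges, so treat the depth as something to tune.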
What I would do differently
- Map the call graph before writing harness code.
- Start with entry points, then add internal harnesses for dense parser code.
- Minimize the corpus early and often.
- Instrument the fuzzer itself, not just the target.
- Generate mostly-valid inputs, not perfect inputs.
- Budget triage infrastructure from day one.
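For the "instrument the fuzzer itself" item, the cheapest useful version is a pair of counters per mutation strategy. A minimal sketch, with a hypothetical StrategyStats class and a made-up yield metric:

```python
from collections import Counter


class StrategyStats:
    """Track how often each mutation strategy runs and how often it pays off."""

    def __init__(self):
        self.executions = Counter()
        self.new_coverage = Counter()

    def record(self, strategy: str, found_new_coverage: bool) -> None:
        self.executions[strategy] += 1
        if found_new_coverage:
            self.new_coverage[strategy] += 1

    def yield_per_million(self, strategy: str) -> float:
        """New-coverage inputs per million executions; 0.0 if the strategy never ran."""
        runs = self.executions[strategy]
        return 1_000_000 * self.new_coverage[strategy] / runs if runs else 0.0
```

Calling record() once per execution is enough to tell whether a strategy is earning its share of the mutation budget or just burning cycles.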
Conclusion
The lesson was not "do not fuzz internals." The lesson was to know exactly what you are buying when you do. packet_parse() was a useful target because it gave deep parser coverage and real crash paths. It was incomplete because GPSD is more than one function.
Structure-aware fuzzing is as much methodology as implementation. You can have a technically correct generator and mutator and still waste cycles if the corpus is bloated, the strategy schedule is noisy, or the target is missing half the attack surface. The fuzzer got better when I stopped treating it like a magic loop and started treating it like a system that needed observability.
Source
The fuzzer and triage tooling are open source: github.com/xchglabs/gpsd-driver-fuzzer.
Next
Continue to Part 3: The Bugs.