2026-06-09 - research - GPSD part 2

Fuzzing GPSD, Part 2: Lessons Learned

Same fuzzer, better questions. The second writeup is less about code and more about what the campaigns taught me.

This is a follow-up to my first post on fuzzing GPSD. If you have not read Part 1, I would start there, since this builds directly on that work. After spending more time with this fuzzer and actually sitting down to analyze what was working and what was not, I learned a lot about structure-aware fuzzing that I did not fully appreciate when I started. This post is less about the code and more about the methodology, the mistakes, and what I would do differently knowing what I know now.

Revisiting the attack surface

After Part 1 I went back and actually mapped out GPSD's architecture instead of just diving into whatever function looked interesting. Turns out there is a whole layer of code sitting above packet_parse that handles socket I/O, client management, device multiplexing, and protocol autodiscovery. The network-facing entry points in gpsd.c and the device handling in libgpsd_core.c are where attacker-controlled data first touches the daemon. By fuzzing packet_parse directly I was skipping all the interesting state that accumulates before the lexer even sees a byte.

That said, packet_parse is not a bad target; it is just not the only target. The function handles 23 protocol families and has genuinely complex state-machine logic. But I was essentially giving it perfectly framed input buffers, when in reality a remote attacker sends fragmented TCP streams, interleaved protocols, malformed HTTP chunks, and all kinds of garbage that gets filtered or transformed before reaching the packet parser. My harness was too clean.

Lessons learned from Part 1

Fuzzing downstream was a mistake (kinda)

I mentioned this briefly in Part 1 but want to expand on it, because it is probably the biggest takeaway. When you fuzz a function downstream of the actual entry point, you implicitly assume all the preconditions the upstream code establishes. In my case, packet_parse expects:

inbufptr to point somewhere valid within inbuffer.
inbuflen to reflect actual data length.
Various state-machine fields to be in consistent states.
No concurrent access (GPSD is single-threaded, but still).

My generator handled these explicitly, but that meant I was doing the upstream code's job by hand. Any bugs in the upstream code (the socket handling, the buffer management, the initial protocol detection) were never being tested. I was fuzzing the parser in isolation when the interesting bugs often live in the glue code.
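To make the precondition list concrete, here is a small Rust sketch of what "doing the upstream code's job" looks like in a harness. `LexerBuf` and `INBUF_CAP` are hypothetical stand-ins for the buffer fields of GPSD's gps_lexer_t: the real struct is C, much larger, and uses a pointer for inbufptr rather than an offset. The point is only that the harness, not GPSD's read path, is what guarantees these invariants.

```rust
/// Hypothetical, simplified stand-in for the buffer-related fields of
/// GPSD's gps_lexer_t. Not the real layout: inbufptr is modelled as an
/// offset here, and the real struct has many more fields.
const INBUF_CAP: usize = 64; // illustrative size, not GPSD's real buffer size

struct LexerBuf {
    inbuffer: [u8; INBUF_CAP],
    inbuflen: usize, // bytes of valid data in inbuffer
    inbufptr: usize, // read position, as an offset into inbuffer
}

impl LexerBuf {
    /// Load fuzzer input the way upstream read code would leave things:
    /// data copied in, length set, read position at the start.
    /// Input longer than the buffer is truncated rather than overflowing.
    fn load(input: &[u8]) -> Self {
        let n = input.len().min(INBUF_CAP);
        let mut inbuffer = [0u8; INBUF_CAP];
        inbuffer[..n].copy_from_slice(&input[..n]);
        LexerBuf { inbuffer, inbuflen: n, inbufptr: 0 }
    }

    /// The invariants the parser silently relies on the caller to uphold.
    fn invariants_hold(&self) -> bool {
        self.inbuflen <= INBUF_CAP && self.inbufptr <= self.inbuflen
    }
}
```

Every one of these lines is a decision the real socket and device code makes differently, which is exactly the behavior a parser-only harness never exercises.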

The flip side is that fuzzing downstream can be valid when the downstream function is complex enough to warrant dedicated attention. packet_parse definitely qualifies. But you should be aware of what you are trading off. I was getting coverage in deep parser states I would never reach fuzzing from the socket layer (too many preconditions to satisfy randomly), but I was missing entire classes of bugs.

Mutation strategy rotation matters

One thing that surprised me was how much performance I was leaving on the table with naive mutation scheduling. Originally I picked a random mutation strategy on every single call to mutate(). That sounds reasonable until you realize that switching strategies has overhead and that different mutations pay off at different stages of the campaign.

What worked better was rotating strategies on a fixed interval (I settled on every 50k executions). This lets the fuzzer "commit" to a strategy long enough to actually explore its effects before switching. Early in the campaign, byte-level mutations (flips, arithmetic) are good for finding low-hanging fruit. As coverage plateaus, structural mutations (block swaps, splices) help escape local optima. Expensive operations like checksum recalculation should be gated behind low-probability thresholds because they are only useful when other mutations have already broken the checksum.
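A fixed-interval rotation can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `Strategy` enum and `RotatingScheduler` type rather than the fuzzer's actual LibAFL code: each strategy is used for `interval` consecutive executions (50_000 in the campaigns described here) before the scheduler moves on to the next one.

```rust
/// Hypothetical strategy set; the real fuzzer's mutators may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Strategy {
    ByteFlip,
    Arithmetic,
    BlockSwap,
    Splice,
}

/// Commits to one strategy for `interval` consecutive executions, then
/// moves to the next, instead of re-rolling on every mutate() call.
struct RotatingScheduler {
    strategies: Vec<Strategy>,
    interval: u64, // executions per strategy, e.g. 50_000
    execs: u64,    // executions scheduled so far
}

impl RotatingScheduler {
    fn new(strategies: Vec<Strategy>, interval: u64) -> Self {
        assert!(!strategies.is_empty() && interval > 0);
        RotatingScheduler { strategies, interval, execs: 0 }
    }

    /// Strategy for the current execution; advances the execution counter.
    fn pick(&mut self) -> Strategy {
        let slot = ((self.execs / self.interval) % self.strategies.len() as u64) as usize;
        self.execs += 1;
        self.strategies[slot]
    }
}
```

Usage: construct it with the strategy list and an interval, then call `pick()` once per execution. With two strategies and an interval of 3, seven calls yield the first strategy three times, the second three times, then the first again.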

The other thing is that not all mutations are equal for all protocols. NMEA is ASCII text with a simple XOR checksum, so byte flips in the payload are cheap and often produce valid-ish sentences. UBX is binary with a two-byte Fletcher checksum and length fields, so random byte flips almost always produce immediate rejects. I started biasing mutation selection based on the packet_type hint from the generator: more structural mutations for binary protocols, more byte-level for text protocols.
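Biasing selection by encoding might look like the following sketch. The `Encoding` split, the mapping from the generator's packet_type hint to it, and the weights themselves are all illustrative assumptions, not values from these campaigns. The caller is assumed to supply a random `roll` from whatever RNG the fuzzer already uses.

```rust
/// Hypothetical strategy set; the real fuzzer's mutators may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Strategy {
    ByteFlip,
    Arithmetic,
    BlockSwap,
    Splice,
}

/// Coarse class derived (by assumption) from the generator's packet_type hint.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Encoding {
    Text,   // e.g. NMEA
    Binary, // e.g. UBX
}

/// Illustrative weights: text protocols lean on byte-level mutations,
/// binary protocols lean on structural ones.
fn weights_for(enc: Encoding) -> [(Strategy, u32); 4] {
    match enc {
        Encoding::Text => [
            (Strategy::ByteFlip, 40),
            (Strategy::Arithmetic, 30),
            (Strategy::BlockSwap, 15),
            (Strategy::Splice, 15),
        ],
        Encoding::Binary => [
            (Strategy::ByteFlip, 10),
            (Strategy::Arithmetic, 10),
            (Strategy::BlockSwap, 40),
            (Strategy::Splice, 40),
        ],
    }
}

/// Map a random roll onto the weighted table (roll is reduced modulo the total).
fn pick(enc: Encoding, roll: u32) -> Strategy {
    let table = weights_for(enc);
    let total: u32 = table.iter().map(|&(_, w)| w).sum();
    let mut r = roll % total;
    for &(strategy, w) in table.iter() {
        if r < w {
            return strategy;
        }
        r -= w;
    }
    unreachable!("r is always below the total weight")
}
```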

Structure preservation vs structure validity

This one took a while to internalize. When I started, I was obsessed with generating "valid" inputs — packets that would pass all the length checks and CRC validation and actually be recognized by the parser. I thought that was the point of structure-aware fuzzing.

It is not. The point is to generate inputs that are valid enough to reach interesting code paths but invalid enough to trigger bugs. A perfectly valid NMEA sentence executes the happy path and returns cleanly. A sentence with a correct leader and checksum but malformed field contents reaches the field-parsing logic where the actual bugs live. A binary packet whose length field claims 10KB but which carries only 50 bytes of data tests boundary conditions.

The practical implication is that your mutations should preserve some structure while breaking other structure. Keep the protocol leader intact so the state machine recognizes the packet type. Keep the length field plausible so you do not get rejected at the framing check. But corrupt the payload, the checksums, the field delimiters. This is where I started seeing actually interesting coverage gains instead of grinding on the same edges.
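Here is a minimal sketch of "break some structure, preserve the rest" for NMEA, assuming sentences arrive as strings of the form `$...*hh`. The helper names are hypothetical. It keeps the `$` leader and the sentence ID in field 0, overwrites one field with chosen junk, and then recomputes the XOR checksum so the sentence survives the checksum check and reaches field parsing.

```rust
/// NMEA 0183 checksum: XOR of every byte between '$' and '*'.
fn nmea_checksum(body: &[u8]) -> u8 {
    body.iter().fold(0u8, |acc, &b| acc ^ b)
}

/// Hypothetical structure-preserving mutator: replaces comma-separated
/// field `field` of a "$...*hh" sentence with `junk`, keeps the '$' leader
/// and the sentence ID (field 0), and appends a freshly computed checksum.
/// The incoming checksum is ignored. Returns None if the input is not
/// shaped like an NMEA sentence or the field index is out of range.
fn corrupt_nmea_field(sentence: &str, field: usize, junk: &str) -> Option<String> {
    let rest = sentence.strip_prefix('$')?;
    let star = rest.find('*')?;
    let mut fields: Vec<&str> = rest[..star].split(',').collect();
    // Field 0 is the talker + sentence ID (e.g. "GPGLL"); never touch it,
    // so the lexer still recognizes the sentence type.
    if field == 0 || field >= fields.len() {
        return None;
    }
    fields[field] = junk;
    let body = fields.join(",");
    Some(format!("${}*{:02X}\r\n", body, nmea_checksum(body.as_bytes())))
}
```

Skipping the checksum recomputation gives the opposite case: a sentence that only exercises the checksum-rejection path.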

Corpus quality beats corpus quantity

I started with a huge corpus pulled from real GPSD captures, protocol specs, and edge-case samples. Thousands of entries across all 23 protocol families. Sounds good, right? More seeds means more coverage — at least that was my thinking.

In practice, a bloated corpus hurts more than it helps. The scheduler spends time mutating samples that are redundant with each other. Similar inputs produce similar mutations which produce similar coverage. I was burning cycles on NMEA sentences that differed by one field value when what I actually needed was structural diversity.

What worked better was aggressive corpus minimization. After an initial exploration phase, I pruned the corpus down to samples that each contributed unique coverage edges. Fewer samples, but each one actually mattered. The fuzzer's mutation budget got spent on inputs that could reach new code instead of grinding on variations of the same thing.
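One common way to do this pruning is a greedy set-cover pass over per-seed edge sets. The sketch below assumes you already have those sets (for example, from replaying each seed with coverage instrumentation); `Seed` and `minimize` are hypothetical names, and greedy selection approximates the smallest covering corpus rather than guaranteeing it.

```rust
use std::collections::HashSet;

/// One corpus entry and the coverage edges it hit (hypothetical type).
struct Seed {
    name: String,
    edges: HashSet<u32>,
}

/// Greedy set cover: repeatedly keep the seed that adds the most edges not
/// yet covered; stop when no remaining seed adds anything new.
/// Returns the names of the kept seeds in selection order.
fn minimize(seeds: &[Seed]) -> Vec<String> {
    let mut covered: HashSet<u32> = HashSet::new();
    let mut kept = Vec::new();
    let mut remaining: Vec<&Seed> = seeds.iter().collect();
    loop {
        // Index and gain of the seed contributing the most new edges.
        let best = remaining
            .iter()
            .enumerate()
            .map(|(i, s)| (i, s.edges.difference(&covered).count()))
            .max_by_key(|&(_, gain)| gain);
        match best {
            Some((i, gain)) if gain > 0 => {
                let seed = remaining.swap_remove(i);
                covered.extend(seed.edges.iter().copied());
                kept.push(seed.name.clone());
            }
            // Nothing left, or nothing left that adds coverage.
            _ => break,
        }
    }
    kept
}
```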

Our refined approach

Targeting entry points properly

For Part 2 of this project I am working on harnesses that target higher-level entry points. The main candidates are:

handle_gpsd_request() in gpsd.c: handles client commands over the client socket (TCP port 2947 by default), parses JSON requests, manages watch state. This is where a malicious client would send commands.
gpsd_multipoll() in libgpsd_core.c: the per-device polling routine the main event loop calls to handle device I/O. Fuzzing here means testing how GPSD handles malformed data from GPS receivers, which is interesting for supply-chain scenarios where the receiver itself is compromised.
netgnss_parse(): handles NTRIP/RTCM data from network sources. NTRIP is basically HTTP wrapped around RTCM streams, so there is protocol-confusion potential.

The challenge with these targets is state setup. Unlike packet_parse, which just needs a lexer struct, these functions expect initialized device contexts, socket descriptors, configuration state, and in some cases actual file descriptors. Building harnesses means mocking or stubbing a lot of infrastructure.

Multi-layer fuzzing strategy

What I am converging on is a multi-layer approach: different harnesses for different depths in the call graph, with corpus sharing between them.

Layer 1 (entry point): fuzz socket handlers with raw byte streams. Low structure awareness, high realism. Finds bugs in framing, buffering, protocol detection.
Layer 2 (parser): fuzz packet_parse with the existing structure-aware approach. Finds bugs in protocol-specific parsing logic.
Layer 3 (backends): fuzz individual protocol backends (NMEA, UBX, RTCM3 drivers) with fully valid packets. Finds bugs in semantic processing, coordinate math, time handling.

Coverage from deeper layers informs corpus generation for shallower layers. If Layer 2 finds an interesting RTCM3 packet, that packet (or a byte-stream representation of it) gets added to Layer 1's corpus. This way the entry-point fuzzer can learn to produce inputs that reach deep parser states without needing to discover the path itself.

Smarter mutation scheduling

I mentioned the 50k rotation interval earlier, but there is more to it. What I am experimenting with now is adaptive scheduling based on coverage feedback:

If coverage is increasing, stick with the current strategy (it is working).
If coverage has plateaued for N executions, force a strategy switch.
If a specific mutation type consistently produces corpus additions, bias toward it.
If a mutation type has not produced a corpus addition in M executions, deprioritize it.

This is not fully implemented yet, but the basic idea is that mutation effectiveness changes over the campaign lifetime and the fuzzer should adapt. Early on, everything finds new coverage. Later, only specific mutations can escape local optima. A static schedule ignores this.
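The four rules above map onto a small amount of bookkeeping. This is a sketch of one way to wire them up, not the implementation this project is working toward. `AdaptiveScheduler`, `plateau_n` (N) and `stale_m` (M) are hypothetical names, and a corpus addition stands in for "coverage increased". On a switch it deterministically picks the highest-weighted alternative, where a real scheduler would more likely sample in proportion to the weights.

```rust
/// Per-mutator bookkeeping (hypothetical).
#[derive(Default, Clone)]
struct MutatorStats {
    hits: u64,            // corpus additions credited to this mutator
    execs_since_hit: u64, // executions since its last corpus addition
}

struct AdaptiveScheduler {
    stats: Vec<MutatorStats>,
    current: usize,      // index of the mutator currently in use
    plateau_n: u64,      // N: execs without progress before a forced switch
    stale_m: u64,        // M: execs without a hit before a mutator is deprioritized
    since_progress: u64, // execs since the last corpus addition under `current`
}

impl AdaptiveScheduler {
    fn new(num_mutators: usize, plateau_n: u64, stale_m: u64) -> Self {
        assert!(num_mutators > 0, "need at least one mutator");
        AdaptiveScheduler {
            stats: vec![MutatorStats::default(); num_mutators],
            current: 0,
            plateau_n,
            stale_m,
            since_progress: 0,
        }
    }

    /// Record the outcome of one execution that used the current mutator.
    fn record(&mut self, added_to_corpus: bool) {
        let s = &mut self.stats[self.current];
        if added_to_corpus {
            // Still making progress: stay on this mutator.
            s.hits += 1;
            s.execs_since_hit = 0;
            self.since_progress = 0;
        } else {
            s.execs_since_hit += 1;
            self.since_progress += 1;
        }
        if self.since_progress >= self.plateau_n {
            self.switch();
        }
    }

    /// Productive mutators get more weight; stale ones drop to a floor of 1,
    /// deprioritized but never starved completely.
    fn weight(&self, i: usize) -> u64 {
        let s = &self.stats[i];
        if s.execs_since_hit >= self.stale_m { 1 } else { 10 + s.hits }
    }

    /// Plateau reached: move to the best-weighted other mutator.
    fn switch(&mut self) {
        let cur = self.current;
        let next = (0..self.stats.len())
            .filter(|&i| i != cur)
            .max_by_key(|&i| self.weight(i));
        if let Some(i) = next {
            self.current = i;
        }
        self.since_progress = 0;
    }
}
```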

Performance observations

Some numbers from extended campaigns that might be useful:

Exec/sec baseline: ~8k-12k on my desktop (Ryzen 9, 32GB RAM) with ASan enabled. Without ASan it jumps to ~25k, but you miss subtle memory bugs.
Coverage saturation: the initial corpus covers maybe 15% of packet_parse edges. After 24 hours I typically hit 45-55%. After a week, 60-65%. Diminishing returns kick in hard after the first day.
Mutation effectiveness: byte-level mutations account for ~70% of corpus additions in the first hour, dropping to ~30% by hour 24. Structural mutations start slow but become dominant as coverage plateaus.
Checksum repair value: gating checksum repair at 5% probability was too high — I was wasting cycles on packets that would pass validation but were not exploring new edges. 1% or lower seems right for protocols with mandatory checksums.
Memory overhead: the gps_lexer_t struct is ~37KB and we clone it on every execution. This dominates memory bandwidth. I experimented with delta-encoding mutations (only store the diff from a base input) but the complexity was not worth it for this size.
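For binary protocols the repair step itself is cheap to write; what matters is how often it runs. The sketch below recomputes the UBX checksum (an 8-bit Fletcher sum over class, ID, length and payload, stored as CK_A and CK_B) and gates it behind a probability expressed in units of 1/10,000, so `per_10k = 100` corresponds to the 1% figure above. `XorShift64` is a throwaway PRNG that keeps the example dependency-free, and all function names are hypothetical.

```rust
/// Throwaway xorshift64 PRNG (seed must be non-zero) so the sketch needs no crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Recompute CK_A/CK_B of a UBX frame in place:
/// [0xB5, 0x62, class, id, len_lo, len_hi, payload..., CK_A, CK_B].
/// Returns false if the buffer is too short or lacks the sync bytes.
fn ubx_fix_checksum(frame: &mut [u8]) -> bool {
    if frame.len() < 8 || frame[0] != 0xB5 || frame[1] != 0x62 {
        return false;
    }
    let n = frame.len();
    let (mut ck_a, mut ck_b) = (0u8, 0u8);
    // 8-bit Fletcher over everything between the sync bytes and the checksum.
    // Uses the bytes actually present, not the (possibly mutated) length field.
    for &byte in &frame[2..n - 2] {
        ck_a = ck_a.wrapping_add(byte);
        ck_b = ck_b.wrapping_add(ck_a);
    }
    frame[n - 2] = ck_a;
    frame[n - 1] = ck_b;
    true
}

/// Repair the checksum with probability per_10k / 10_000 (100 => 1%).
/// Returns true only if a repair was actually performed.
fn maybe_repair(frame: &mut [u8], rng: &mut XorShift64, per_10k: u64) -> bool {
    rng.next_u64() % 10_000 < per_10k && ubx_fix_checksum(frame)
}
```

Checksumming the bytes that are actually present, rather than trusting the length field, is deliberate: a mutated frame may be lying about its own length.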

Crash triage improvements

My crash triage pipeline in Part 1 was basically "look at the crash file and figure it out manually." That does not scale. For Part 2 I built out some automation:

Automatic minimization: run crashes through LibAFL's minimizer to get the smallest reproducer. A 10KB crash input is annoying to analyze; a 50-byte one is tractable.
Stack-hash deduplication: compute a hash of the crash stack trace and only keep unique crashes. GPSD tends to crash in the same handful of places with different inputs, so this cuts the noise significantly.
Severity classification: parse ASan output to classify crashes. The generic advice is to rank use-after-free and heap overflows highest, but that lens does not fit this target. The lexer parses into a fixed gps_lexer_t struct with an inline buffer, so nothing on this path touches the heap, and as long as the harness keeps that struct on the stack or in a global, ASan reports overruns as stack- or global-buffer-overflows, never heap-buffer-overflow or use-after-free. The axis that actually matters is read versus write (a write is a corruption primitive; a read is at most a disclosure) and how much of the offset is attacker-controlled. I prioritize out-of-bounds writes with a controllable offset over everything else.
Automatic reproduction: a script that takes a crash file, rebuilds the GpsLexerT, attaches GDB, and breaks at the crash site. Saves the manual setup every time.
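The read-versus-write ranking can be pulled straight out of an ASan report. ASan prints the bug kind after `ERROR: AddressSanitizer:` and describes the faulting access on a `READ of size N` or `WRITE of size N` line. The sketch below uses hypothetical `Severity`, `Triage` and `classify` names and extracts only those fields. Whether the offset is attacker-controlled cannot be read off the report; that still takes the minimized input and a debugger.

```rust
/// Ordered lowest to highest priority, so sorting by Severity ranks crashes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Other,    // no READ/WRITE access line found (e.g. an abort)
    OobRead,  // out-of-bounds read: at most an information disclosure
    OobWrite, // out-of-bounds write: a memory-corruption primitive
}

#[derive(Debug)]
struct Triage {
    kind: String,               // e.g. "stack-buffer-overflow"
    severity: Severity,
    access_size: Option<usize>, // bytes touched by the faulting access
}

fn classify(report: &str) -> Triage {
    const MARKER: &str = "ERROR: AddressSanitizer: ";
    // The bug kind is the first word after the ASan error marker.
    let kind = report
        .lines()
        .find_map(|l| l.split(MARKER).nth(1))
        .and_then(|rest| rest.split_whitespace().next())
        .unwrap_or("unknown")
        .to_string();

    // The first "WRITE of size N" / "READ of size N" line is the faulting access.
    let mut severity = Severity::Other;
    let mut access_size: Option<usize> = None;
    for line in report.lines().map(str::trim_start) {
        let hit = if let Some(rest) = line.strip_prefix("WRITE of size ") {
            Some((Severity::OobWrite, rest))
        } else if let Some(rest) = line.strip_prefix("READ of size ") {
            Some((Severity::OobRead, rest))
        } else {
            None
        };
        if let Some((sev, rest)) = hit {
            severity = sev;
            access_size = rest.split_whitespace().next().and_then(|n| n.parse().ok());
            break;
        }
    }
    Triage { kind, severity, access_size }
}
```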

Still missing proper root-cause-analysis automation, but that is a harder problem. For now the goal is just to not drown in duplicate crashes.
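Stack-hash deduplication can be sketched the same way. This version is an assumption, not the pipeline's actual code. It hashes the function names of the top few frames of the first stack trace in an ASan report and ignores program counters and line numbers, so the same crash site reached with different inputs collapses into one bucket. Frame depth and whether to include line numbers are tuning knobs. It uses FNV-1a rather than std's DefaultHasher, because DefaultHasher's output is not guaranteed to stay stable across Rust releases, which matters if hashes are stored between runs.

```rust
use std::collections::HashSet;

/// Function names of the top `depth` frames of the first stack trace in an
/// ASan report. Frame lines look like "#0 0x55d0c1 in packet_parse packet.c:1234:9".
fn top_frames(report: &str, depth: usize) -> Vec<&str> {
    report
        .lines()
        .map(str::trim_start)
        .skip_while(|l| !l.starts_with('#')) // skip the report header
        .take_while(|l| l.starts_with('#')) // stop at the end of the first trace
        .filter_map(|l| l.split(" in ").nth(1)) // frames without symbols drop out
        .filter_map(|rest| rest.split_whitespace().next())
        .take(depth)
        .collect()
}

/// FNV-1a over the frame names, with a separator so ["ab","c"] != ["a","bc"].
fn stack_hash(frames: &[&str]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for f in frames {
        for &b in f.as_bytes().iter().chain(b"|".iter()) {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
    }
    h
}

/// Keeps only crashes whose stack hash has not been seen before.
struct Deduper {
    seen: HashSet<u64>,
    depth: usize,
}

impl Deduper {
    fn new(depth: usize) -> Self {
        Deduper { seen: HashSet::new(), depth }
    }

    /// True the first time a given top-of-stack signature shows up.
    fn is_new(&mut self, report: &str) -> bool {
        let frames = top_frames(report, self.depth);
        self.seen.insert(stack_hash(&frames))
    }
}
```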

What I would do differently

Map the call graph first. Before writing any fuzzing code, trace all paths from network entry points to the function you want to fuzz. Understand what state is established upstream and what assumptions your target makes.
Start with entry points, not internals. It is tempting to fuzz the "interesting" parser logic directly, but you miss entire bug classes. Fuzz from the outside in, add targeted internal harnesses later.
Minimize your corpus aggressively. More seeds is not better. Unique coverage contribution per seed is what matters.
Instrument your fuzzer, not just your target. I spent way too long not knowing which mutations were actually producing results. Add counters, track corpus-addition sources, log coverage deltas. The fuzzer itself is a system that needs observability.
Do not over-engineer the generator. My initial generator tried to produce perfectly valid packets. It should have been producing mostly valid packets with intentional corruption. The mutations take care of exploration; the generator just needs to get you in the door.
Budget time for triage infrastructure. Finding crashes is maybe 20% of the work. Analyzing, minimizing, deduplicating, reproducing, and reporting them is the other 80%. Plan for this upfront.

Conclusion

Part 1 was about building the fuzzer. Part 2 is about learning from actually running it. The code itself has not changed dramatically, but my understanding of how to use it effectively has. Structure-aware fuzzing is as much about methodology as implementation — you can have a technically correct generator and mutator and still waste cycles on inputs that never reach interesting code.

The biggest lesson is that fuzzing is iterative. You make assumptions, run campaigns, analyze results, and refine. My initial assumption that packet_parse was the right target was not wrong exactly, but it was incomplete. The refined approach targets multiple layers and shares intelligence between them. That is more complex to set up, but it catches bugs that single-target fuzzing misses.

I am still finding bugs in GPSD. As I mentioned in Part 1, though, the maintainers are extremely responsive and often patch things before I finish a writeup. That is actually a good sign: it means the project takes security seriously and fuzzing is just one layer of its defense-in-depth. The bugs this work shook loose are the subject of Part 3.

Source

The fuzzer and triage tooling are open source: github.com/xchglabs/gpsd-driver-fuzzer.

Continue to Part 3: The Bugs.