2026-06-09 - research - GPSD part 1
Fuzzing GPSD, Part 1: The Lexer Harness
gps_lexer_t, feed packet_parse(), and stop wasting cycles in the reject path.
This started as a learning project. I had not built a real structure-aware fuzzer before, so I picked GPSD and built gpsd_fuzzer: an in-process LibAFL harness that links against GPSD and repeatedly feeds a generated gps_lexer_t into packet_parse().
GPSD is a great target for this because it is small from the outside and messy in the way parsers are messy on the inside. One daemon accepts NMEA, AIS, UBX, RTCM2, RTCM3, GREIS, TSIP, Skytraq, AllyStar, CASIC, and a pile of other receiver formats. I wanted the fuzzer to spend its time inside that parser logic, not throwing random bytes at the door until a packet accidentally looked real.
Why packet_parse()?
packet_parse() is GPSD's packet classifier. It walks bytes through a finite-state machine, looks for protocol leaders, checks lengths and trailers, validates checksums, then exposes a classified packet through the lexer's output buffer and type fields.
It is downstream of the daemon's real network and device entry points, which means fuzzing it is not perfect realism. But it is dense parser code with many protocol families behind one API. For a first campaign, that trade was worth making: fewer sockets and process-control problems, more time mutating actual receiver packets.
```c
void packet_parse(struct gps_lexer_t *lexer)
{
    lexer->outbuflen = 0;
    while (0 < packet_buffered_input(lexer)) {
        /* state machine, protocol detection, length checks, checksums */
    }
}
```
That was only half of this fuzzer's life, though. The direct lexer mode was the fast way to learn the parser. The bug-finding mode was the session harness: raw bytes went through a pipe into gpsd_poll(), then through packet_get1(), packet_parse(), driver dispatch, and finally the driver-specific parse_packet() code. The local notes put the direct packet mode around 20k exec/s and the session mode around 63k exec/s.
I should have mapped the GPSD architecture before writing code. Because packet_parse() sits downstream of the real entry points, the direct harness has to manually create state that GPSD normally builds while reading from devices and sockets. That tradeoff misses upstream bugs, but it buys dense coverage over a complicated parser. Slight cope, but not a bad first target.
Build setup
Because this is an in-process fuzzer, Cargo needs to link against a locally built GPSD. The build.rs script does the boring-but-important work: read GPSD_DIR, run the GPSD SCons build with clang, ASan, UBSan, and static libraries enabled, then copy the resulting archives into Cargo's target directory.
```rust
use std::env;

fn main() {
    // Re-run this build script whenever the GPSD source location changes.
    println!("cargo:rerun-if-env-changed=GPSD_DIR");
    match env::var("GPSD_DIR") {
        Ok(gpsd_dir) => {
            // build GPSD static libs, copy .a files, emit cargo link flags
        }
        Err(_) => {
            panic!("GPSD_DIR must point at the gpsd source tree");
        }
    }
}
```
That sounds like plumbing, but it matters. If the target is not built with the sanitizers and coverage mode you think it is, the fuzzer can look healthy while quietly testing the wrong thing.
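The build script above elides the final linking step. A minimal sketch of that part, assuming the archives have already been copied into one directory: link_directives is a hypothetical helper, and the archive names in the test are placeholders rather than confirmed GPSD output names. The cargo:rustc-link-search and cargo:rustc-link-lib=static directives themselves are standard Cargo build-script output.

```rust
use std::path::Path;

/// Hypothetical helper: turn a directory of copied static archives into the
/// Cargo directives a build script prints. Archive names are given without
/// the "lib" prefix or ".a" suffix.
pub fn link_directives(lib_dir: &Path, static_libs: &[&str]) -> Vec<String> {
    // Tell rustc where to search for native libraries.
    let mut lines = vec![format!(
        "cargo:rustc-link-search=native={}",
        lib_dir.display()
    )];
    // Link each archive statically, preserving the caller's order.
    for lib in static_libs {
        lines.push(format!("cargo:rustc-link-lib=static={lib}"));
    }
    lines
}
```

Inside build.rs each returned line would simply be printed. Order is preserved on purpose: with static archives, a library that depends on another usually needs to be listed before its dependency.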
The input was not bytes
LibAFL is happy fuzzing byte slices. GPSD's parser wanted a gps_lexer_t. That structure contains the byte stream, the cursor into that stream, parser state, counters, timestamps, optional stash buffers, JSON depth bookkeeping, and ISGPS/RTCM2 assembly state.
So the fuzzer mirrored the C layout in Rust with #[repr(C)]. The mutator only touched inbuffer[..inbuflen]. The rest of the structure was kept sane: inbufptr pointed at the start of inbuffer, lengths stayed inside the allocation, and protocol-specific fields started from boring defaults.
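```rust
// Mirror of the C gps_lexer_t: #[repr(C)] keeps field order and layout
// identical to the C definition, which the FFI call relies on.
#[repr(C)]
struct GpsLexerT {
    packet_type: i32,
    state: u32,
    length: usize,
    inbuffer: [u8; MAX_PACKET_LENGTH * 2 + 1],
    inbuflen: usize,
    inbufptr: *mut u8, // cursor into inbuffer
    outbuffer: [u8; MAX_PACKET_LENGTH * 2 + 1],
    outbuflen: usize,
    /* timestamps, counters, stash, ISGPS, JSON state */
}
```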
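One detail behind "inbufptr pointed at the start of inbuffer" is that the pointer refers back into the same struct. The sketch below uses a hypothetical, shrunken SketchLexer (64-byte buffer, only a few fields) to show one way of keeping that sane: clamp the copied length and re-point the cursor immediately before the C call. The zero defaults are assumptions, not GPSD's documented initial values.

```rust
// Hypothetical stand-in for the mirrored lexer: the real mirror uses
// MAX_PACKET_LENGTH * 2 + 1 byte buffers and many more fields.
const SKETCH_BUF_LEN: usize = 64;

#[repr(C)]
pub struct SketchLexer {
    pub packet_type: i32,
    pub state: u32,
    pub length: usize,
    pub inbuffer: [u8; SKETCH_BUF_LEN],
    pub inbuflen: usize,
    pub inbufptr: *mut u8,
}

impl SketchLexer {
    /// Boring defaults; zero is assumed to be neutral here (check gpsd.h).
    pub fn new() -> Self {
        SketchLexer {
            packet_type: 0,
            state: 0,
            length: 0,
            inbuffer: [0; SKETCH_BUF_LEN],
            inbuflen: 0,
            inbufptr: std::ptr::null_mut(),
        }
    }

    /// Copy packet bytes in, clamped to the buffer size, then reset the cursor.
    pub fn load_packet(&mut self, bytes: &[u8]) {
        let n = bytes.len().min(self.inbuffer.len());
        self.inbuffer[..n].copy_from_slice(&bytes[..n]);
        self.inbuflen = n;
        self.reset_cursor();
    }

    /// inbufptr points into this same struct, so copying or moving the struct
    /// (boxing it, storing it in a corpus) leaves the pointer stale.
    /// Call this right before handing the struct to C.
    pub fn reset_cursor(&mut self) {
        self.inbufptr = self.inbuffer.as_mut_ptr();
    }
}
```

Re-pointing at call time is cheaper and less fragile than trying to keep a self-referential pointer valid across every copy the fuzzer makes.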
The generator
The generator picked one of 24 protocol families, selected a sample from a hand-curated corpus, copied it into inbuffer, set inbuflen, and attached a packet-type hint. Sometimes it filled the buffer with multiple concatenated packets to exercise resynchronization and mixed-stream behavior.
The important part was not "valid packets forever." It was "valid enough to get indoors." A GPS parser rejects nonsense very quickly. A packet with a recognizable leader, plausible length, and slightly wrong body gets much deeper.
```rust
fn generate_one_packet<S: HasRand>(state: &mut S, choice: u64)
    -> (&'static [u8], i32)
{
    // choice selects the protocol family; pick() draws one seed from it
    match choice {
        0 => (pick(state, NMEA_SENTENCES).as_bytes(), NMEA_PACKET),
        1 => (pick(state, AIVDM_SENTENCES).as_bytes(), AIVDM_PACKET),
        /* UBX, RTCM2, RTCM3, GREIS, TSIP, Skytraq, AllyStar, ... */
        _ => (pick(state, JSON_SENTENCES).as_bytes(), JSON_PACKET),
    }
}
```
A plain random-bytes generator spends most of its life in GROUND_STATE. The custom generator starts from protocol families GPSD already understands, then lets the mutator make them weird.
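The multi-packet case from the generator can be sketched as "append whole seed packets until the next one would not fit." Both fill_mixed_stream and the tiny xorshift PRNG are hypothetical; in the real harness LibAFL's HasRand state would supply randomness instead.

```rust
/// Minimal xorshift64 PRNG so the sketch has no dependencies.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // An all-zero state would stay zero forever, so avoid it.
        XorShift64(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform-ish index in 0..n; n must be non-zero.
    pub fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical helper: concatenate whole seed packets into one stream,
/// stopping before the buffer capacity would be exceeded.
pub fn fill_mixed_stream(
    rng: &mut XorShift64,
    seeds: &[&[u8]],
    cap: usize,
    max_packets: usize,
) -> Vec<u8> {
    let mut out = Vec::with_capacity(cap);
    if seeds.is_empty() {
        return out;
    }
    for _ in 0..max_packets {
        let seed = seeds[rng.below(seeds.len())];
        if out.len() + seed.len() > cap {
            break;
        }
        out.extend_from_slice(seed);
    }
    out
}
```

Appending only whole packets keeps every leader intact, so the parser has to resynchronize at real boundaries between mixed protocols rather than just chewing on truncated garbage.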
The mutator
The custom mutator stayed away from bookkeeping and operated only on the packet bytes. It mixed small byte-level edits, structural edits, corpus splices, length-field corruption, magic-byte injection, and occasional checksum repair. Strategy rotation happened on a fixed interval so one mutation family had enough time to prove whether it was useful before the fuzzer switched gears.
```rust
// Mutate only the live packet bytes; bookkeeping fields stay untouched.
let packet_len = input.inbuflen
    .min(input.inbuffer.len())
    .min(MAX_PACKET_LENGTH);
if packet_len > 0 {
    let bytes = &mut input.inbuffer[..packet_len];
    random_mutation(state, bytes, input.packet_type, self.current_mutation);
}
```
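Fixed-interval strategy rotation can be as small as a counter. This StrategyRotator is a hypothetical sketch of the idea, not the project's code; its return value plays the role that a field like current_mutation plays above.

```rust
/// Hypothetical rotator: stay on one mutation strategy for `interval`
/// calls, then advance to the next strategy, wrapping around.
pub struct StrategyRotator {
    strategies: usize,
    interval: u64,
    calls: u64,
    current: usize,
}

impl StrategyRotator {
    pub fn new(strategies: usize, interval: u64) -> Self {
        assert!(strategies > 0 && interval > 0);
        StrategyRotator { strategies, interval, calls: 0, current: 0 }
    }

    /// Return the strategy index to use for this mutation call.
    pub fn next_strategy(&mut self) -> usize {
        // Advance once every `interval` calls (never on the very first call).
        if self.calls > 0 && self.calls % self.interval == 0 {
            self.current = (self.current + 1) % self.strategies;
        }
        self.calls += 1;
        self.current
    }
}
```

A fixed interval gives each strategy an equal, predictable window to produce new coverage. A scheduler that weights strategies by their recent coverage gains would be the obvious next step.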
The trick was preserving the parts that route execution while breaking the parts that parsers tend to trust. Keep the protocol leader. Nudge the declared count. Shorten the body. Repair a checksum one time out of a hundred. GPSD then spends less time saying "not a packet" and more time handling edge cases.
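To make "keep the protocol leader, nudge the declared count" concrete, here is a sketch for UBX. A UBX frame is two sync bytes (0xB5 0x62), a class byte, an ID byte, a little-endian u16 payload length, the payload, and two checksum bytes. The function nudge_ubx_length is hypothetical and leaves every byte except the length field alone.

```rust
/// Hypothetical mutation: keep sync, class and id, and add `delta`
/// (with 16-bit wraparound) to the declared little-endian payload length.
/// Returns false without touching the input if it is not a UBX frame.
pub fn nudge_ubx_length(frame: &mut [u8], delta: i16) -> bool {
    // Need at least sync(2) + class(1) + id(1) + length(2).
    if frame.len() < 6 || frame[0] != 0xB5 || frame[1] != 0x62 {
        return false;
    }
    let declared = u16::from_le_bytes([frame[4], frame[5]]);
    // Casting a negative i16 to u16 gives its two's-complement value,
    // so wrapping_add performs the signed adjustment.
    let nudged = declared.wrapping_add(delta as u16);
    frame[4..6].copy_from_slice(&nudged.to_le_bytes());
    true
}
```

The checksum is deliberately left stale here. Once the declared length disagrees with the bytes actually present, a length-driven parser will not look for the checksum at the frame's physical end anyway, so "repairing" it there would not help.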
The mutation side ended up being where most of the taste lives. NMEA can tolerate cheap ASCII edits. UBX and RTCM need more respect for length fields and checksums. Some mutations are byte-level, some are structural, some splice from the corpus, and some intentionally corrupt counts while preserving enough framing to get past the early gates.
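Corpus splicing can stay protocol-friendly by keeping the target's prefix, and with it the leader, and borrowing a real tail from a donor packet. The splice function below is a hypothetical sketch; in practice the cut points would come from the fuzzer's RNG.

```rust
/// Hypothetical splice: keep target[..cut_a], append donor[cut_b..], and
/// truncate the result to `cap` bytes. Cuts are clamped to each input.
pub fn splice(target: &[u8], donor: &[u8], cut_a: usize, cut_b: usize, cap: usize) -> Vec<u8> {
    let a = cut_a.min(target.len());
    let b = cut_b.min(donor.len());
    let mut out = Vec::with_capacity(cap);
    out.extend_from_slice(&target[..a]); // prefix carries the protocol leader
    out.extend_from_slice(&donor[b..]); // tail is a real protocol fragment
    out.truncate(cap);
    out
}
```

For example, splicing "$GPGGA,1,2*00" with "$GPRMC,A,B*00" at offset 7 in both yields "$GPGGA,A,B*00": a GGA leader carrying fields the GGA decoder was never meant to see.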
| Mutation | Why it helped |
|---|---|
| Byte flips and arithmetic | Cheap coverage discovery, especially for ASCII protocols like NMEA. |
| Length-field corruption | Targets parser assumptions about declared size versus available bytes. |
| Corpus splicing | Combines real protocol fragments without starting from total garbage. |
| Checksum repair | Low-probability escape hatch for protocols that reject broken frames too early. |
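Checksum repair is cheap for the two most common framings. A sketch, assuming the frame's declared length already matches its size: NMEA 0183 uses the XOR of every byte between the '$' or '!' leader and '*', written as two uppercase hex digits, and UBX uses an 8-bit Fletcher checksum over class, ID, length, and payload. The function names are hypothetical, not the project's checksums.rs.

```rust
/// Rewrite the two hex digits after '*' so the NMEA checksum is valid.
/// Returns false if the '$'/'!' leader or the '*HH' trailer is missing.
pub fn repair_nmea_checksum(sentence: &mut [u8]) -> bool {
    if sentence.is_empty() || !(sentence[0] == b'$' || sentence[0] == b'!') {
        return false;
    }
    let star = match sentence.iter().position(|&b| b == b'*') {
        Some(i) if i + 2 < sentence.len() => i,
        _ => return false,
    };
    // XOR of everything between the leader and '*'.
    let sum = sentence[1..star].iter().fold(0u8, |acc, &b| acc ^ b);
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    sentence[star + 1] = HEX[(sum >> 4) as usize];
    sentence[star + 2] = HEX[(sum & 0x0F) as usize];
    true
}

/// Recompute the UBX Fletcher checksum into the last two bytes.
/// Assumes the declared payload length matches the frame's actual size.
pub fn repair_ubx_checksum(frame: &mut [u8]) -> bool {
    if frame.len() < 8 || frame[0] != 0xB5 || frame[1] != 0x62 {
        return false;
    }
    let n = frame.len();
    let (mut a, mut b) = (0u8, 0u8);
    for &byte in &frame[2..n - 2] {
        a = a.wrapping_add(byte);
        b = b.wrapping_add(a);
    }
    frame[n - 2] = a;
    frame[n - 1] = b;
    true
}
```

Gating these behind a low probability, as the table describes, keeps most mutants honest while still letting a few broken-but-valid frames past the early checksum gate.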
Project shape
The fuzzer ended up split into boring names that did real jobs:
- build.rs: build and link GPSD into the Rust target.
- gpsbindings.rs: C layout bindings for the GPSD structs and packet constants.
- gps_data_corpora.rs: seed packets across the 24 protocol families.
- data_generation.rs: construct sane gps_lexer_t inputs from those seeds.
- checksums.rs: protocol checksum helpers for when repairing is worth it.
- lexer_mutations.rs and the mutations/ module: strategy rotation and protocol-aware edits.
- main.rs: LibAFL wiring, feedback, executor, corpus, crashes, and replay modes.
What this fuzzer was good at
- Driving deep state-machine coverage in packet_parse().
- Using session mode to reach driver-level decode bugs after packet classification.
- Producing compact crash inputs that could be replayed in-process.
- Teaching us which protocol families needed more targeted harnesses.
For the bugs in the public GPSD work item, the important result from this fuzzer was the Skytraq 0xDD crash path. The direct lexer harness helped shape inputs and coverage, but the session/full-pipeline harness is the one that made driver bugs like the Skytraq one show up as real corruption instead of just "a packet looked interesting."
It was also a useful humbling device. Fuzzing an internal parser means you are manually creating the preconditions normally built by the daemon. That can be valid, but it misses bugs in socket handling, PTY behavior, WATCH clients, JSON emission, and daemon lifecycle. That lesson becomes the center of part two.
The lesson
Structure-aware fuzzing is not a checkbox. In LibAFL, it is an architecture you build: C layout mirrors, carefully scoped mutation surfaces, corpus selection, cheap mutation hot paths, and enough protocol knowledge to reach code without sanding off every sharp edge.
Part one found useful crash paths by being deliberately structure-aware: fake enough state to run fast, preserve enough protocol shape to reach real parser code, and then let the session harness pull driver bugs into view.
Source
The harness described here is open source: github.com/xchglabs/gpsd-driver-fuzzer.
Next
Continue to Part 2: Lessons Learned.