2026-06-09 - research - GPSD part 1

Fuzzing GPSD, Part 1: The Lexer Harness

Build a fake-but-consistent gps_lexer_t, feed it to packet_parse(), and stop wasting cycles in the reject path.

It has been a while since I made any blog posts, so I figured I would write up a side project I have been chipping at. It started as a learning project: I had never built a structure-aware fuzzer before, so I built one for GPSD (the GPS service daemon) using LibAFL. It runs in-process and links directly against GPSD; a custom harness repeatedly calls the target function with structured, fuzzer-generated payloads. This post covers the design, the input format, the harness, and the results.

GPSD is a lightweight GPS service daemon used across a wide range of systems to collect, normalize, and share data from GPS or GNSS receivers with client applications through a standardized interface. It is built into many Linux-based devices and distributions, powering everything from car infotainment systems, drones, and marine navigation tools to NTP servers, telecom base stations, and precision timing hardware that rely on GPS for accurate UTC synchronization. It also shows up in research and industrial systems — autonomous robots, weather stations, seismic monitors — where precise location and timing are critical. In defense, satellite, and radio communications gear, gpsd often supports geolocation, tracking, and timing in embedded systems, which makes it a core utility in both civilian and secure infrastructure.

Picking a target function

For a target I wanted something that handled a lot of the packet parsing for all the different packet types. After poking around the gpsd source I found exactly that in packet.c: the packet_parse function.

void packet_parse(struct gps_lexer_t *lexer)
{
    lexer->outbuflen = 0;
    while (0 < packet_buffered_input(lexer)) {
        ...
    }
}

In GPSD's lexer, packet_parse(struct gps_lexer_t *lexer) is the core routine that turns a raw byte stream into validated, typed packets. It incrementally feeds bytes into a large finite-state machine and, when a packet boundary is recognized, verifies it and exposes it to the rest of GPSD through the lexer's output fields.

Quick notes on packet_parse:

Streams bytes through a protocol-detection state machine (nextstate) to find leaders, lengths, and trailers across many protocols (NMEA/AIS, UBX, RTCM2/3, GREIS, TSIP, EverMore, Oncore, Zodiac, CASIC, Allystar, Navcom, etc.).
Logs state transitions and advances counters while scanning mixed-protocol streams.
If valid, copies the packet to lexer->outbuffer, sets lexer->outbuflen and lexer->type, and discards consumed input; if invalid or ambiguous, discards/pushes back bytes and resynchronizes to GROUND_STATE.
Handles optional stash/unstash to cope with interleaved NMEA and similar edge cases when enabled.
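
A stripped-down illustration of the scan, verify, and resync loop those notes describe, restricted to NMEA-style framing (a '$' leader, then '*' followed by two hex checksum digits). This is a toy sketch, not GPSD's nextstate(): the State enum, scan_nmea, and hex_val are hypothetical names, and the real lexer tracks many more protocols, terminators, and states.

```rust
/// Toy lexer states; GPSD's real state machine has far more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Ground, // hunting for a leader byte
    Body,   // inside a sentence, accumulating the XOR checksum
    Check1, // expecting the first hex checksum digit
    Check2, // expecting the second hex checksum digit
}

/// Decode one ASCII hex digit, if it is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Scan `input` and return (start, end) byte ranges of sentences whose
/// checksum verifies. Anything else sends the machine back to Ground.
fn scan_nmea(input: &[u8]) -> Vec<(usize, usize)> {
    let mut state = State::Ground;
    let mut start = 0usize;
    let mut xor = 0u8;
    let mut claimed = 0u8;
    let mut found = Vec::new();

    for (i, &b) in input.iter().enumerate() {
        state = match (state, b) {
            (State::Ground, b'$') => { start = i; xor = 0; State::Body }
            (State::Ground, _) => State::Ground,
            // A new leader inside a body restarts the sentence (resync).
            (State::Body, b'$') => { start = i; xor = 0; State::Body }
            (State::Body, b'*') => State::Check1,
            (State::Body, _) => { xor ^= b; State::Body }
            (State::Check1, _) => match hex_val(b) {
                Some(v) => { claimed = v << 4; State::Check2 }
                None => State::Ground,
            },
            (State::Check2, _) => {
                if let Some(v) = hex_val(b) {
                    if (claimed | v) == xor {
                        found.push((start, i + 1));
                    }
                }
                State::Ground
            }
        };
    }
    found
}
```

On the input xx$AB*03junk$AB*FF$A*41 the sketch reports the byte ranges (2, 8) and (18, 23); the middle sentence claims checksum FF, fails verification, and is dropped.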

When I started this, it was something I worked on here and there in my spare time, and I naively dove into fuzzing this function without thinking much about GPSD's architecture. packet_parse sits some distance downstream of any entry point, and I should have taken that into account. Lesson learned: the next time I fuzz GPSD I want to go higher up and target an entry point, or something closer to one, rather than packet_parse. That said, I do not think it was a huge mistake (slight cope): the fuzzer still yielded results, and functions downstream of the entry point are sometimes worth fuzzing if they are interesting enough.

Environment and build setup

I am building for my desktop, which is Linux x86-64, and all of the build complexity is handled in build.rs. Since this is an in-process fuzzer, the fuzzer binary has to link directly against GPSD. The build script below does exactly that; for more on using Rust build scripts to link C libraries, see the Cargo docs.

... // imports

// Recursively walk dir for .a files and copy them to out_dir
fn copy_static_libs(dir: &Path, out_dir: &Path) -> std::io::Result<()> {
    ...
}

fn main() -> std::io::Result<()> {
    println!("cargo:rerun-if-env-changed=GPSD_DIR");
    match env::var("GPSD_DIR") {
        Ok(gpsd_dir) => {
            ... // compiler flags, build system stuff, etc
        }

        Err(e) => {
            ... // handle error if cannot find $GPSD_DIR environment variable
        }
    }
}

The build.rs ties Cargo to GPSD's SCons build so the fuzzer links against a locally built, instrumented GPSD. It reruns when GPSD_DIR changes, reads that path, and chooses target/{PROFILE} (debug/release) as the output directory. If no static archives are present there, it runs scons -c and then scons in GPSD_DIR using clang/clang++ with AddressSanitizer/UBSan and GPSD options that produce static libs (shared=false, dbus/shm disabled). It then recursively copies all resulting .a files from the GPSD tree into target/{PROFILE}, adds that directory to rustc's link search path, and links the gpsd and gps_static archives along with system libs (m, pthread, c) and sanitizer runtimes (asan, ubsan). Progress is surfaced via cargo:warning, and the script errors out with a clear message if GPSD_DIR is not set.
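
The final linking step of that description can be sketched as a function that produces the cargo: directives a build script prints. This is a sketch under assumptions: link_directives is a hypothetical helper, and whether the sanitizer runtimes link as dylib or static (and under which exact names) depends on the toolchain. Only the library names themselves come from the description above.

```rust
/// Build the `cargo:` lines a build script would print to link GPSD.
/// `lib_dir` is the directory the .a files were copied into.
fn link_directives(lib_dir: &str) -> Vec<String> {
    // Tell rustc where to look for the copied static archives.
    let mut out = vec![format!("cargo:rustc-link-search=native={}", lib_dir)];

    // GPSD archives named in the post, linked statically.
    for lib in ["gpsd", "gps_static"].iter() {
        out.push(format!("cargo:rustc-link-lib=static={}", lib));
    }

    // System libs and sanitizer runtimes. Linking these as `dylib`
    // is an assumption of this sketch, not something the post states.
    for lib in ["m", "pthread", "c", "asan", "ubsan"].iter() {
        out.push(format!("cargo:rustc-link-lib=dylib={}", lib));
    }
    out
}
```

In a real build.rs each returned line would simply be passed to println!, since Cargo reads build-script directives from standard output.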

That sounds like plumbing, but it matters. If the target is not built with the sanitizers and coverage mode you think it is, the fuzzer can look healthy while quietly testing the wrong thing.

Implementing structure-aware fuzzing in LibAFL

The structure we are fuzzing is the sole argument to packet_parse: a pointer to a gps_lexer_t. This structure carries most of the lexer's state-machine bookkeeping and also holds the attacker-controlled GPS packet data. We are not fuzzing the entire structure; we generate it and mutate only one interesting field, inbuffer, which contains the GPS packet data. When implementing your own structure-aware fuzzing you can mutate many fields, but here I only wanted to fuzz this one.

The `gps_lexer_t` structure

struct gps_lexer_t {
    int type;                            // Classified packet type (e.g., NMEA_PACKET, UBX_PACKET). BAD_PACKET if invalid.
    unsigned int state;                  // Current state of the packet-lexing finite state machine.
    size_t length;                       // Declared payload length if the protocol carries one (UBX/RTCM3/etc.).
    unsigned char inbuffer[MAX_PACKET_LENGTH*2+1];  // Input accumulator for raw bytes (oversized to handle worst cases).
    size_t inbuflen;                     // Number of valid bytes currently in inbuffer.
    unsigned char *inbufptr;             // Read cursor into inbuffer for the state machine.

    // outbuffer needs to be able to hold 4 GPGSV records at once
    unsigned char outbuffer[MAX_PACKET_LENGTH*2+1]; // Output buffer holding the most recently recognized packet.
    size_t outbuflen;                    // Length of valid data in outbuffer (NUL appended for convenience).

    unsigned long char_counter;          // Total characters processed (for logging/metrics/autobaud).
    unsigned long retry_counter;         // Number of sniff/retry attempts made while classifying a stream.
    unsigned counter;                    // Packets seen since last driver switch (helps driver detection heuristics).
    struct gpsd_errout_t errout;         // Error/reporting context (verbosity, label, logger hook).

    timespec_t start_time;               // Timestamp of first input seen on this stream (used by autobaud/timeouts).
    timespec_t pkt_time;                 // Timestamp of the last packet parsed (used for pacing/cycle tracking).
    unsigned long start_char;            // char_counter value at first input (paired with start_time).

    // ISGPS200/RTCM2 decoding context.
    struct {
        bool            locked;          // True when ISGPS synchronization is locked.
        int             curr_offset;     // Bit offset within current 30-bit word during deinterleaving.
        isgps30bits_t   curr_word;       // Currently assembling 30-bit ISGPS word.
        unsigned int    bufindex;        // Next index to write in buf[].

        /*
         * Only these should be referenced from elsewhere, and only when
         * RTCM_MESSAGE has just been returned.
         */
        isgps30bits_t   buf[RTCM2_WORDS_MAX];   // Decoded 30-bit words for one RTCM2 frame.
        size_t          buflen;                 // Packet length in bytes corresponding to buf contents.
    } isgps;

    unsigned int json_depth;             // Nesting depth while scanning JSON streams (for JSON_PACKET).
    unsigned int json_after;             // Bytes scanned after JSON terminator (helps detect packet boundaries).

#ifdef STASH_ENABLE
    unsigned char stashbuffer[MAX_PACKET_LENGTH]; // Temporary stash for bytes when resynchronizing or deferring parse.
    size_t stashbuflen;                            // Valid data length in stashbuffer.
#endif  // STASH_ENABLE

    bool chunked;                        // True if upstream HTTP stream uses chunked transfer (NTRIP/1.1).
    int chunk_remaining;                 // Remaining bytes in the current HTTP chunk before boundary.
};

gps_lexer_t is basically GPSD's per-stream state-machine structure: two big buffers (inbuffer you feed, outbuffer it fills), a moving cursor (inbufptr) the parser advances byte by byte, a state field tracking the finite-state machine as it hunts leaders/lengths/checksums across all the supported protocols, and a pile of bookkeeping (type once a packet is classified, length when a protocol declares one, counters and timestamps for driver detection and pacing). There is a nested ISGPS/RTCM2 context for assembling 30-bit words, optional stash space if built with STASH_ENABLE to park bytes during resync, and some JSON depth/after fields for the control/data JSON mode. Chunked/NTRIP flags keep HTTP transfer framing from confusing packet boundaries. You fill inbuffer + inbuflen (and point inbufptr at the start), then packet_parse() walks inbufptr forward, changes state, validates lengths/CRCs, copies a good frame to outbuffer, sets outbuflen/type, and repeats. Everything else exists so that mixed-protocol streams, timing heuristics, and resync/retry logic do not trip over each other.
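
The "fill inbuffer + inbuflen, then point inbufptr at the start" step can be sketched on a cut-down mirror of the struct. MiniLexer and load are hypothetical names used only for illustration; MAX_PACKET_LENGTH is taken as 9216 because the generator code in this post sizes inbuffer as 18433 bytes (MAX_PACKET_LENGTH * 2 + 1).

```rust
const MAX_PACKET_LENGTH: usize = 9216;
const INBUF_LEN: usize = MAX_PACKET_LENGTH * 2 + 1;

/// Cut-down stand-in for gps_lexer_t: only the three input fields.
#[repr(C)]
struct MiniLexer {
    inbuffer: [u8; INBUF_LEN],
    inbuflen: usize,
    inbufptr: *mut u8,
}

impl MiniLexer {
    /// Heap-allocate so the buffer address stays stable while in use.
    fn boxed() -> Box<MiniLexer> {
        Box::new(MiniLexer {
            inbuffer: [0; INBUF_LEN],
            inbuflen: 0,
            inbufptr: std::ptr::null_mut(),
        })
    }

    /// Copy `bytes` into inbuffer (capped at its size), record the
    /// length, and rewind the read cursor. Returns the bytes copied.
    fn load(&mut self, bytes: &[u8]) -> usize {
        let n = bytes.len().min(self.inbuffer.len());
        self.inbuffer[..n].copy_from_slice(&bytes[..n]);
        self.inbuflen = n;
        self.inbufptr = self.inbuffer.as_mut_ptr();
        n
    }
}
```

Because inbufptr points into the struct itself, it is only meaningful while the struct stays at the same address; any move or clone has to be followed by repointing the cursor.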

Custom generator

In LibAFL, a custom generator is any type that implements the Generator<Input, S> trait. With it you construct well-formed inputs for your target. Instead of emitting random bytes, a generator encodes domain knowledge: it can pick a protocol family, select or synthesize a representative sample, and populate only the fields that constitute valid input for the harness. In practice, generate() uses the state's RNG (HasRand) to vary inputs, enforces size caps for stability, copies bytes into the correct buffer, and sets the minimal metadata the target expects. This matters for structure-aware fuzzing: you get immediate progress into deep parser states, fewer trivial rejects, and better coverage per second.

My generator seeds the initial testcases and generates inputs as the fuzzer runs. It pulls from the corpora in gps_data_corpora.rs and builds a gps_lexer_t structure with the GPS packet data pulled in. Why a custom generator instead of RandBytes? Because it starts from valid-ish protocol frames that obey length fields and (optionally) checksums, so the lexer's finite-state machine immediately progresses into deep, protocol-specific states instead of spinning in GROUND_STATE rejecting garbage. That means fewer trivial rejects and timeouts, and earlier coverage of parsing logic guarded by framing and CRC gates (UBX len/checksum, NMEA talker/checksum, RTCM framing).

fn generate_one_packet<S: HasRand>(state: &mut S, corpus_choice: u64) -> (&'static [u8], i32) {
    /*
    Select one protocol corpus entry and return its bytes plus a gpsd packet_type hint.

    state         - LibAFL RNG state (mut borrowed; we advance its RNG)
    corpus_choice - Which protocol family to sample (0..=22)

    Returns:
    (&'static [u8], i32)
        Slice of packet/frame bytes backed by static corpus data
        Integer gpsd packet_type hint (aligns with gpsd's internal enums)
    */
    match corpus_choice {
        0 => {
            let data = NMEA_SENTENCES[state.rand_mut().next() as usize % NMEA_SENTENCES.len()];
            (data.as_bytes(), 1)
        }
        1 => {
            let data = AIVDM_SENTENCES[state.rand_mut().next() as usize % AIVDM_SENTENCES.len()];
            (data.as_bytes(), 2)
        }
        ...
        _ => {
            let data = JSON_SENTENCES[state.rand_mut().next() as usize % JSON_SENTENCES.len()];
            (data.as_bytes(), 23)
        }
    }
}

fn populate_inbuffer<S: HasRand>(state: &mut S, packet_bytes: &mut [u8; 18433], corpus_choice: u64, fill_inbuffer: bool) -> (usize, i32) {
    /*
    Fill a fixed-size destination buffer with one or (on coinflip) multiple packets.

    state          - RNG for selecting packets
    packet_bytes   - Mutable destination buffer (will be overwritten from start)
    corpus_choice  - Protocol family selector for initial packet
    fill_inbuffer  - If true, we concatenate packets until buffer full; else just one

    Returns:
    (total_len, packet_type_hint_first)
        total_len: number of bytes actually written into packet_bytes
        packet_type_hint_first: type hint from the FIRST packet inserted
    */
    let (first_pkt, first_hint) = generate_one_packet(state, corpus_choice);
    let mut written: usize = 0;

    let first_len = first_pkt.len().min(packet_bytes.len());
    packet_bytes[..first_len].copy_from_slice(&first_pkt[..first_len]);
    written += first_len;

    if fill_inbuffer {
        while written < packet_bytes.len() {
            let choice: u64 = state.rand_mut().next() % 23;
            let (pkt, _hint_ignore) = generate_one_packet(state, choice);
            let remaining: usize = packet_bytes.len() - written;
            if remaining == 0 {
                break;
            }
            let copy_len = pkt.len().min(remaining);
            packet_bytes[written..written + copy_len].copy_from_slice(&pkt[..copy_len]);
            written += copy_len;
            if copy_len < pkt.len() {
                break;
            }
        }
    }

    (written, first_hint)
}

impl<S> Generator<GpsLexerT, S> for GpsLexerGenerator
where S: HasRand,
{
    /*
        Generate one structured lexer input.
        1) Pick a protocol family using RNG.
        2) Choose a concrete sample from that family and attach a packet_type hint.
        3) Copy bytes into the lexer input buffer, capped by config and buffer size.
        4) Initialize the remaining lexer fields to sane defaults.
    */
    fn generate(&mut self, state: &mut S) -> Result<GpsLexerT, libafl::Error> {
        let corpus_choice: u64 = state.rand_mut().next() % 23; // All available GPS protocols
        let fill_inbuffer: bool = (state.rand_mut().next() & 1) == 1;

        let mut inbuffer: [u8; 18433] = [0; MAX_PACKET_LENGTH * 2 + 1];
        let (copy_len, packet_type_hint) =
            populate_inbuffer(state, &mut inbuffer, corpus_choice, fill_inbuffer);

        let chunked = false;
        let chunk_remaining = 0;
        let parser_state = 0;
        let mut lexer = GpsLexerT {
            packet_type: packet_type_hint,
            type_mask: 0, // enable all recognizers
            state: parser_state,
            length: 0,

            inbuffer: inbuffer,
            inbuflen: copy_len,
            inbufptr: inbuffer.as_mut_ptr(),

            outbuffer: [0; MAX_PACKET_LENGTH * 2 + 1],
            outbuflen: 0,

            char_counter: 0,
            retry_counter: 0,
            counter: 0,
            errout: GpsdErroutT {
                debug: 0,
                report: None,
                label: b"Fuzzer\0".as_ptr() as *const i8,
            },
            start_time: TimeSpec {
                tv_sec: 0,
                tv_nsec: 0,
            },
            pkt_time: TimeSpec {
                tv_sec: 0,
                tv_nsec: 0,
            },
            start_char: 0,
            isgps: IsgpsState {
                locked: false,
                curr_offset: 0,
                curr_word: 0,
                bufindex: 0,
                buf: [0; RTCM2_WORDS_MAX],
                buflen: 0,
            },
            stashbuffer: [0; MAX_PACKET_LENGTH],
            stashbuflen: 0,
            json_depth: match packet_type_hint {
                23 => (state.rand_mut().next() % 20) as u32, // Only relevant for JSON packets
                _ => 0, // Not JSON, so depth should be 0
            },
            json_after: match packet_type_hint {
                23 => (state.rand_mut().next() % 10) as u32, // Only relevant for JSON packets
                _ => 0, // Not JSON
            },
            chunked: chunked,
            chunk_remaining: chunk_remaining,
        };

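        // Returning `lexer` by value moves the struct, so this pointer is only a
        // placeholder; the harness repoints inbufptr at its own copy before
        // calling packet_parse().
        lexer.inbufptr = lexer.inbuffer.as_mut_ptr();
        Ok(lexer)
    }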
}

The generator builds a fresh GpsLexerT (a Rust structure I made to mirror the C gps_lexer_t) by randomly choosing one of the 23 protocol families, selecting a corpus sample for that protocol, capping and copying its bytes into inbuffer, setting inbuflen, and wiring inbufptr to the start so packet_parse() can stream over it. It also drops in sane defaults for the rest of the lexer state (stash/ISGPS/JSON/chunking fields) and pre-fills packet_type with a protocol hint, giving immediate, structure-aware seeds that push the GPSD state machine past trivial leader hunting and into deeper protocol-specific branches from the first executions.

Custom mutator

Custom mutators in LibAFL implement Mutator<Input, S> to apply domain-aware transformations to already-structured seeds, exploring nearby states efficiently instead of destroying validity. A mutator selectively edits targeted regions (here: inbuffer[..inbuflen]) using operations like arithmetic tweaks, byte flips, splices, insert/delete, or protocol-specific checksum repairs. Good custom mutators (1) never touch internal parser bookkeeping (pointers, counters, timestamps), (2) bias toward small localized changes to keep the packet recognizable so deeper states execute, (3) occasionally perform larger block edits, (4) separate "semantic" repairs (recomputing NMEA/UBX checksums) into dedicated passes gated by probability, and (5) keep per-mutation cost low (no heap allocs, minimal bounds checks) to maximize exec/sec. The result is higher-quality mutations: more valid packets reaching length/checksum-guarded branches, and fewer trivial rejects.
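
As an illustration of a probability-gated semantic repair, here is a sketch for UBX, whose frames end in a two-byte 8-bit Fletcher checksum computed over the class, ID, length, and payload bytes. The function names are hypothetical, and the post's own checksums.rs covers only NMEA so far, so treat this as an example of the pattern rather than the fuzzer's actual code.

```rust
/// 8-bit Fletcher checksum as used by UBX (CK_A, CK_B).
fn ubx_checksum(body: &[u8]) -> (u8, u8) {
    let (mut a, mut b) = (0u8, 0u8);
    for &x in body {
        a = a.wrapping_add(x);
        b = b.wrapping_add(a);
    }
    (a, b)
}

/// Recompute the trailing checksum of a UBX frame in place.
/// Layout: 0xB5 0x62 class id len_lo len_hi payload... CK_A CK_B.
/// Returns false (and leaves the frame alone) if it does not fit.
fn repair_ubx_checksum(frame: &mut [u8]) -> bool {
    if frame.len() < 8 || frame[0] != 0xB5 || frame[1] != 0x62 {
        return false;
    }
    let len = u16::from_le_bytes([frame[4], frame[5]]) as usize;
    let end = 6 + len; // index of CK_A
    if end + 2 > frame.len() {
        return false; // declared length runs past the buffer
    }
    let (a, b) = ubx_checksum(&frame[2..end]);
    frame[end] = a;
    frame[end + 1] = b;
    true
}

/// Gate the repair: run it roughly once per `one_in` calls, driven by
/// an RNG draw `roll` supplied by the caller.
fn maybe_repair(roll: u64, one_in: u64, frame: &mut [u8]) -> bool {
    one_in != 0 && roll % one_in == 0 && repair_ubx_checksum(frame)
}
```

Keeping the gate outside the repair function means the expensive pass can be tuned (or disabled) per protocol without touching the checksum logic.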

My mutator targets only the bytes that matter to the parser, operating on inbuffer[..inbuflen] and leaving all parser bookkeeping (pointers, counters, timestamps, masks) untouched. It picks a mutation strategy with the fuzzer RNG, re-rolling that choice every 50,000 calls, and on each iteration applies a small, localized edit to keep packets recognizable. The protocol knowledge keeps edits "near-valid": it preserves leaders and framing, nudges plausible fields (IDs, lengths, timestamps), and probabilistically repairs checksums. That keeps more packets crossing length/checksum gates, yields deeper coverage with fewer trivial rejects, and produces higher-quality crashes while still sprinkling in the occasional larger block edit.

fn random_mutation<S>(state: &mut S, bytes: &mut [u8], packet_type: i32, mutation: MutationType)
where
    S: HasRand,
{
    match mutation {
        MutationType::Addition => { mutate_addition(state, bytes); }
        MutationType::Subtraction => { mutate_subtraction(state, bytes); }
        MutationType::Division => { mutate_division(state, bytes); }
        ...
    }
}

impl<S> Mutator<GpsLexerT, S> for GpsLexerPacketDataMutator
where S: HasRand,
{
    fn mutate(&mut self, state: &mut S, input: &mut GpsLexerT) -> Result<MutationResult, libafl::Error> {
        if self.mutation_counter % 50000 == 0 {
            let mutation_key = state.rand_mut().below(NonZeroUsize::new(MUTATION_TYPES.len()).unwrap()) as usize;
            self.current_mutation = MUTATION_TYPES[mutation_key];
        }

        let packet_len: usize = input.inbuflen.min(input.inbuffer.len()).min(MAX_PACKET_LENGTH);
        if packet_len > 0 {
            let bytes: &mut [u8] = &mut input.inbuffer[..packet_len];
            random_mutation(state, bytes, input.packet_type, self.current_mutation);
        }

        self.mutation_counter += 1;
        Ok(MutationResult::Mutated)
    }

    fn post_exec(&mut self, _state: &mut S, _new_corpus_id: Option<CorpusId>) -> Result<(), Error> {
        Ok(())
    }
}

The mutator operates on an existing GpsLexerT by targeting only the payload slice inbuffer[..inbuflen] (the code additionally caps that slice at MAX_PACKET_LENGTH bytes). On each call it applies the currently selected strategy: small locality-preserving edits (bit/byte flips, arithmetic nudges, swaps, splices) and, with low probability, structural changes like short inserts/deletes within bounds, selectively repairing checksums and length fields when the mutated region implies them so packets keep crossing length/checksum gates. The result is near-valid perturbations that drive GPSD's state machine. Note that the strategy is re-rolled only every 50,000 calls to mutate(), so one mutation family gets enough time to prove whether it is useful before the fuzzer switches gears.

Custom mutator functions

My mutator functions live in mutations.rs and are grouped into three tiers: byte-level edits (bit flips, arithmetic add/sub/mul/div, XOR, byte swaps), structural transformations (block swaps, inserts, overwrites, splices), and protocol-aware repairs (checksum fixes, length-field adjustments, magic-byte injection). Each function accepts a mutable slice of the input buffer, an RNG state for deterministic randomness, and optional parameters like mutation intensity or target offsets. They are composable, stateless, and side-effect-free beyond modifying the buffer, so the main mutator in lib.rs can chain them, apply them conditionally based on packet-type hints, or gate expensive ones (checksum recalculation) behind low-probability thresholds to keep exec/sec high.
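
A minimal sketch of what a byte-level primitive in this style might look like. The body of mutate_addition is not shown in this post, so the version here is a guess at the shape, and XorShift64 is a hypothetical dependency-free stand-in for the LibAFL RNG that the real functions receive through the fuzzer state.

```rust
/// Tiny xorshift generator standing in for the fuzzer's RNG.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Add a small non-zero delta (1..=16) to one randomly chosen byte.
/// Works in place, allocates nothing, and returns the index it
/// touched, or None if the slice is empty.
fn mutate_addition(rng: &mut XorShift64, bytes: &mut [u8]) -> Option<usize> {
    if bytes.is_empty() {
        return None;
    }
    let idx = (rng.next() % bytes.len() as u64) as usize;
    let delta = (rng.next() % 16) as u8 + 1;
    bytes[idx] = bytes[idx].wrapping_add(delta);
    Some(idx)
}
```

Returning the touched index is a design choice of this sketch: it lets a later repair pass decide whether the edit landed inside a checksummed region.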

The mutator implementation itself (GpsLexerPacketDataMutator in lib.rs) wraps these primitives in LibAFL's Mutator trait, applying the currently selected strategy on each call to mutate() and invoking the corresponding function on input.inbuffer[..input.inbuflen]. It skips operations that would exceed buffer bounds. Keeping mutation logic in mutations.rs separate from scheduling and orchestration in lib.rs makes it easy to add new mutations, tune probabilities per protocol family, or switch strategies dynamically based on coverage feedback without touching the fuzzing loop.

Fuzzer architecture

The fuzzer is split into boring filenames that each do a real job:

`checksums.rs`

Houses protocol-specific checksum and CRC repair functions (NMEA for now, more planned). Each function accepts a mutable byte slice, computes the correct checksum for that protocol, and overwrites the trailing checksum bytes in place. These are called probabilistically by the mutator to repair packets after structural edits, ensuring more inputs pass length/CRC gates and reach deeper parser logic.
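
For NMEA the repair is small enough to sketch in full: the checksum is the XOR of every byte between '$' and '*', written as two uppercase hex digits after the '*'. The function name repair_nmea_checksum is hypothetical and the actual code in checksums.rs may differ (for example in how it treats AIVDM's '!' leader, which this sketch ignores).

```rust
/// Recompute and overwrite the two hex checksum digits of an NMEA
/// sentence in place. Returns false if the sentence has no '$',
/// no '*', or no room for two digits after the '*'.
fn repair_nmea_checksum(sentence: &mut [u8]) -> bool {
    // First byte covered by the checksum is the one after '$'.
    let start = match sentence.iter().position(|&b| b == b'$') {
        Some(i) => i + 1,
        None => return false,
    };
    // The '*' delimiter ends the checksummed region.
    let star = match sentence[start..].iter().position(|&b| b == b'*') {
        Some(i) => start + i,
        None => return false,
    };
    if star + 2 >= sentence.len() {
        return false; // truncated: no space for both digits
    }
    let sum = sentence[start..star].iter().fold(0u8, |acc, &b| acc ^ b);
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    sentence[star + 1] = HEX[(sum >> 4) as usize];
    sentence[star + 2] = HEX[(sum & 0x0F) as usize];
    true
}
```

For example, the body AB XORs to 0x03, so a mutated "$AB*00" is rewritten to "$AB*03".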

`gps_data_corpora.rs`

Defines static arrays of valid protocol samples, one per supported GPS/GNSS family (NMEA, AIVDM, UBX, RTCM2/3, GREIS, TSIP, etc.) as either &'static str or &'static [u8]. The generator and some mutators pull from these arrays when seeding or splicing known-good frames into the input. Each corpus is hand-curated to include baseline valid packets from real captures or protocol specs, edge-case variants (empty fields, minimal payloads, maximum field values, boundary timestamps), and fuzzing samples designed to trigger common bugs (oversized length claims, format confusion with wrong preambles, checksum corruption, state-machine attacks with rapid resets, injection payloads, temporal attacks at epoch boundaries). Keeping them in a separate file makes it trivial to add new samples or swap corpus sources without touching fuzzing logic. The corpus is compile-time static, so there is no runtime allocation overhead, just direct slices into .rodata. (The corpora purposefully ship with only one entry per static array in the public repo.)

`data_generation.rs`

Contains the helpers that orchestrate corpus selection and assembly: generate_one_packet() maps a protocol-family index (0..=22) to the corresponding corpus array, picks a random sample using the RNG, and returns a (&'static [u8], i32) tuple (bytes + packet_type hint); populate_inbuffer() fills a destination buffer with one or multiple packets (depending on a fill flag), concatenating samples until the buffer is full or a packet truncates. These use the corpora from gps_data_corpora.rs but add the logic for randomization, multi-packet assembly, and size capping. They bridge the static data and the dynamic fuzzer state (RNG, config), making it easy to change corpus selection strategy or switch from static arrays to on-the-fly generation without modifying the generator trait in lib.rs.

`gpsbindings.rs`

Contains the Rust mirror of GPSD's gps_lexer_t (as GpsLexerT) with #[repr(C)] to match C layout exactly, plus helper types (TimeSpec, IsgpsState, GpsdErroutT) and constants (MAX_PACKET_LENGTH, RTCM2_WORDS_MAX). It also implements Input, HasLen, and serialization traits so LibAFL can use GpsLexerT as an input type. Any change to the C struct or config flags (e.g. STASH_ENABLE) must be reflected here, or layout mismatches will crash the harness. GPSD developers occasionally update this structure, so periodic syncs are necessary.
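
One cheap guard against the layout drift mentioned above is to pin the mirrored struct's size and alignment at compile time. The sketch below uses a hypothetical IsgpsHead holding just the first four fields of the nested ISGPS context; mapping isgps30bits_t to u32 is an assumption, and EXPECTED_SIZE stands in for a value you would obtain from sizeof() on the C side.

```rust
use std::mem::{align_of, size_of};

/// Partial mirror of the ISGPS context: bool, int, isgps30bits_t
/// (assumed to be a 32-bit unsigned type), unsigned int.
#[repr(C)]
#[allow(dead_code)]
struct IsgpsHead {
    locked: bool,     // 1 byte, followed by 3 bytes of padding
    curr_offset: i32, // 4 bytes
    curr_word: u32,   // 4 bytes
    bufindex: u32,    // 4 bytes
}

/// Hypothetical value reported by sizeof() in a small C program.
const EXPECTED_SIZE: usize = 16;

// Fails the build, rather than the fuzzing run, if the layout drifts.
const _: () = assert!(size_of::<IsgpsHead>() == EXPECTED_SIZE);

/// Runtime variant of the same check, handy in a unit test.
fn layout_matches(expected_size: usize, expected_align: usize) -> bool {
    size_of::<IsgpsHead>() == expected_size && align_of::<IsgpsHead>() == expected_align
}
```

The same pattern extends to the full GpsLexerT; the important part is that the expected numbers come from the C build actually being linked, including its STASH_ENABLE setting.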

`lib.rs`

The core fuzzing logic: defines GpsLexerGenerator (implements Generator<GpsLexerT, S> to produce initial seeds), GpsLexerPacketDataMutator (implements Mutator<GpsLexerT, S> and dispatches the currently selected mutation strategy on each iteration), and the fuzz() entry point that wires together LibAFL components (state, corpus, scheduler, observers, feedbacks, executor, fuzzer loop). It also exposes LLVMFuzzerTestOneInput (the harness that clones the input, repoints inbufptr, and calls packet_parse) and a fuzzer_main wrapper for standalone runs. This is the orchestration layer; everything else is helpers or data.
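
The clone, repoint, call sequence in the harness can be sketched as follows. Lexer is a hypothetical cut-down struct, and fake_parse is a stand-in for the real packet_parse FFI call (it just counts '$' bytes through the cursor), so this shows the pointer handling only, not the actual harness.

```rust
/// Cut-down lexer with a self-referential read cursor.
#[derive(Clone)]
#[repr(C)]
struct Lexer {
    inbuffer: [u8; 64],
    inbuflen: usize,
    inbufptr: *mut u8,
    outlen: usize,
}

/// Stand-in for packet_parse(): walks the cursor like a C parser would.
fn fake_parse(lexer: *mut Lexer) {
    unsafe {
        let len = (*lexer).inbuflen;
        let ptr = (*lexer).inbufptr;
        let mut count = 0;
        for i in 0..len {
            if *ptr.add(i) == b'$' {
                count += 1;
            }
        }
        (*lexer).outlen = count;
    }
}

/// Run one input: work on a private copy so the corpus entry is never
/// modified, and repoint the cursor because the clone copied a pointer
/// into the *original* struct's buffer (or a stale one).
fn run_one(input: &Lexer, target: fn(*mut Lexer)) -> usize {
    let mut scratch = input.clone();
    let p: *mut Lexer = &mut scratch;
    unsafe {
        (*p).inbufptr = std::ptr::addr_of_mut!((*p).inbuffer) as *mut u8;
        target(p);
        (*p).outlen
    }
}
```

Deriving the cursor from the same raw pointer that is handed to the target keeps everything the "C side" touches under one provenance, which is why the sketch avoids mixing in a separate &mut borrow.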

`main.rs`

Command-line driver for standalone operations outside the fuzzing loop: crash reproduction (loads a serialized crash file, deserializes or rebuilds a GpsLexerT, calls the harness, prints debug info), single-input runs (--crash <file>), and optional GDB attachment helpers. It wraps the harness in signal handlers and optional ASan/UBSan setup to match the fuzzer's fault detection. Useful for triaging crashes, minimizing inputs, or running under debuggers/profilers without spinning up the full LibAFL event manager.

`mutations.rs`

Houses the 20+ mutation primitives invoked by GpsLexerPacketDataMutator: byte-level (add/sub/mul/div, XOR, bit flip, byte swap), structural (block swap, insert random bytes, splice corpus samples, overwrite magic headers), and protocol-aware (length-field corruption, endian swaps on plausible widths, satellite ID/timestamp nudges). Each function is stateless, side-effect-free, and operates in-place on a mutable slice with an RNG state parameter for deterministic randomness. Keeping them separate from the mutator trait makes unit testing, profiling, and combinatorial chaining straightforward.
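
Two of the primitive families named above, sketched with hypothetical names and signatures (the real functions in mutations.rs also take the RNG state and may choose offsets themselves): an endian swap on a 16-bit field and an in-bounds splice of a corpus sample.

```rust
/// Swap the two bytes of a 16-bit field at `offset`.
/// Returns false, leaving the buffer untouched, if the field would
/// run past the end of the slice.
fn swap_endian_u16(bytes: &mut [u8], offset: usize) -> bool {
    match offset.checked_add(2) {
        Some(end) if end <= bytes.len() => {
            bytes.swap(offset, offset + 1);
            true
        }
        _ => false,
    }
}

/// Overwrite bytes starting at `offset` with as much of `sample` as
/// fits. Never grows the buffer. Returns the number of bytes written.
fn splice_sample(bytes: &mut [u8], offset: usize, sample: &[u8]) -> usize {
    if offset >= bytes.len() {
        return 0;
    }
    let n = sample.len().min(bytes.len() - offset);
    bytes[offset..offset + n].copy_from_slice(&sample[..n]);
    n
}
```

Both helpers refuse out-of-range work instead of panicking, which matches the "skip operations that would exceed buffer bounds" behavior described for the mutator.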

Future work

I would like to keep working on this in my spare time. GPSD is an awesome project and it is important because of where it runs. I think I can improve the fuzzer by targeting different functions reachable via attacker-controlled data.

Beyond the current packet_parse target, there are a few promising directions. First, move upstream in GPSD's call chain to fuzz entry points closer to the network boundary, such as the socket handlers that receive data before it reaches the lexer. That captures more realistic attack surface where protocol confusion, framing errors, and timing-dependent bugs live. Second, implement protocol-specific harnesses for the individual driver backends (NMEA, UBX, RTCM3, etc.) to exercise the deeper parsing logic that is only reachable after packet_parse classifies a packet, which means building separate generators per protocol with full field-level structure awareness. Third, build a proper crash triage pipeline with automated minimization and stack-hash deduplication; my current triage process is rough.
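
Stack-hash deduplication, the third item, is simple enough to sketch: hash the top few frames of each crash's backtrace and bucket crashes by that hash. This is a generic illustration of the technique, not the author's pipeline; the function names and the sample frame lists are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the top `top_n` frames (innermost first) of a backtrace.
fn stack_hash(frames: &[&str], top_n: usize) -> u64 {
    let mut hasher = DefaultHasher::new();
    for frame in frames.iter().take(top_n) {
        frame.hash(&mut hasher);
    }
    hasher.finish()
}

/// Group crash names by the hash of their top frames, so crashes that
/// differ only in the outer call chain land in the same bucket.
fn bucket_crashes(crashes: &[(&str, Vec<&str>)], top_n: usize) -> HashMap<u64, Vec<String>> {
    let mut buckets: HashMap<u64, Vec<String>> = HashMap::new();
    for (name, frames) in crashes {
        buckets
            .entry(stack_hash(frames, top_n))
            .or_default()
            .push(name.to_string());
    }
    buckets
}
```

The choice of top_n is the main tuning knob: too small merges distinct bugs that share a crashing libc function, too large splits one bug across many buckets.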

Conclusion

Building this structure-aware fuzzer for GPSD turned out to be one of the more technically rewarding projects I have tackled in a while. Implementing custom Generator and Mutator traits for a complex C structure with no out-of-the-box support meant wrestling with LibAFL's type system, understanding the interactions between state management and trait bounds, and designing mutation strategies that could intelligently perturb a 36KB+ structure without destroying semantic validity. Thanks to the people in the awesome fuzzing Discord for answering questions along the way.

The structure-aware aspect was the real challenge. LibAFL does not natively support fuzzing structured inputs like gps_lexer_t — it is optimized for byte slices and assumes you will handle structure preservation yourself. That meant exposing only the relevant mutation surface (inbuffer[..inbuflen]), carefully managing pointer aliasing across the FFI boundary, and encoding protocol semantics into the generator without hardcoding fragile assumptions. The breakthrough was realizing that structure-aware fuzzing is not about perfect validity, it is about maximizing the probability that a mutated input reaches interesting parser states.

Performance tuning taught me that mutation efficiency compounds in coverage-guided fuzzing. Gating expensive operations behind low-probability thresholds, rotating mutation strategies every 50k execs instead of per-call, and eliminating unnecessary allocations in the generator recovered a 2-3x speedup. Every wasted cycle is one less edge discovered, one less corpus addition, one less chance at a deep bug.

The fuzzer did uncover several bugs during development, but GPSD's maintainer is exceptionally proactive — by the time I finished triaging and minimizing reproducers, many had already been patched upstream. That speaks both to the quality of GPSD's maintenance and to the importance of continuous fuzzing. Architecturally, I made the classic mistake of fuzzing packet_parse without fully mapping GPSD's call graph; it is downstream of network entry points, so I am missing coverage of socket handling, HTTP chunking, and protocol autodiscovery. If I were starting over I would fuzz one layer up. That said, packet_parse is still a massive attack surface (23 protocol families, nested state machines, extensive length/CRC validation), and sometimes fuzzing a complex internal function beats modeling all the preconditions of a simpler entry point. That tension is exactly what Part 2 digs into.

Source

The harness described here is open source: github.com/xchglabs/gpsd-driver-fuzzer.

Continue to Part 2: Lessons Learned.
