
How we parse JSON at 2 GB/s.

Engineering · Anika Rao · May 17, 2026 · 7 min read

On day one this parser ran at a few hundred MB/s — recursive descent in clean Rust, which is about what you'd expect. Today it sits around 2 GB/s on the same workload, in the same app, on the same JSON. None of the techniques that got us there are novel; most are well-documented in the JSON-parsing literature. The interesting part is how they compound when the parser is written for a specific consumer instead of a generic AST.

The starting point

We profiled that first version before touching anything. perf top showed the usual shape: most of the cycles in byte-scan loops, a meaningful chunk in branch mispredictions, and a real share in the allocator.

Three buckets. The next three sections go after them. The section after those is the one that mattered most, and it doesn't show up on a profile.

1. A flat intermediate (the tape)

Most JSON parsers build a tree: each object becomes a HashMap, each array a Vec, each scalar an enum variant on the heap. The tree is convenient — it's the shape your code wants to consume — and it's also where most of the parse cost lives. A 1 GB file produces tens of millions of allocations, with all the per-allocation overhead, fragmentation, and pointer-chase cost that implies.

We don't build a tree. We build a tape: a flat array of structural events — open object, close array, key starts here, value ends there. The general idea has been in the JSON-parsing literature for years; we use the variant that fits how we want to consume the result.
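To make the tape idea concrete, here is a minimal sketch of what a flat tape could look like. It is not jsonbolt's actual layout: the `Kind` and `Entry` names, the `u32` offsets, and the scalar `build_tape` loop are all assumptions made for illustration, and the input is assumed to be valid JSON.

```rust
/// Kinds of structural events on the tape. Names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    ObjectOpen,
    ObjectClose,
    ArrayOpen,
    ArrayClose,
    Key,
    Str,
    Number,
    Literal, // true, false, null
}

/// One tape entry: an event plus the byte range [start, end) it covers in the source.
/// Assumption: u32 offsets, which caps the source at 4 GiB.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Entry {
    kind: Kind,
    start: u32,
    end: u32,
}

/// Builds a tape in one pass over JSON that is assumed to be valid.
/// Values are recorded as byte ranges only; nothing is parsed or unescaped.
fn build_tape(src: &[u8]) -> Vec<Entry> {
    let mut tape = Vec::new();
    let mut i = 0;
    while i < src.len() {
        let bracket = match src[i] {
            b'{' => Some(Kind::ObjectOpen),
            b'}' => Some(Kind::ObjectClose),
            b'[' => Some(Kind::ArrayOpen),
            b']' => Some(Kind::ArrayClose),
            _ => None,
        };
        if let Some(kind) = bracket {
            tape.push(Entry { kind, start: i as u32, end: i as u32 + 1 });
            i += 1;
        } else if src[i] == b'"' {
            // Find the closing quote, stepping over escaped bytes.
            let start = i + 1;
            let mut j = start;
            while j < src.len() && src[j] != b'"' {
                j += if src[j] == b'\\' { 2 } else { 1 };
            }
            let end = j.min(src.len());
            // A string followed by ':' is a key; otherwise it is a value.
            let mut k = end + 1;
            while k < src.len() && src[k].is_ascii_whitespace() {
                k += 1;
            }
            let kind = if k < src.len() && src[k] == b':' { Kind::Key } else { Kind::Str };
            tape.push(Entry { kind, start: start as u32, end: end as u32 });
            i = end + 1;
        } else if matches!(src[i], b',' | b':' | b' ' | b'\t' | b'\n' | b'\r') {
            i += 1;
        } else {
            // Number or literal: runs until the next delimiter.
            let start = i;
            while i < src.len()
                && !matches!(src[i], b',' | b'}' | b']' | b' ' | b'\t' | b'\n' | b'\r')
            {
                i += 1;
            }
            let first = src[start];
            let kind = if first == b'-' || first.is_ascii_digit() { Kind::Number } else { Kind::Literal };
            tape.push(Entry { kind, start: start as u32, end: i as u32 });
        }
    }
    tape
}
```

Calling `build_tape` on `{"a": [1, "x"], "b": true}` yields nine entries in a single `Vec`, each pointing back into the source buffer. The only allocation is the tape itself.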

Approach	Time to open (1 GB file)	Resident RAM
Generic parser → object tree	2.1 s	4.2 GB
Streaming SAX	0.9 s	40 MB
Tape (jsonbolt)	0.51 s	180 MB

Parse time beats the streaming approach (0.51 s against 0.9 s); resident memory is more than twenty times lower than the tree approach (180 MB against 4.2 GB), though still several times higher than streaming. The cost is that the rest of the application has to be written against the tape, which we did, and which the rest of this post is mostly about.

2. SIMD for the scan

Almost every byte in a JSON file is content you skip past. The structural characters — quotes, brackets, commas, colons, backslashes — are a few percent of the file at most. Scanning one byte at a time wastes a CPU that can compare thirty-two in parallel.

We use AVX2 on modern x86, with an SSE2 fallback for older CPUs and NEON paths on Apple Silicon. On the AVX2 path, each 32-byte chunk produces a bitmask of match positions for the structural bytes; we walk those positions, emit offsets to the tape, advance past the chunk, repeat. Content bytes never get re-examined until something downstream asks for the value.
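The chunk-and-bitmask loop can be sketched in portable Rust. To keep the example runnable on any CPU, `structural_mask` below builds the 32-bit mask with a scalar loop; in a real AVX2 path that step would be a handful of byte-compare instructions followed by a movemask. The function names are hypothetical, and the sketch leaves out the work of deciding which structural bytes sit inside string literals.

```rust
/// Bytes the scan treats as structural (the list from the paragraph above).
fn is_structural(b: u8) -> bool {
    matches!(b, b'"' | b'\\' | b'{' | b'}' | b'[' | b']' | b',' | b':')
}

/// Scalar stand-in for the SIMD compare + movemask step:
/// bit i of the result is set when chunk[i] is a structural byte.
fn structural_mask(chunk: &[u8]) -> u32 {
    debug_assert!(chunk.len() <= 32);
    let mut mask = 0u32;
    for (i, &b) in chunk.iter().enumerate() {
        if is_structural(b) {
            mask |= 1 << i;
        }
    }
    mask
}

/// Walks the source in 32-byte chunks and returns the offset of every structural byte.
fn structural_offsets(src: &[u8]) -> Vec<usize> {
    let mut out = Vec::new();
    for (n, chunk) in src.chunks(32).enumerate() {
        let base = n * 32;
        let mut mask = structural_mask(chunk);
        // Visit only the set bits; content bytes are never looked at again.
        while mask != 0 {
            let bit = mask.trailing_zeros() as usize;
            out.push(base + bit);
            mask &= mask - 1; // clear the lowest set bit
        }
    }
    out
}
```

The inner `while` loop is the part that carries over unchanged to a SIMD build: its cost scales with the number of structural bytes in the chunk, not with the chunk size.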

This part is well-trodden territory. simdjson and sonic-rs are built around the same trick, RapidJSON uses SIMD for parts of its scan, and the Langdale & Lemire 2019 paper documents it in detail if you want the formal version. The advantage of writing your own is that you can tune the scan to exactly the structural events your downstream consumer needs, and skip work the off-the-shelf parsers do because they don't know what you'll do with the result.

3. Lazy values

The structural pass produces offsets and types, not values. Numbers aren't parsed yet. Strings aren't unescaped yet. The six bytes \u00e9 are still sitting in the source buffer; nothing has turned them into 'é'.

That deferred work (number parsing, escape decoding, the IEEE-754 conversion path) happens the moment something asks for the value. A virtualized viewer renders one screenful of rows out of a million; that screenful is the only place we pay the per-value cost. Every other row stays as (offset, type) and costs nothing until something materializes it.
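A rough sketch of deferred materialization, assuming a value is stored as a type tag plus a byte range. `LazyValue`, `as_f64`, and `as_string` are hypothetical names, not jsonbolt's API. The number path leans on Rust's standard `f64` parser, which is more permissive than the JSON grammar, and the escape decoder handles single `\uXXXX` escapes but not surrogate pairs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Number,
    Str,
}

/// A value that has been located but not decoded: a type tag and a byte range.
#[derive(Debug, Clone, Copy)]
struct LazyValue {
    kind: Kind,
    start: usize,
    end: usize,
}

impl LazyValue {
    /// Parses the number only when a caller asks for it.
    fn as_f64(&self, src: &[u8]) -> Option<f64> {
        if self.kind != Kind::Number {
            return None;
        }
        std::str::from_utf8(&src[self.start..self.end]).ok()?.parse().ok()
    }

    /// Decodes escapes only when a caller asks for the string.
    fn as_string(&self, src: &[u8]) -> Option<String> {
        if self.kind != Kind::Str {
            return None;
        }
        let raw = std::str::from_utf8(&src[self.start..self.end]).ok()?;
        let mut out = String::with_capacity(raw.len());
        let mut chars = raw.chars();
        while let Some(c) = chars.next() {
            if c != '\\' {
                out.push(c);
                continue;
            }
            match chars.next()? {
                'n' => out.push('\n'),
                't' => out.push('\t'),
                'r' => out.push('\r'),
                'b' => out.push('\u{8}'),
                'f' => out.push('\u{c}'),
                'u' => {
                    // Exactly four hex digits; surrogate pairs are not handled here.
                    let hex: String = chars.by_ref().take(4).collect();
                    if hex.len() != 4 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                        return None;
                    }
                    let code = u32::from_str_radix(&hex, 16).ok()?;
                    out.push(char::from_u32(code)?);
                }
                c @ ('"' | '\\' | '/') => out.push(c),
                _ => return None,
            }
        }
        Some(out)
    }
}
```

Until `as_string` runs, the six bytes of `\u00e9` stay untouched in the source buffer; a row that is never rendered never pays for the decode.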

The trade-off, stated honestly: if you iterate the full file, you pay the deferred cost back, and a tight scan over every value will be slower than a generic parser that materialized everything upfront. For viewer workloads, search workloads, and partial-extraction workloads — what jsonbolt is built for — the math comes out heavily in favor of laziness.

4. Why a custom parser, though?

At some point in the design you ask the obvious question: why not use one of the excellent off-the-shelf SIMD parsers, such as simd-json or sonic-rs, and put the engineering time into the UI?

We tried. Here's what we learned.

A generic parser produces a generic representation, usually a tagged enum tree. The consumer walks that representation to build its own representation — for virtualization, for search, for the minimap, for JSONPath autocomplete — each of which wants a different access pattern over the same underlying data. Parse once, walk again to translate, walk a third time to render. By the end you're holding two copies of the structure in memory and you've spent most of the parser's speed advantage on translation.

The tape is the consumer's representation. The virtualized tree iterates it directly — click row 5,000 of a million-element array, and you're indexing into the tape, not traversing pointers. The minimap is rendered from the tape, not the file. JSONPath autocomplete reads tape entries to suggest the keys and indices that actually exist. Search runs over offsets, not parsed values. None of these features build a parallel tree on top of the parsed result.
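As an illustration of indexing rather than traversing, here is a sketch of fetching one screenful of rows straight from a tape. It assumes the simplest case, an array that holds only scalars, so element i sits at a fixed distance from the array's opening entry. Nested containers would need per-container skip offsets, which are left out here. `Entry`, `Kind`, and `visible_rows` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    ArrayOpen,
    ArrayClose,
    Scalar,
}

/// One tape entry: an event plus the byte range [start, end) it covers in the source.
#[derive(Debug, Clone, Copy)]
struct Entry {
    kind: Kind,
    start: u32,
    end: u32,
}

/// Returns the raw source bytes for rows [first, first + count) of the array
/// whose ArrayOpen entry sits at tape[open].
/// Assumption: the array holds only scalars, so element i is tape[open + 1 + i].
fn visible_rows<'a>(
    src: &'a [u8],
    tape: &[Entry],
    open: usize,
    first: usize,
    count: usize,
) -> Vec<&'a [u8]> {
    let mut rows = Vec::with_capacity(count);
    // Jump straight to the first visible row; no earlier element is touched.
    let from = (open + 1 + first).min(tape.len());
    for entry in tape[from..].iter().take(count) {
        if entry.kind != Kind::Scalar {
            break; // reached the end of the array
        }
        rows.push(&src[entry.start as usize..entry.end as usize]);
    }
    rows
}
```

The rows come back as slices of the original buffer, so the viewer can decide per row whether it needs a decoded value or just the raw text.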

The expensive part of using a JSON parser is rarely the parser. It's the translation between what the parser hands you and what your code wants to do with it. If your consumer is generic AST traversal, the off-the-shelf SIMD parsers are already as fast as you reasonably need. If your consumer has specific access patterns — virtualization, partial materialization, structural search, columnar projection — a parser that produces its native shape is worth the engineering.

About half of the 2 GB/s number is what the parser does. The other half is what it doesn't have to do, because the rest of the app consumes the parser's output directly.

Three well-documented techniques, compounding under a parser written for the consumer that reads it. Together: the difference between "I'll get coffee while this opens" and "wait, it's already open."

Try it on the largest JSON file you have. If it doesn't surprise you, tell us why.

jsonbolt · v1.4.2