
Feeding large JSON to an LLM without blowing the context window.

Engineering · Marek Holub · May 12, 2026 · 9 min read

Pasting a 200MB API response into ChatGPT is a fast way to burn dollars and earn a context-window error. Here's how to feed real-world JSON to an LLM without either.

Why JSON destroys LLM context windows

JSON was designed for machines reading machines. It was not designed to be cheap in tokens. Every {, every ", every duplicated key is paid for in the same currency as your actual signal. A pretty-printed 100KB JSON file lands in the 30k–40k token range under cl100k-style tokenizers — and that's before anything interesting happens.
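A quick way to see this on your own data is to measure how much of a payload is structure rather than values. The Python sketch below uses only the standard library, and the record shape is invented for illustration. It compares pretty-printed and compact sizes and counts the bytes that are actual scalar values.

```python
import json
from typing import Any


def payload_bytes(node: Any) -> int:
    """Bytes of scalar values only: no keys, quotes, braces, commas, or whitespace."""
    if isinstance(node, dict):
        return sum(payload_bytes(v) for v in node.values())
    if isinstance(node, list):
        return sum(payload_bytes(v) for v in node)
    if isinstance(node, str):
        return len(node.encode("utf-8"))  # string content without its surrounding quotes
    return len(json.dumps(node))  # numbers, true/false, null


def overhead_report(doc: Any) -> dict:
    pretty = len(json.dumps(doc, indent=2, ensure_ascii=False).encode("utf-8"))
    compact = len(json.dumps(doc, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))
    payload = payload_bytes(doc)
    return {
        "pretty_bytes": pretty,
        "compact_bytes": compact,
        "payload_bytes": payload,
        # Fraction of the compact encoding that is pure structure (keys + punctuation).
        "syntax_share": 1 - payload / compact,
    }


# Hypothetical homogeneous array: the same two key names repeat in every record.
records = [{"status": "ok", "retry_count": 0} for _ in range(1000)]
print(overhead_report(records))
```

In this toy array of 1,000 identical records, keys and punctuation make up roughly nine bytes in ten, because both key names are repeated in every record.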

Three things make it worse:

Then you hit the wall. Claude Sonnet and Opus cap at 200k tokens. GPT-4o caps at 128k. Gemini 1.5 and 2.5 will take a million, but you're paying per token the whole way down. At Sonnet's input rate, a single 200k-token query is roughly $0.60. Run that 100 times a day in a debugging loop and you've spent $60 on JSON you didn't read.

Five strategies for shrinking JSON for an LLM

None of these are clever. All five compound.

1. Strip whitespace. Compact-print the JSON before you do anything else. On pretty-printed input, this is a free 20–40% win. Anything that reads JSON can produce compact JSON; the trick is remembering to do it before you paste.

2. Drop noisy keys. The model does not need created_at_microseconds, the SHA of an internal request ID, an auth header echo, or a base64 thumbnail. Use schema inference to find which keys carry data and which carry plumbing. Plumbing keys usually hold the longest values.

3. Sample, don't include. If you have an array of 50,000 homogeneous objects, the model learns nothing extra from object 49,997 that it didn't already learn from objects 1–5. Send the schema and a random sample of 3–5 records. State that it is a sample. The model is fine with that.

4. Filter by predicate. When you're debugging, you don't want the whole dataset. You want the records where status == "error", or where retry_count > 3. Filter server-side or pipeline-side before the JSON ever reaches a chat box.

5. Truncate long values. A 50KB log body is not 50KB of signal. The first 256 bytes — the error class, the start of the message — is enough for the model to reason about it. Truncate with a marker like "…(truncated)" so the model knows it's not getting the full string and won't confabulate the rest.
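The five strategies compose into a single pass. Below is a minimal, tool-agnostic Python sketch of that pass over records already parsed into memory. The function names, the noisy-key set you pass in, and the 256-byte default are illustrative assumptions, not jb's implementation. Truncation counts UTF-8 bytes rather than characters, and it backs off so it never splits a multi-byte character.

```python
import json
import random
from typing import Any, Callable, Optional

MARKER = "…(truncated)"  # tells the model the string was cut, so it won't invent the rest


def drop_keys(node: Any, noisy: set) -> Any:
    """Strategy 2: recursively remove plumbing keys wherever they appear."""
    if isinstance(node, dict):
        return {k: drop_keys(v, noisy) for k, v in node.items() if k not in noisy}
    if isinstance(node, list):
        return [drop_keys(v, noisy) for v in node]
    return node


def truncate_str(s: str, max_bytes: int) -> str:
    """Strategy 5: keep at most max_bytes of UTF-8, never splitting a multi-byte character."""
    raw = s.encode("utf-8")
    if len(raw) <= max_bytes:
        return s
    # errors="ignore" discards a partial character left at the cut point.
    return raw[:max_bytes].decode("utf-8", errors="ignore") + MARKER


def truncate_values(node: Any, max_bytes: int) -> Any:
    """Apply truncate_str to every string value (keys are left alone)."""
    if isinstance(node, dict):
        return {k: truncate_values(v, max_bytes) for k, v in node.items()}
    if isinstance(node, list):
        return [truncate_values(v, max_bytes) for v in node]
    if isinstance(node, str):
        return truncate_str(node, max_bytes)
    return node


def shrink(
    records: list,
    keep: Callable[[dict], bool],
    noisy: set,
    max_bytes: int = 256,
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> str:
    picked = [r for r in records if keep(r)]  # 4. filter by predicate
    if sample_size is not None and len(picked) > sample_size:
        picked = random.Random(seed).sample(picked, sample_size)  # 3. sample, don't include
    cleaned = [truncate_values(drop_keys(r, noisy), max_bytes) for r in picked]  # 2 + 5
    # 1. compact-print: no spaces after separators, one record per line (JSONL)
    return "\n".join(json.dumps(r, separators=(",", ":"), ensure_ascii=False) for r in cleaned)
```

A call such as `shrink(records, keep=lambda r: r.get("status") == "error", noisy={"trace_id", "request_hash"}, sample_size=5)` returns pasteable JSONL. The predicate runs before keys are dropped, so you can filter on a field you then remove. If you sampled, say so in the prompt.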

One number to remember. 1 KB of typical JSON ≈ 300–400 tokens. If your file shows 487 KB on disk, assume ~180k tokens of context, then start cutting.
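This rule of thumb is easy to turn into a pre-flight check. The Python sketch below just applies the 300–400 tokens-per-KB range to a byte count. It is a heuristic, not a tokenizer, and it assumes 1 KB = 1,024 bytes.

```python
import os

# Rule of thumb from this post: 1 KB of typical JSON is about 300–400 tokens.
LOW_TOKENS_PER_KB = 300
HIGH_TOKENS_PER_KB = 400


def estimate_tokens(size_bytes: int) -> tuple:
    """Rough (low, high) token range for a JSON payload; a heuristic, not a tokenizer."""
    kb = size_bytes / 1024  # assumes 1 KB = 1024 bytes
    return round(kb * LOW_TOKENS_PER_KB), round(kb * HIGH_TOKENS_PER_KB)


def estimate_file(path: str) -> tuple:
    """Same estimate, taken from the file's size on disk."""
    return estimate_tokens(os.path.getsize(path))
```

The ~180k figure for a 487 KB file falls inside the range this returns. For an exact count under an OpenAI-style tokenizer, use the tiktoken library's cl100k_base encoding. The heuristic is only meant for the back-of-envelope decision.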

A concrete pipeline with the jb CLI

The five strategies are tool-agnostic. The friction is in stitching them together. Most people end up with a fragile chain of jq filters, python -c one-liners, and tabs-of-shame. The jb CLI was built to do the whole pipeline in one pass; see "jq alone won't get you there" for why we stopped using jq on large files.

Start by looking at what you actually have:

# What's in this thing? Tree shape with types, one pass over the file.
$ jb schema response.json

# Every distinct key name, deduped. Spot the plumbing keys.
$ jb keys response.json
# trace_id
# request_hash
# status
# body
# ...

# Every path in the file, deduped. Spot which arrays carry the payload.
$ jb paths response.json | head
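Without jb installed, a recursive walk makes a rough pure-Python stand-in for the keys and paths views. This sketch is an assumption about what those views contain, not jb's implementation. It collapses array items to [*], in the same style as the .results[*].body path in the extract example. It also totals compact bytes per key name, which is the quickest way to spot plumbing keys with long values. Because it loads the whole document into memory, run it on a slice rather than a 487 MB dump.

```python
import json
from collections import Counter
from typing import Any, Iterator, Tuple


def walk(node: Any, path: str = "") -> Iterator[Tuple[str, str, Any]]:
    """Yield (parent_path, key, value) for every object property; array items collapse to [*]."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path, key, value
            yield from walk(value, f"{path}.{key}")
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, f"{path}[*]")


def inventory(doc: Any):
    """Distinct keys, distinct paths, and compact bytes under each key name (largest first)."""
    keys, paths = set(), set()
    bytes_by_key = Counter()
    for parent, key, value in walk(doc):
        keys.add(key)
        paths.add(f"{parent}.{key}")
        # Nested bytes count toward every ancestor key too, so containers rank high.
        size = len(json.dumps(value, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))
        bytes_by_key[key] += size
    return sorted(keys), sorted(paths), bytes_by_key.most_common()
```

Container keys such as results rank near the top because nested bytes count toward every ancestor. The leaf keys just below them are the candidates for dropping.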

Now you know what's bloating the file. Filter to the records you care about — jb search with --where applies a predicate and --emit object returns the full matching object:

# Only the errors. The predicate evaluates against each iterated property.
$ jb search --where '.status == "error"' --emit object response.json > errors.json

# Or get just one path across the whole file, in JSONL form.
$ jb extract '.results[*].body' --format jsonl response.json > bodies.jsonl
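Of these steps, extract is the simplest to reproduce without jb. The minimal Python sketch below assumes the path is given as a list of segments, with "*" standing in for [*]. It yields every value at that path and writes each one as a compact JSONL line.

```python
import json
from typing import Any, Iterable, Iterator, List


def extract(node: Any, segments: List[str]) -> Iterator[Any]:
    """Yield every value reachable by following segments; "*" iterates an array."""
    if not segments:
        yield node
        return
    head, rest = segments[0], segments[1:]
    if head == "*":
        if isinstance(node, list):
            for item in node:
                yield from extract(item, rest)
    elif isinstance(node, dict) and head in node:
        yield from extract(node[head], rest)


def to_jsonl(values: Iterable[Any]) -> str:
    """One compact JSON document per line."""
    return "".join(json.dumps(v, separators=(",", ":"), ensure_ascii=False) + "\n" for v in values)
```

The call `extract(doc, ["results", "*", "body"])` corresponds to .results[*].body. Segments that don't exist yield nothing instead of raising.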

Then run it through the model-shaped formatter. The --ai flag on jb search is a shortcut for --format jsonl --envelope --max-output 1M --max-value-bytes 256: every match becomes a JSONL line of the form {path, preview, value}, total output is capped at 1 MB, and every string value is truncated to a 256-byte preview. Long bodies stop costing you context tokens.

# --ai bundles four flags: jsonl + envelope + max-output 1M + max-value-bytes 256
$ jb search --where '.status == "error"' --ai response.json > for_llm.jsonl

# Or pipe straight to clipboard
$ jb search --where '.status == "error"' --ai response.json | clip       # Windows
$ jb search --where '.status == "error"' --ai response.json | pbcopy     # macOS

# If you need to drop specific keys from the object, pipe through jq.
# jb is for finding and slicing; transforms are jq's job.
$ jb search --where '.status == "error"' --emit object response.json \
    | jq -c 'del(.trace_id, .request_hash, .auth_echo)' > clean.jsonl
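The 1 MB output cap in --ai matters as much as the per-value cap. Below is a rough Python illustration of a total output budget. It is not how jb implements --max-output, and it makes no claim about what jb emits when it hits the cap. Records are written in order until the next one would overflow the budget. A final note line then records how many were left out, so the model knows the list is partial.

```python
import json
from typing import Any, Iterable


def write_capped_jsonl(records: Iterable[Any], max_bytes: int) -> str:
    """Emit compact JSONL, stopping at the first record that would exceed max_bytes."""
    lines, used, dropped = [], 0, 0
    for rec in records:
        line = json.dumps(rec, separators=(",", ":"), ensure_ascii=False) + "\n"
        size = len(line.encode("utf-8"))
        # Once one record is dropped, drop everything after it so the output stays a prefix.
        if dropped or used + size > max_bytes:
            dropped += 1
            continue
        lines.append(line)
        used += size
    if dropped:
        # The note line is not counted against the cap.
        lines.append(json.dumps({"note": f"{dropped} more records omitted (output cap)"}) + "\n")
    return "".join(lines)
```

Stopping at the first record that doesn't fit, instead of skipping ahead to smaller ones, keeps the output an unbroken prefix of the input, which is easier to describe in a prompt.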

Illustrative numbers from a representative 487 MB API-dump payload — your file's mix of long-string keys vs. small-value keys will shift the bytes column, but the order-of-magnitude arc is what matters:

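| Stage | Bytes | Tokens (est.) | Cost @ Sonnet |
|---|---|---|---|
| Raw response | 487 MB | ~180M | doesn't fit |
| Filter to status == "error" | 14 MB | ~5.2M | still doesn't fit |
| Drop noisy keys | 3.1 MB | ~1.1M | still doesn't fit (~$3.30 if it did) |
| jb search --ai (envelope + 256B preview cap) | 54 KB | ~18k | $0.05 |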
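Both the cost column and the question of whether a stage fits come down to plain arithmetic. The small Python check below uses two of the post's own numbers: the $3-per-million input rate implied by the $0.60-per-200k figure earlier, and the 200k Sonnet window. Check current pricing before relying on either.

```python
# Both constants come from figures quoted in this post; verify against current pricing.
SONNET_INPUT_USD_PER_MTOK = 3.00  # implied by "a 200k-token query is roughly $0.60"
CONTEXT_WINDOW_TOKENS = 200_000  # the Claude Sonnet/Opus cap quoted above


def input_cost_usd(tokens: int) -> float:
    """Input-side cost only; output tokens are billed separately."""
    return tokens * SONNET_INPUT_USD_PER_MTOK / 1_000_000


def check_stage(stage: str, tokens: int) -> str:
    verdict = "fits" if tokens <= CONTEXT_WINDOW_TOKENS else "does not fit"
    return f"{stage}: {tokens:,} tokens, ${input_cost_usd(tokens):.2f} input, {verdict}"


for stage, tokens in [
    ("Raw response", 180_000_000),
    ('Filter to status == "error"', 5_200_000),
    ("Drop noisy keys", 1_100_000),
    ("jb search --ai", 18_000),
]:
    print(check_stage(stage, tokens))
```

Across the table's token estimates, only the final stage comes in under the 200k window. The $3.30 stage is affordable but still too large to send.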

From "won't fit" to "five cents and three seconds". The model still gets every error type, every status value, the start of every log body, and enough of the surrounding keys to reason about the shape. It just doesn't get the full stack trace attached to every record.

When to stream instead of paste

This whole post is about ad-hoc work: debugging an incident, exploring a new API, running a one-off eval. If you're in production — an agent loop, a RAG pipeline, a customer-facing app — you should not be hand-pasting JSON. You should be using function calling, embeddings, or a retrieval layer, all of which sidestep the context-window problem instead of fighting it.

Paste-the-JSON is a developer tool. Treat it as one.

A worked example: 200MB API response, $0.07 query

Concrete scenario. The mobile team says a personalization endpoint started returning wrong recommendations at around 02:00 UTC. They send you a 200MB dump of the response payload from a four-hour window. The dump has 73,000 user sessions, each with a recommendations array, telemetry, a feature-vector blob, and a debug log body.

You don't want to scroll through that. You want to ask Claude one question: "Look at these error sessions, what's the common pattern in the feature vectors?"

$ jb schema dump.json
# → 73000 session objects, 14 keys each, feature_vector is the largest by bytes

# Pull the bad sessions as full objects, drop the heavy keys with jq,
# then feed the result back through jb with --ai for envelope + caps.
$ jb search --where '.recommendation_quality < 0.3' --emit object dump.json \
    | jq -c 'del(.debug_log_body, .feature_vector_raw)' \
    | jb search --ai - > bad_sessions.jsonl

$ wc -c bad_sessions.jsonl
# 71 KB. About 24k tokens.

Paste bad_sessions.jsonl into a chat with a one-line prompt: "These are the personalization sessions that scored badly between 02:00 and 06:00 UTC. What's the most common feature-vector cluster among them?"

The model answers in seconds, not minutes, and at Sonnet's input rate the 24k-token prompt costs about seven cents. The raw 200 MB dump wouldn't have fit in any frontier model's context window at all. Even on a hypothetical model that could take it, the input cost alone would have run into the low three figures per query before the response started typing.

For more on dealing with the raw file before you compact it, see "opening large JSON files without crashing your editor".

Pitfalls to avoid

The whole stack

| Trick | Token reduction | Cost to apply |
|---|---|---|
| Compact-print | 20–40% | one flag |
| Drop noisy keys | 30–80% | one schema pass |
| Predicate filter | 50–99% | one expression |
| Truncate long values | 40–95% | built into --ai |
| All of the above, piped | typically 99%+ | one command |
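The 99%+ in the last row comes from compounding: each step removes a fraction of whatever the previous one left. The short Python sketch below does that multiplication. It treats the steps as independent, which is a simplification, since dropping keys changes how much truncation has left to do.

```python
from math import prod


def combined_reduction(reductions: list) -> float:
    """Overall reduction when each step removes a fraction of what the previous step left.

    Assumes the steps act independently, which real pipelines only approximate.
    """
    remaining = prod(1 - r for r in reductions)
    return 1 - remaining


# Illustrative picks from inside the table's ranges:
# compact 30%, drop keys 50%, filter 95%, truncate 60%.
print(f"{combined_reduction([0.30, 0.50, 0.95, 0.60]):.1%}")
```

With those picks the combined reduction comes out at 99.3%, even though no single step removes more than 95%.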

JSON is verbose. LLMs charge by the token. The arithmetic is unkind, but the fix is mechanical: see the shape, drop the noise, truncate the bodies, ship the rest.

Download jsonbolt if you want the GUI + the jb CLI in one install. Free for personal use up to 50MB; Pro is $80/year if you're doing this in anger. The --ai flag is in every tier.

jsonbolt · v1.4.2