The serving layer for customer-facing analytics

Stop paying your warehouse to answer the same question a million times.

Your app shows the same dashboards and charts to users all day. Every view re-runs an expensive query on your data warehouse — so you pay for it again, and users wait. Offloader serves those reads from a cheap, pre-built copy on your own servers instead — the same numbers, at a fraction of the cost, and far faster.

Run it in 15 minutes · How it works

Self-hosted. Validated on real production data — 66 datasets, 67 endpoints. No vendor cloud — your private data never leaves your environment.

QUERY   revenue_by_day(account, from, to)
PRICE   $0.70/DBU    typical serving-layer rate
COST    $0.012/read  ≈ cost of the DBUs one read burns
RUNS    100,000      identical, this month
        ──────────────────────────
BILL    $1,200       to your warehouse

Same answer, billed on repeat. Illustrative — use your own numbers
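
To plug in your own numbers, the arithmetic in the panel can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name and its parameters are ours and not part of Offloader, and the inputs are the panel's example figures.

```python
from decimal import Decimal


def monthly_repeat_bill(cost_per_read: Decimal, identical_runs: int) -> Decimal:
    """Warehouse spend for one query shape that is re-run identically."""
    return cost_per_read * identical_runs


# Example figures from the panel: $0.012 per read, 100,000 identical runs.
bill = monthly_repeat_bill(Decimal("0.012"), 100_000)
print(f"${bill:,.2f}")  # $1,200.00
```

Decimal is used instead of float so that currency amounts multiply without binary rounding noise.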

How it works

Answer it once.
Serve it as long as it’s fresh.

Your pipeline publishes a snapshot on its own schedule. Offloader serves the reads. The warehouse only ever touches the snapshot — never your live product traffic.

[Diagram: Your warehouse (Snowflake · Databricks · BigQuery), on your schedule → snapshot → Object store (Parquet + manifest, S3 · GCS) → materialize → Self-hosted Offloader (DuckDB, validated, zero-downtime swap) → cached REST → Your app (dashboards · stats · front-end), on every request]

Live reads are served entirely from the snapshot — your warehouse never sees product traffic.

01
You publish a snapshot

Export the dataset to Parquet in S3 or GCS with a small manifest, on your schedule, at whatever freshness your data tolerates.

02
Offloader materializes it

The container loads the latest snapshot into an embedded DuckDB engine and validates it against your dataset’s contract before it ever serves.

03
Your app calls a REST endpoint

Named, versioned, key-scoped endpoints, with no arbitrary SQL. Reads are answered locally, fast, at product-traffic scale.

04
A newer snapshot swaps in

Zero-downtime, blue-green, even across a schema change. A bad snapshot never swaps in; the last good one keeps serving.
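
The validate-then-swap behaviour in steps 02 and 04 can be pictured with a small Python sketch. It is not Offloader's implementation: the `Snapshot` and `SnapshotServer` names are hypothetical, and the "contract" is reduced to a set of required columns purely for illustration.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    columns: frozenset  # column names present in this snapshot


class SnapshotServer:
    """Serves from the active snapshot; swaps only after validation passes."""

    def __init__(self, required_columns):
        # Hypothetical stand-in for a dataset contract: required columns.
        self._required = frozenset(required_columns)
        self._active = None
        self._lock = threading.Lock()

    def offer(self, candidate: Snapshot) -> bool:
        """Validate a candidate and swap it in only if it passes."""
        if not self._required <= candidate.columns:
            return False  # bad snapshot: the last good one keeps serving
        with self._lock:
            self._active = candidate  # one reference swap, readers never wait on a load
        return True

    def active_id(self):
        with self._lock:
            return self._active.snapshot_id if self._active else None


server = SnapshotServer({"account_id", "api_calls_total"})
good = Snapshot("r0007", frozenset({"account_id", "api_calls_total"}))
bad = Snapshot("r0008", frozenset({"account_id"}))  # missing a required column

server.offer(good)
server.offer(bad)
print(server.active_id())  # r0007
```

The point of the design is ordering: the candidate is fully checked before the active reference changes, so a failed check leaves the serving path untouched.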

Every response tells you exactly what you read

{
  "data": [ { "account_id": "acct_zephyr", "api_calls_total": 56839 }, … ],
  "meta": {
    "endpoint":    "customer_usage_summary",
    "snapshot_id": "2026-06-01T00:00:00Z_r0007",   // which snapshot answered this
    "freshness":   { "watermark": "2026-06-01T00:00:00Z", "age_seconds": 2658070 }
  }
}
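
A client can use the `meta` block to decide whether a response is fresh enough for its purpose. The sketch below assumes only the fields shown in the sample response; the helper name `is_fresh_enough` and the one-hour threshold are hypothetical, and the sample body is a strict-JSON version of the example (the inline comment and the elided rows are dropped).

```python
import json


def is_fresh_enough(response_body: str, max_age_seconds: int) -> bool:
    """Return True if the snapshot behind this response is recent enough."""
    meta = json.loads(response_body)["meta"]
    return meta["freshness"]["age_seconds"] <= max_age_seconds


body = json.dumps({
    "data": [{"account_id": "acct_zephyr", "api_calls_total": 56839}],
    "meta": {
        "endpoint": "customer_usage_summary",
        "snapshot_id": "2026-06-01T00:00:00Z_r0007",
        "freshness": {"watermark": "2026-06-01T00:00:00Z", "age_seconds": 2658070},
    },
})

# 2,658,070 seconds is roughly 30.8 days, so a one-hour budget rejects it.
print(is_fresh_enough(body, max_age_seconds=3600))  # False
```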

What you serve

The reads that blow up
your warehouse bill.

High volume, the same shapes all day, per-user or public, minutes-fresh is fine. A CDN in front helps the cache hits — but every miss still hits a slow, metered warehouse. Offloader is the fast, cheap origin underneath.

Customer-facing analytics

The dashboards and charts you show your users: usage, performance, insights.

Usage metering & billing

Per-account counters that power invoices and the in-app usage meters users watch.

Personalization & recommendations

“Recommended for you,” trending, and related items, computed in the warehouse and served per user at page load.

Public stats & leaderboards

Rankings, counts, and public profile pages that get hammered by traffic.

Embedded reports

The report and export screens customers open again and again, each a repeated query.

Search facets & filters

The live counts next to every filter, such as “In stock (1,240)”, recomputed on each load.

Proof

Measured against real production data.

66 / 67

Real scale, not a fixture

Validated at 66 datasets and 67 endpoints, cold-booted against a real production GCS bucket.

p95 66 ms

Measured in production

p50 37 ms, p95 66 ms at the load balancer, across ~136 million requests a month — down from multi-second behind the warehouse.¹

tenant = you

Isolation is compiled in

The tenant filter is inserted server-side from the caller’s key. A request cannot widen it or read another tenant’s rows — no arbitrary SQL, ever.

0 downtime

Config that reloads live

Push config to a bucket and it hot-reloads with no restart — blue-green even when a schema changes. A broken revision is ignored; the running one keeps serving.

1. Latency from a real production deployment — ~136 million requests a month, 8 TB served, ~94% cached, on two small VMs, measured at the load balancer. Your latency and savings depend on your data, payloads, and hardware; the benchmark method and harness are in the docs.

Pricing

Priced on what we save you.
Not on requests.

Warehouse serving bills one of two bad ways: always-on compute sized for your peak — paid while it idles, still slow from a cold start — or a per-request meter, where your busiest month is your most expensive. Offloader charges 20% of the savings it creates. You keep the other 80%.

Warehouse serving, before     $3,000 /mo
Offloader on two small VMs   −$440 /mo
      ──────────────────────────
Saved on serving each month   $2,560 /mo
Our fee, 20% of that saving  −$512 /mo
      ──────────────────────────
You keep, 80% of the saving   $2,048 /mo

If you don’t save, we don’t earn. A real cutover — anonymized
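
The split in the table follows from three subtractions and one percentage. As a rough sketch (the function and parameter names are ours, not an Offloader API), with the fee floored at zero to reflect "if you don't save, we don't earn":

```python
from decimal import Decimal


def share_of_savings(before: Decimal, run_cost: Decimal,
                     fee_rate: Decimal = Decimal("0.20")):
    """Return (saving, fee, kept) for one month of serving spend."""
    saving = before - run_cost
    fee = max(saving, Decimal("0")) * fee_rate  # no saving, no fee
    kept = saving - fee
    return saving, fee, kept


# Figures from the table: $3,000 before, $440 to run Offloader.
saving, fee, kept = share_of_savings(Decimal("3000"), Decimal("440"))
print(saving, fee, kept)  # 2560 512.00 2048.00
```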

What that $440 served: 136M requests a month, 8 TB, 94% cached — at the CDN edge, with no unplanned downtime since cutover, on the customer’s own two VMs.

How the ongoing bill works

A paid diagnostic sets the baseline once — a rate per million requests, agreed before you commit — so we never re-audit your warehouse bill; your own request count does the math. From there it’s 20% of the measured saving, reconciled quarterly, net of what Offloader costs to run. No saving, no fee. For a reserved-capacity warehouse the fee starts only when you actually downsize the tier — moving query volume off it doesn’t lower a committed bill by itself, and we won’t pretend it does. See the ROI diagnostic →

Public data? Serve it from our edge.

For product-facing public data at high volume — the tables your team builds in the warehouse and ships to your front-end — we can also run it on our global CDN edge: serving cost approaches zero, latency goes worldwide. It’s already public, so it stays public — an optional managed add-on, same share-of-savings model, quoted per case.

Book a paid diagnostic

Books a call with me: pick a time that works; you talk to a person, not a queue. Fixed scope; we measure your reducible spend before you commit.

Fit

Made for one job.
Honest about the rest.

Offloader offloads repeated, bounded reads. It is not a warehouse, a BI tool, or a place to run ad-hoc SQL. If native acceleration already solves your problem, we’ll tell you.

A fit when

The same query shapes repeat a lot.
“A few minutes or hours old” is fine.
You can export snapshots to S3 or GCS.
You want to cut warehouse serving cost without a rewrite.

Not a fit when

Every query is different or ad-hoc.
You need up-to-the-second data.
You can’t produce snapshots.
Native warehouse acceleration already covers it.

FAQ

The questions we actually get.

How it’s different

Isn’t this just a cache in front of my warehouse?

No. A cache still calls the warehouse on a miss and knows nothing about who’s asking. Offloader answers reads entirely from the snapshot — the warehouse is never on the live path — through named, versioned endpoints with a compiled-in tenant filter and a column allowlist. There’s no arbitrary SQL to cache, and no per-request warehouse bill.

We already run a CDN in front of it — isn’t that enough?

A CDN only saves the hits. Even a 94% hit rate means 6 of every 100 requests still miss — on a high-volume endpoint that’s millions a month, each one hitting the origin: billed by the warehouse, and slow when it’s cold. That’s why p95 stays in the seconds behind a “well-cached” warehouse — your tail latency is set by the misses, not the hits. Offloader fixes the origin instead. Keep your CDN — Offloader emits proper ETag/Cache-Control so it keeps caching — and now the misses land on a fast, flat-cost box you own instead of a metered warehouse.
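
The claim that misses set the tail can be checked with a toy calculation. The latencies below are hypothetical round numbers chosen only to separate hits from misses; the 94% hit rate and the 136 million monthly requests are the figures quoted on this page. Because the miss rate (6%) is larger than 5%, the 95th percentile necessarily lands on a miss.

```python
import math


def nearest_rank_percentile(samples, percent: int):
    """Nearest-rank percentile: smallest value with at least percent% at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(percent * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


HIT_MS, MISS_MS = 40, 3000  # hypothetical latencies, for illustration only

# 100 requests at a 94% hit rate: 94 hits and 6 misses.
latencies = [HIT_MS] * 94 + [MISS_MS] * 6

print(nearest_rank_percentile(latencies, 50))  # 40
print(nearest_rank_percentile(latencies, 95))  # 3000

# The same 6% applied to 136 million requests a month.
monthly_misses = 136_000_000 * 6 // 100
print(monthly_misses)  # 8160000
```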

How is this different from ClickHouse, Tinybird, or warehouse-native acceleration?

A dedicated serving database — ClickHouse, Tinybird, Rockset — is a second system to load, sync, scale, and operate: a dual write and a new source of truth to reconcile with the warehouse. Offloader is a container and a bucket. Your warehouse stays the source of truth; you serve its snapshots, unchanged. Warehouse-native acceleration (BI Engine, serverless SQL, materialized views) keeps you on the vendor’s meter and its cold starts — you’re still paying per query or per compute-hour for the same high-volume reads. Offloader moves those reads off the meter, onto a flat-cost box you own.

Will it fit your data

How fresh is the data?

As fresh as you publish snapshots — minutes to hours, your call. Every response carries its snapshot_id and a freshness watermark, so a client always knows exactly what it read. If you need up-to-the-second data, Offloader isn’t the right tool, and we’ll tell you.

How do isolation and security work?

The tenant filter is inserted server-side from the caller’s API key; a request cannot widen it or read another tenant’s rows. Keys are stored as SHA-256 hashes, endpoints are allow-listed per key, and a public (auth: none) mode is only accepted when no endpoint is tenant-scoped. Your private data never leaves your environment. Read the security model →
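
The shape of that model (hashed keys, a per-key endpoint allowlist, and a tenant filter the caller cannot influence) can be sketched as follows. This is an illustration under assumptions, not Offloader's code: the key registry, endpoint table, and sample rows are hypothetical, and the standard-library `sqlite3` module stands in for DuckDB so the example has no dependencies.

```python
import hashlib
import sqlite3


def key_hash(api_key: str) -> str:
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical registry: only SHA-256 hashes are stored, never the keys.
KEYS = {
    key_hash("key-for-zephyr"): {"tenant": "acct_zephyr",
                                 "endpoints": {"customer_usage_summary"}},
    key_hash("key-for-orion"): {"tenant": "acct_orion",
                                "endpoints": {"customer_usage_summary"}},
}

# Named endpoint -> fixed SQL. The tenant predicate is part of the statement.
ENDPOINTS = {
    "customer_usage_summary":
        "SELECT account_id, api_calls_total FROM usage WHERE account_id = ?",
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE usage (account_id TEXT, api_calls_total INTEGER)")
db.executemany("INSERT INTO usage VALUES (?, ?)",
               [("acct_zephyr", 56839), ("acct_orion", 1200)])  # sample rows


def read(api_key: str, endpoint: str):
    entry = KEYS.get(key_hash(api_key))
    if entry is None or endpoint not in entry["endpoints"]:
        raise PermissionError("unknown key or endpoint not allowed for this key")
    # The tenant value comes from the key record, never from request parameters.
    return db.execute(ENDPOINTS[endpoint], (entry["tenant"],)).fetchall()


print(read("key-for-zephyr", "customer_usage_summary"))  # [('acct_zephyr', 56839)]
```

Since the request never supplies SQL or the tenant value, there is nothing in it that could widen the filter.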

My data is per-user — can a snapshot handle the cardinality?

Tenant count doesn’t matter: an endpoint is one table plus a compiled-in filter, so a million users share one snapshot and each request reads only its own slice. What matters is total snapshot bytes. In the validation, all 66 datasets sat in about 4 GiB of RAM on one instance; you cap it with a memory limit and scale by running more stateless instances or partitioning snapshots by time. If your working set genuinely won’t fit a box, that’s exactly the case we’ll tell you isn’t a fit.

Which warehouses does it work with?

Any that can export Parquet (Snowflake, BigQuery, Redshift, a Spark job) via a small manifest. With Databricks, Offloader also discovers the latest snapshot for you. Offloader never connects to the warehouse for live traffic; it only reads the snapshots you publish.

What does it actually take to run?

A container and a bucket. The workload we validated against runs ~136 million requests a month; a single small instance sustains that comfortably: about 52 requests a second on average, well inside the ~5,200/s a single instance hit in a benchmark. So you’d run two small instances for HA, not for capacity. The cost is the box, flat, no matter how many reads hit it: the expensive computation already happened once, in your pipeline.
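
The average-rate figure is simple to reproduce. The sketch below assumes a 30-day month and uses the production volume quoted on this page; the function name is ours.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def average_rps(requests_per_month: int) -> float:
    """Mean requests per second over a 30-day month."""
    return requests_per_month / SECONDS_PER_MONTH


avg = average_rps(136_000_000)
headroom = 5_200 / avg  # benchmark rate divided by the average rate

print(round(avg))       # 52
print(round(headroom))  # 99
```

An average hides daily peaks, so treat the headroom factor as a rough guide and size against your own peak traffic.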

Cost & commitment

Is there lock-in?

No. It’s a container you run on your own infrastructure; the config is a handful of YAML files and the data is your own Parquet in your own bucket. Turn it off and your warehouse, data, and app are exactly as they were.

How does pricing actually work?

A paid diagnostic measures your reducible warehouse spend and you agree a baseline before committing. From there it’s 20% of the savings, net of what Offloader costs to run on your own infra — you keep the other 80%. No per-request meter. See pricing →

Who’s behind Offloader — and what if it’s just you?

Offloader is built and supported by Andrew Dryga. It’s deliberately the kind of thing that survives a solo maintainer: the code is public on GitHub, it runs entirely on your own infrastructure, and there’s no lock-in. If we vanished tomorrow, your container, config, data, and app would keep working exactly as they are. Support in V1 is a response-time commitment, not an uptime SLA (you run it, so uptime is yours); you email a person, not a queue.

Are you insured?

Yes. The diagnostic and pilot are backed by $1M professional indemnity (errors and omissions) insurance, valid worldwide including the US and Canada, underwritten by Colonnade (a Fairfax company). The pilot contract caps liability to match, and we’ll send a certificate of insurance to your procurement team on request.

Get started

Run it in fifteen minutes.

It’s a container you run on your own infrastructure. Point it at a bundled example and serve a real endpoint locally — no cloud, no signup. When you’re ready for your own data, the config is a handful of YAML files.

The 15-minute quickstart · Start from the concepts

Weighing a real migration? A paid diagnostic clusters your repeated reads and estimates what’s actually reducible — before you commit. See what it covers or book one directly.

docker

$ docker run \
    -e OFFLOADER_CONFIG=/cfg/offloader.yml \
    -e OFFLOADER_SECRET_KEY_BASE=$(openssl rand -hex 24) \
    -e OFFLOADER_CACHE_DIR=/cache \
    -v ./examples/public-metrics:/cfg:ro \
    -v offloader-cache:/cache \
    -p 4000:4000 offloader:dev

$ curl "localhost:4000/v1/endpoints/champion?champion_id=1"
→ 200  served from a snapshot, not your warehouse
