Like what you see? ⭐ Star the repo ⭐ to support the project and keep it in the spotlight.

Apache 2.0 licensed. No vendor lock-in. Self-hosted.

One endpoint. Infinite scale.

Kafka-compatible streaming platform.
Scale streaming and analytics cloud-natively on S3. Automated.

What teams are saying

"After WarpStream got acquired, KafScale became our go-to. Better S3 integration, lower latency than we expected, fully scalable, and minimal ops burden."

— Platform team, Series B fintech

"We moved 50 topics off Kafka in a weekend. No more disk alerts, no more partition rebalancing. Our on-call rotation got a lot quieter."

— SRE lead, e-commerce platform

"The Apache 2.0 license was the deciding factor. We can't build on BSL projects, and we won't depend on a vendor's control plane."

— CTO, healthcare data startup

Why teams adopt KafScale

One endpoint, infinite producers

Kafka clients normally discover partition leaders and connect to each broker directly. KafScale's proxy rewrites metadata responses so every broker advertises the same address. One DNS name. Brokers scale behind it. Clients never break.
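The effect of that rewrite can be illustrated with a small sketch. The data shapes, hostnames, and field names below are invented for illustration; the actual proxy operates on Kafka wire-protocol frames, not JSON-like dicts:

```python
# Sketch: how a metadata-rewriting proxy hides broker topology.
# Hostnames and field names are illustrative, not KafScale's real format.

PROXY_HOST, PROXY_PORT = "kafka.example.internal", 9092  # the one DNS name

def rewrite_metadata(metadata: dict) -> dict:
    """Point every advertised broker at the proxy endpoint."""
    rewritten = dict(metadata)
    rewritten["brokers"] = [
        {**broker, "host": PROXY_HOST, "port": PROXY_PORT}
        for broker in metadata["brokers"]
    ]
    return rewritten

# A client asks for metadata; the real topology never leaks out.
upstream = {
    "brokers": [
        {"node_id": 0, "host": "broker-0.cluster.local", "port": 9092},
        {"node_id": 1, "host": "broker-1.cluster.local", "port": 9092},
    ],
    "topics": ["orders"],
}
client_view = rewrite_metadata(upstream)
assert all(b["host"] == PROXY_HOST for b in client_view["brokers"])
```

Because clients only ever see the proxy address, brokers can be added or removed without any client reconfiguration.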

Stateless brokers

Spin brokers up and down without disk shuffles. S3 is the source of truth. No partition rebalancing, ever.

S3-native durability

S3's 11 nines of durability. Immutable segments, lifecycle-based retention, predictable costs.
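Lifecycle-based retention means expiry is an ordinary S3 bucket rule rather than broker logic. A sketch of such a rule (the `segments/` prefix and the 7-day window are assumptions for illustration, not KafScale defaults):

```json
{
  "Rules": [
    {
      "ID": "expire-old-segments",
      "Filter": { "Prefix": "segments/" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 }
    }
  ]
}
```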

Storage-native processing

Processors read segments directly from S3, bypassing brokers entirely. Streaming and analytics never compete for the same resources.

Open segment format

The .kfs format is documented. Build custom processors without waiting for vendors to ship features.

Apache 2.0 license

No BSL restrictions. No usage fees. No control plane dependency. Fork it, sell it, run it however you want.

What you should consider

KafScale is not a drop-in replacement for every Kafka workload. Here's when it fits and when it doesn't.

KafScale is for you if

  • Latency of 200-500ms is acceptable
  • You run ETL, logs, or async events
  • You want processors that bypass brokers (Iceberg, analytics, AI agents)
  • You want minimal ops and no disk management
  • Apache 2.0 licensing matters to you
  • You prefer self-hosted over managed services

KafScale is not for you if

  • You need sub-10ms latency
  • You require Kafka transactions (exactly-once across topics)
  • You rely on compacted topics
  • You want a fully managed service

How KafScale works

Clients connect to a single proxy endpoint. The proxy rewrites Kafka metadata responses so clients never see broker topology. Brokers flush segments to S3. Processors read directly from S3 without touching brokers.

[Architecture diagram: Kafka clients (any library) → Proxy (one endpoint, single IP) → Brokers 0…N on Kubernetes (stateless, scale with HPA) → S3 (source of truth, 11 nines durability, immutable .kfs segments). Processors read segments directly from S3.]

Streaming and analytics share data but never compete for resources.

Built for AI agent infrastructure

AI agents making decisions need context. That context comes from historical events: what happened, in what order, and why the current state exists. Traditional stream processing optimizes for milliseconds. Agents need something different: completeness, replay capability, and the ability to reconcile current state with historical actions.

Storage-native streaming makes this practical. The immutable log in S3 becomes the source of truth that agents query, replay, and reason over. The Iceberg Processor converts that log to tables that analytical tools understand. Agents get complete historical context without competing with streaming workloads for broker resources.

Two-second latency for analytical access is acceptable when the alternative is incomplete context or degraded streaming performance. AI agents do not need sub-millisecond reads. They need the full picture.

Processors

Processors read completed segments directly from S3, enabling independent scaling and custom implementations. The .kfs segment format is open and documented.
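With an open format, a custom processor is essentially a file parser over S3 objects. The sketch below shows the idea with an invented layout; the magic bytes, field order, and sizes here are assumptions, not the documented .kfs layout:

```python
import struct
from io import BytesIO

# Illustrative only: this magic value and record layout are assumptions
# made for the sketch, not the documented .kfs format.
MAGIC = b"KFS1"

def write_segment(records: list[bytes]) -> bytes:
    """Serialize records into a toy length-prefixed segment."""
    buf = BytesIO()
    buf.write(MAGIC)
    buf.write(struct.pack(">I", len(records)))   # record count
    for rec in records:
        buf.write(struct.pack(">I", len(rec)))   # length-prefixed record
        buf.write(rec)
    return buf.getvalue()

def read_segment(data: bytes) -> list[bytes]:
    """Parse a toy segment back into its records."""
    buf = BytesIO(data)
    assert buf.read(4) == MAGIC, "not a segment"
    (count,) = struct.unpack(">I", buf.read(4))
    out = []
    for _ in range(count):
        (length,) = struct.unpack(">I", buf.read(4))
        out.append(buf.read(length))
    return out

segment = write_segment([b"order-created", b"order-shipped"])
assert read_segment(segment) == [b"order-created", b"order-shipped"]
```

A real processor would fetch completed segment objects from S3 and parse them the same way, never touching a broker.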

Iceberg Processor

Reads .kfs segments from S3. Writes Parquet to Iceberg tables. Works with Unity Catalog, Polaris, AWS Glue. Zero broker load.

Documentation

SQL Processor (KAFSQL)

Query KafScale segments with Postgres-compatible SQL. No Flink, no Spark, no complex pipelines. Just SQL against your Kafka data in S3.
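Queries read like ordinary Postgres SQL. A hypothetical example (the topic and column names are invented for illustration):

```sql
-- Events per hour over the last day (schema is illustrative)
SELECT date_trunc('hour', event_time) AS hour,
       count(*) AS events
FROM orders
WHERE event_time > now() - interval '1 day'
GROUP BY 1
ORDER BY 1;
```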

Documentation

Build your own

The .kfs segment format is documented. Build processors for your use case without waiting for vendors or negotiating enterprise contracts.

Storage format spec

Documentation

Protocol compatibility

21 Kafka APIs supported. Produce, Fetch, Metadata, consumer groups, and more.

View API docs

Storage format

Segment layout, index structure, S3 key paths, and cache architecture.

Explore storage

Operations

Deployment, scaling, backups, monitoring, and production hardening.

Operations guide

Get started

Install the operator, define a topic, produce with existing Kafka tools. If you already run Kubernetes and Kafka clients, you can deploy a cluster and start producing data in minutes.
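A topic definition might look like the resource below. The API group, kind, and field names are assumptions for illustration; the operator documentation defines the actual CRD:

```yaml
# Hypothetical Topic custom resource -- field names are illustrative.
apiVersion: kafscale.io/v1alpha1
kind: Topic
metadata:
  name: orders
spec:
  partitions: 6
  retentionDays: 7
```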

Backed by

KafScale is developed and maintained with support from Scalytics, Inc. and NovaTechflow.

Apache 2.0 licensed. No CLA required. Contributions welcome.