Rationale
KafScale exists because the original assumptions behind Kafka brokers no longer hold for a large class of modern workloads. This page explains those assumptions, what changed, and why KafScale is designed the way it is.
The original Kafka assumptions
Kafka was designed in a world where durability lived on local disks attached to long-running servers. Brokers owned their data. Replication, leader election, and partition rebalancing were necessary because broker state was the source of truth.
That model worked well when:
- Disks were the primary durable medium
- Brokers were expected to be long-lived
- Scaling events were rare and manual
- Recovery time could be measured in minutes or hours
Many Kafka deployments today still operate under these assumptions, even when the workload does not require them.
Object storage changes the durability model
Object storage fundamentally changes where durability lives.
Modern object stores provide extremely high durability, elastic capacity, and simple lifecycle management. Once log segments are durable and immutable in object storage, keeping the same data replicated across broker-local disks stops adding resilience and starts adding operational cost.
With object storage:
- Data durability is decoupled from broker lifecycle
- Storage scales independently from compute
- Recovery no longer depends on copying large volumes of data between brokers
This enables a different design space where brokers no longer need to be stateful.
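As a minimal sketch of what that decoupling looks like in practice, the snippet below writes a closed, immutable log segment to object storage and reads it back with boto3. The bucket name and key layout are illustrative placeholders, not KafScale's actual layout.

```python
# Sketch: durability lives in the object store, not on a broker's local disk.
# Bucket name and key layout are illustrative, not KafScale's actual layout.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-kafscale-data"                        # placeholder bucket
KEY = "topics/orders/partition-0/segment-00001000.kfs"  # placeholder key

# A closed segment is written once and never modified afterwards.
s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"<closed, immutable segment bytes>")

# Any stateless reader (a replacement broker pod, a processor, a backfill job)
# can later fetch the same bytes without the original writer being alive.
segment = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
```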
Storing Kafka data in S3 is not new. Multiple systems do this. What matters is what you do with that foundation.
What KafScale actually changes
S3-native storage is table stakes. The real question is: how much operational complexity remains?
The same architectural shift already transformed data warehouses. Separating compute from storage did not just reduce costs. It simplified operations, enabled independent scaling, and changed what was possible. Streaming is following the same path.
KafScale removes four categories of coupling that other systems preserve:
Clients from topology
The proxy rewrites metadata. One endpoint, any number of brokers behind it. Clients never see scaling events.
Compute from storage
Brokers hold no durable state. S3 is the source of truth. Add or remove pods without moving data.
Streaming from analytics
Processors read S3 directly. Batch replay and AI workloads never compete with real-time consumers.
Format from implementation
The .kfs segment format is documented and open. Build processors without vendor dependency.
Each layer removes a category of operational problems. Together, they enable minimal ops and unlimited scale.
Why the Kafka protocol leaks topology
Traditional Kafka clients do not just connect to a cluster. They discover it.
When a client connects, it sends a Metadata request. The broker responds with a list of all brokers in the cluster and which broker leads each partition. The client then opens direct connections to each broker it needs.
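You can observe this directly with a stock client. The sketch below uses the confluent-kafka Python client (the bootstrap address is a placeholder) to issue a Metadata request and print every broker and partition leader the cluster reports back, which is the topology the client will then connect to directly.

```python
# Sketch: what a stock Kafka client learns from a single Metadata request.
# The bootstrap address is a placeholder.
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "kafka.example.internal:9092"})
metadata = admin.list_topics(timeout=10)   # issues a Metadata request

# The response names every broker and every partition leader in the cluster.
for broker_id, broker in metadata.brokers.items():
    print(f"broker {broker_id}: {broker.host}:{broker.port}")
for topic in metadata.topics.values():
    for partition in topic.partitions.values():
        print(f"{topic.topic}[{partition.id}] led by broker {partition.leader}")
```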
This design made sense when brokers were stable, long-lived servers. It becomes a liability when brokers are ephemeral pods.
Every broker restart can break client connections. Every scaling event requires clients to rediscover the cluster. DNS and load balancers cannot fully abstract the topology because the protocol itself exposes it.
KafScale’s proxy solves this by intercepting Metadata and FindCoordinator responses and substituting its own address for the brokers' addresses. Clients believe they are talking to a single broker. The proxy routes requests to the actual brokers internally.
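The sketch below illustrates the idea of that substitution. It uses plain dictionaries rather than the Kafka wire format and is not KafScale's implementation; it only shows that every advertised broker address is replaced with the proxy's own address before the response reaches the client.

```python
# Conceptual sketch only: plain dictionaries, not the Kafka wire format, and
# not KafScale's implementation. Every advertised broker address is rewritten
# to the proxy's own address.
PROXY_HOST, PROXY_PORT = "kafka.example.com", 9092   # placeholder endpoint

def rewrite_metadata(metadata_response: dict) -> dict:
    """Replace every advertised broker address with the proxy's address."""
    for broker in metadata_response["brokers"]:
        broker["host"] = PROXY_HOST
        broker["port"] = PROXY_PORT
    return metadata_response

# Three internal pods collapse, from the client's point of view, into one
# stable endpoint; the proxy keeps the real addresses for internal routing.
response = {"brokers": [
    {"node_id": 0, "host": "broker-0.internal", "port": 9092},
    {"node_id": 1, "host": "broker-1.internal", "port": 9092},
    {"node_id": 2, "host": "broker-2.internal", "port": 9092},
]}
print(rewrite_metadata(response))
```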
The result:
- Add brokers without client awareness
- Rotate brokers during deployments without connection drops
- Use standard Kubernetes networking patterns
- One ingress, one DNS name, standard TLS termination
One endpoint. Infinite scale behind it.
Why processors should bypass brokers
Traditional Kafka architectures force all reads through brokers. Streaming consumers and batch analytics compete for the same resources. Backfills spike broker CPU. Training jobs block production consumers.
KafScale separates these concerns by design.
Brokers handle the Kafka protocol: accepting writes from producers, serving reads to streaming consumers, managing consumer groups. Processors read completed segments directly from S3, bypassing brokers entirely.
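A minimal sketch of that read path is shown below, assuming a hypothetical bucket and per-partition key prefix; the .kfs decoding itself is omitted, since the segment format is documented separately.

```python
# Sketch: a processor reading completed segments straight from S3, never
# touching a broker. Bucket and prefix are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-kafscale-data"
PREFIX = "topics/orders/partition-0/"   # hypothetical per-partition prefix

def process_segment(segment_bytes: bytes) -> None:
    # Decode records per the documented .kfs format (decoding omitted here);
    # this stub just reports the segment size.
    print(f"read segment of {len(segment_bytes)} bytes")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".kfs"):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            process_segment(body)
```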
This separation has practical consequences:
- Historical replays do not affect streaming latency
- AI workloads get complete context without degrading production
- Iceberg materialization scales independently from Kafka consumers
- Processor development does not require broker changes
The architecture enables use cases that broker-mediated systems cannot serve efficiently.
Why AI agents need this architecture
AI agents making decisions need context. That context comes from historical events: what happened, in what order, and why the current state exists.
Traditional stream processing optimizes for latency. Milliseconds matter for fraud detection or trading systems. But AI agents reasoning over business context have different requirements. They need completeness. They need replay capability. They need to reconcile current state with historical actions.
Storage-native streaming makes this practical. The immutable log in S3 becomes the source of truth that agents query, replay, and reason over. Processors convert that log to tables that analytical tools understand. Agents get complete historical context without competing with streaming workloads for broker resources.
Two-second latency for analytical access is acceptable when the alternative is incomplete context or degraded streaming performance. AI agents do not need sub-millisecond reads. They need the full picture.
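As an illustration, once processors have materialized the log into columnar tables, an agent-side tool can query history with ordinary analytical libraries. The sketch below uses DuckDB over a hypothetical Parquet path with hypothetical column names; any Parquet- or Iceberg-aware engine could fill the same role.

```python
# Sketch: an agent querying materialized history instead of consuming from
# brokers. Table path and column names are hypothetical; credentials for the
# s3:// path are assumed to be configured.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")   # enable s3:// paths in DuckDB

rows = con.sql("""
    SELECT order_id, status, event_time
    FROM read_parquet('s3://example-kafscale-tables/orders/*.parquet')
    WHERE order_id = '12345'
    ORDER BY event_time
""").fetchall()

# The agent reasons over the full ordered history, not just the latest state.
for order_id, status, event_time in rows:
    print(order_id, status, event_time)
```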
What KafScale deliberately does not do
KafScale is not trying to replace every Kafka deployment.
It deliberately does not target:
- Sub-10ms end-to-end latency workloads
- Exactly-once transactions across topics
- Compacted topics
- Embedded stream processing inside the broker
Those features increase broker statefulness and operational complexity. For many workloads, they are unnecessary.
KafScale focuses on the common case: durable message transport, replayability, predictable retention, and low operational overhead.
Summary
Storing Kafka data in S3 is not the innovation. What matters is how much complexity remains after you do it.
KafScale removes the topology coupling that breaks clients during scaling. It removes the compute/storage coupling that makes recovery slow. It removes the streaming/analytics coupling that forces workloads to compete. It removes the format/vendor coupling that creates dependency.
The result is minimal ops and unlimited scale. That is the point.
Further reading
- Architecture for detailed component diagrams and data flows
- Comparison for how KafScale compares to alternatives