Architecture
KafScale brokers are stateless pods on Kubernetes. Metadata lives in etcd, while immutable log segments live in S3. Clients speak the Kafka protocol to a proxy that abstracts broker topology. Brokers flush segments to S3 and serve reads with caching.
How the proxy works
The Kafka protocol requires clients to discover broker topology. When a client connects, the broker returns a list of all brokers and their partition assignments. Clients then connect directly to each broker they need.
This creates a problem for ephemeral infrastructure. Every broker restart breaks client connections. Scaling events require clients to rediscover the cluster.
KafScale’s proxy solves this by intercepting two types of responses:
| Request | What the proxy does |
|---|---|
| Metadata | Returns the proxy’s own address instead of individual broker addresses |
| FindCoordinator | Returns the proxy’s address for consumer group coordination |
Clients believe they are talking to a single broker. The proxy routes requests to the actual brokers internally.
This enables:
- Infinite horizontal scaling: Add brokers without client awareness
- Zero-downtime deployments: Rotate broker pods behind the proxy
- Standard networking: One LoadBalancer, one DNS name, standard TLS termination
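The rewrite the proxy performs can be sketched as follows. This is an illustrative sketch, not KafScale's implementation: the field names, the `dict`-based response shape, and the proxy address are all assumptions made for the example.

```python
# Hypothetical sketch of the proxy's response rewriting. Field names
# and the proxy address are illustrative, not KafScale's actual types.

PROXY_HOST, PROXY_PORT = "kafscale-proxy.example.com", 9092

def rewrite_metadata(response: dict) -> dict:
    """Replace every advertised broker address with the proxy's own,
    so clients only ever connect back through the proxy."""
    return {
        **response,
        "brokers": [
            {**broker, "host": PROXY_HOST, "port": PROXY_PORT}
            for broker in response["brokers"]
        ],
    }

def rewrite_find_coordinator(response: dict) -> dict:
    """Point consumer-group coordination at the proxy as well."""
    return {**response, "host": PROXY_HOST, "port": PROXY_PORT}
```

Because only addresses are rewritten, partition assignments and leadership metadata pass through unchanged; the proxy decides internally which broker actually serves each routed request.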
For configuration details, see Operations: External Broker Access.
Decoupled processing (addons)
KafScale keeps brokers focused on Kafka protocol and storage. Add-on processors handle downstream tasks by reading completed segments directly from S3, bypassing brokers entirely. Processors are stateless: offsets and leases live in etcd, input lives in S3, output goes to external catalogs.
The processor reads .kfs segments from S3, tracks progress in etcd, and writes Parquet files to an Iceberg warehouse. Any Iceberg-compatible catalog can serve the tables to downstream consumers.
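The checkpointing pattern above can be sketched as a small loop. This is a minimal, self-contained sketch: `InMemoryKV` stands in for etcd, the `(base_offset, payload)` tuples stand in for `.kfs` segments listed from S3, and the key layout and `convert` callback are assumptions, not the processor's real API.

```python
# Minimal sketch of the processor's progress tracking. InMemoryKV
# stands in for etcd; the segment list stands in for an S3 listing.

class InMemoryKV:
    def __init__(self):
        self.data = {}
    def get(self, key, default=None):
        return self.data.get(key, default)
    def put(self, key, value):
        self.data[key] = value

def process_new_segments(kv, segments, topic, partition, convert):
    """Convert every segment past the checkpoint, then advance the
    checkpoint (stored in etcd in KafScale). Returns segments handled."""
    key = f"/processors/iceberg/{topic}/{partition}"  # hypothetical key layout
    done = kv.get(key, -1)
    handled = 0
    for base_offset, payload in sorted(segments):
        if base_offset <= done:
            continue              # already written to the warehouse
        convert(payload)          # e.g. write a Parquet file to Iceberg
        kv.put(key, base_offset)  # checkpoint only after output succeeds
        handled += 1
    return handled
```

Because the checkpoint advances only after each segment's output is written, a crashed processor resumes from the last durable segment rather than losing or skipping work.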
For deployment and configuration, see the Iceberg Processor docs.
Key design decisions
| Decision | Rationale |
|---|---|
| Proxy for topology abstraction | Clients see one endpoint. Brokers scale without client awareness. |
| S3 as source of truth | 11 nines durability, unlimited capacity, ~$0.023/GB/month |
| Stateless brokers | Any pod serves any partition. HPA scales 0→N instantly. |
| etcd for metadata | Leverages existing K8s patterns. Strong consistency. |
| Accept ~500ms latency | Acceptable trade-off for ETL, logs, async events |
| No transactions | Simplifies architecture. Covers 80% of Kafka use cases. |
| 4MB segment size | Balances S3 PUT costs (~$0.005/1000) vs flush latency |
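The segment-size trade-off in the last row can be made concrete with a back-of-the-envelope calculation, using the S3 PUT price from the table. The 10 MB/s throughput figure is an assumed example, not a KafScale benchmark.

```python
# Rough PUT cost per month for a given segment size, using the table's
# S3 PUT price (~$0.005 per 1,000 requests). Throughput is an example.

PUT_PRICE_PER_1000 = 0.005
SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_put_cost(throughput_mb_s: float, segment_mb: float) -> float:
    puts = throughput_mb_s / segment_mb * SECONDS_PER_MONTH
    return puts / 1000 * PUT_PRICE_PER_1000
```

At an assumed 10 MB/s, 4 MB segments cost roughly $32/month in PUT requests; shrinking segments to 1 MB would quadruple that while only modestly reducing flush latency, which is why the flush also has a 500ms time bound.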
Produce flow
- Produce: Client sends records to any broker via Kafka protocol
- Batch: Broker validates, batches records, assigns offsets
- Flush: When the buffer reaches 4MB or 500ms have elapsed, the segment is sealed and uploaded to S3
Data is not acknowledged until the S3 upload completes, so any record the producer sees as acknowledged already has S3's 11 nines of durability.
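The buffer-and-flush policy can be sketched as below. This is a simplified single-partition sketch under stated assumptions: `upload_to_s3` is a caller-supplied function, and the injectable `clock` exists only to make the time bound visible; offset assignment and the segment header are omitted.

```python
import time

# Sketch of the broker's seal-and-upload policy: flush on 4MB of
# buffered records or 500ms of age, whichever comes first.

FLUSH_BYTES = 4 * 1024 * 1024
FLUSH_INTERVAL_S = 0.5

class SegmentBuffer:
    def __init__(self, upload_to_s3, clock=time.monotonic):
        self.upload, self.clock = upload_to_s3, clock
        self.records, self.size, self.opened = [], 0, clock()

    def append(self, record: bytes) -> bool:
        """Buffer a record; seal and upload when either threshold trips.
        Returns True once a flush happened, i.e. the data is durable in
        S3 and may be acknowledged to the producer."""
        self.records.append(record)
        self.size += len(record)
        aged = self.clock() - self.opened >= FLUSH_INTERVAL_S
        if self.size >= FLUSH_BYTES or aged:
            self.upload(b"".join(self.records))   # durability point
            self.records, self.size = [], 0
            self.opened = self.clock()
            return True
        return False
```

The 500ms bound caps how long a low-throughput partition can hold unacknowledged data, while the 4MB bound caps PUT-request cost on busy partitions.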
Fetch flow
- Fetch: Consumer requests data from broker
- Cache check: Broker looks up segment in LRU cache
- S3 fetch: On cache miss, broker fetches from S3
- Populate: Fetched segment is cached for future requests
- Return: Data returned to consumer
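The steps above amount to a read-through LRU cache in front of S3. A minimal sketch, assuming a caller-supplied `fetch_from_s3` function and an illustrative capacity (the real cache is sized differently):

```python
from collections import OrderedDict

# Sketch of the fetch read path: read-through LRU cache over S3.

class SegmentCache:
    def __init__(self, fetch_from_s3, capacity=128):
        self.fetch, self.capacity = fetch_from_s3, capacity
        self.lru = OrderedDict()

    def get(self, segment_key: str) -> bytes:
        if segment_key in self.lru:            # cache hit
            self.lru.move_to_end(segment_key)  # mark as recently used
            return self.lru[segment_key]
        data = self.fetch(segment_key)         # cache miss: go to S3
        self.lru[segment_key] = data           # populate for future reads
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)       # evict least recently used
        return data
```

Because segments are immutable once uploaded, the cache never needs invalidation: an entry is either absent or correct.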
Component responsibilities
| Component | Responsibilities |
|---|---|
| Proxy | Rewrites Metadata/FindCoordinator responses, routes requests to brokers, enables topology abstraction |
| Broker | Kafka protocol, batching, offset assignment, S3 read/write, caching |
| etcd | Topic metadata, consumer offsets, group assignments, leader election |
| S3 | Durable segment storage, source of truth, lifecycle-based retention |
| Operator | CRD reconciliation, etcd snapshots, broker lifecycle management |
Segment format summary
Segments are self-contained files with header, Kafka-compatible record batches, and footer.
| Field | Size | Description |
|---|---|---|
| Magic | 4 bytes | 0x4B414653 (“KAFS”) |
| Version | 2 bytes | Format version (1) |
| Flags | 2 bytes | Compression codec |
| Base Offset | 8 bytes | First offset in segment |
| Message Count | 4 bytes | Number of messages |
| Timestamp | 8 bytes | Created (Unix ms) |
| Batches | variable | Kafka RecordBatch format |
| CRC32 | 4 bytes | Checksum |
| Footer Magic | 4 bytes | 0x454E4421 (“END!”) |
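The fixed-size fields in the table can be packed and parsed with a single struct layout. This sketch assumes big-endian byte order and an unsigned message count; see Storage Format for the authoritative byte order and the batch, index, CRC, and footer details omitted here.

```python
import struct

# Sketch of the fixed header fields from the table above.
# Big-endian layout is an assumption; consult Storage Format
# for the authoritative encoding.

HEADER = struct.Struct(">IHHqIq")   # magic, version, flags, base offset, count, timestamp
MAGIC = 0x4B414653                  # "KAFS"

def pack_header(version, flags, base_offset, count, timestamp_ms):
    return HEADER.pack(MAGIC, version, flags, base_offset, count, timestamp_ms)

def parse_header(buf: bytes) -> dict:
    magic, version, flags, base, count, ts = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("not a KafScale segment")
    return {"version": version, "flags": flags,
            "base_offset": base, "count": count, "timestamp_ms": ts}
```

Leading with a magic number lets any reader reject a corrupt or foreign object before touching the variable-length batch section.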
See Storage Format for complete details on segment structure, indexes, and S3 key layout.
Next steps
- Operations for proxy configuration and S3 health states
- Storage Format for detailed segment and index layouts
- Rationale for why we made these design choices