Architecture

KafScale brokers are stateless pods on Kubernetes. Metadata lives in etcd, while immutable log segments live in S3. Clients speak the Kafka protocol to a proxy that abstracts broker topology. Brokers flush segments to S3 and serve reads with caching.

Platform overview

Architecture overview

How the proxy works

The Kafka protocol requires clients to discover broker topology. When a client connects, it sends a Metadata request, and the broker responds with the addresses of all brokers and the leader of each partition. Clients then connect directly to each broker they need.

This creates a problem for ephemeral infrastructure. Every broker restart breaks the client connections to that broker, and every scaling event forces clients to rediscover the cluster.

KafScale’s proxy solves this by rewriting the responses to two request types:

Request	What the proxy does
Metadata	Returns the proxy’s own address instead of individual broker addresses
FindCoordinator	Returns the proxy’s address for consumer group coordination

Clients believe they are talking to a single broker. The proxy routes requests to the actual brokers internally.
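The following Go sketch shows the two rewrites. It is a minimal illustration that uses simplified, hypothetical response types (`Broker`, `Partition`, `MetadataResponse`, `Coordinator`) and made-up host names. It is not KafScale's actual code or a real Kafka wire-protocol library. The proxy keeps the original upstream response so it can still forward each request to the real leader. The client only ever sees the rewritten copy.

```go
package main

import "fmt"

// Simplified, hypothetical response types. Real Metadata and FindCoordinator
// responses carry more fields (rack, controller ID, error codes, ...).
type Broker struct {
	NodeID int32
	Host   string
	Port   int32
}

type Partition struct {
	Topic    string
	Index    int32
	LeaderID int32
}

type MetadataResponse struct {
	Brokers    []Broker
	Partitions []Partition
}

type Coordinator struct {
	NodeID int32
	Host   string
	Port   int32
}

// proxyNodeID is the node ID the proxy advertises for itself (arbitrary choice).
const proxyNodeID int32 = 0

// rewriteMetadata advertises the proxy as the only broker and as the leader
// of every partition. The upstream response is left untouched so the proxy
// can still use it to route requests to the real leaders.
func rewriteMetadata(upstream MetadataResponse, host string, port int32) MetadataResponse {
	out := MetadataResponse{
		Brokers:    []Broker{{NodeID: proxyNodeID, Host: host, Port: port}},
		Partitions: make([]Partition, len(upstream.Partitions)),
	}
	for i, p := range upstream.Partitions {
		p.LeaderID = proxyNodeID // p is a copy; upstream is not modified
		out.Partitions[i] = p
	}
	return out
}

// rewriteCoordinator makes the proxy the coordinator for every consumer group.
func rewriteCoordinator(_ Coordinator, host string, port int32) Coordinator {
	return Coordinator{NodeID: proxyNodeID, Host: host, Port: port}
}

func main() {
	upstream := MetadataResponse{
		Brokers: []Broker{
			{NodeID: 1, Host: "broker-0.internal", Port: 9092},
			{NodeID: 2, Host: "broker-1.internal", Port: 9092},
		},
		Partitions: []Partition{
			{Topic: "orders", Index: 0, LeaderID: 1},
			{Topic: "orders", Index: 1, LeaderID: 2},
		},
	}
	seen := rewriteMetadata(upstream, "kafka.example.com", 9092)
	fmt.Println("client sees brokers:", seen.Brokers)
	fmt.Println("proxy routes orders-1 to node", upstream.Partitions[1].LeaderID)
	coord := rewriteCoordinator(Coordinator{NodeID: 2, Host: "broker-1.internal", Port: 9092}, "kafka.example.com", 9092)
	fmt.Println("group coordinator:", coord.Host)
}
```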

This enables:

Horizontal scaling: Add brokers without client awareness
Zero-downtime deployments: Rotate broker pods behind the proxy
Standard networking: One LoadBalancer, one DNS name, standard TLS termination

For configuration details, see Operations: External Broker Access.

Decoupled processing (add-ons)

KafScale keeps brokers focused on the Kafka protocol and storage. Add-on processors handle downstream tasks by reading completed segments directly from S3, bypassing brokers entirely. Processors are stateless: offsets and leases live in etcd, input lives in S3, and output goes to external catalogs.

Data processor architecture

The processor reads .kfs segments from S3, tracks progress in etcd, and writes Parquet files to an Iceberg warehouse. Any Iceberg-compatible catalog can serve the tables to downstream consumers.

For deployment and configuration, see the Iceberg Processor docs.

Key design decisions

Decision	Rationale
Proxy for topology abstraction	Clients see one endpoint. Brokers scale without client awareness.
S3 as source of truth	11 nines durability, unlimited capacity, ~$0.023/GB/month
Stateless brokers	Any pod serves any partition, so HPA can add or remove brokers without moving partition data.
etcd for metadata	Leverages existing K8s patterns. Strong consistency.
~500ms latency	Acceptable trade-off for ETL, logs, async events
No transactions	Simplifies architecture. Covers 80% of Kafka use cases.
4MB segment size	Balances S3 PUT costs (~$0.005/1000) vs flush latency
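The 4MB choice can be checked with back-of-the-envelope arithmetic using the prices in the table. These are S3 Standard list prices, and actual prices vary by region and change over time. The sketch makes two assumptions:

- MB and GB are treated as binary units.
- The timer case uses a hypothetical buffer that is sealed every 500ms regardless of how full it is.

```go
package main

import "fmt"

// Prices from the design table (S3 Standard list prices; region-dependent).
const (
	putPricePer1000 = 0.005 // USD per 1,000 PUT requests
	storagePerGiBMo = 0.023 // USD per GB-month (treated as GiB here)
	gib             = 1 << 30
)

// putCostPerGiB is the PUT cost of ingesting 1 GiB when every segment is
// sealed at exactly segmentBytes (the size trigger, i.e. busy partitions).
func putCostPerGiB(segmentBytes int64) float64 {
	puts := float64(gib) / float64(segmentBytes)
	return puts * putPricePer1000 / 1000
}

// timerPutCostPerDay is the daily PUT cost of one buffer that is flushed by
// the time trigger alone (quiet traffic), once every intervalSec seconds.
func timerPutCostPerDay(intervalSec float64) float64 {
	return 86400 / intervalSec * putPricePer1000 / 1000
}

func main() {
	for _, mib := range []int64{1, 4, 16} {
		seg := mib << 20
		fmt.Printf("%2d MiB segments: %4.0f PUTs per GiB, $%.5f\n",
			mib, float64(gib)/float64(seg), putCostPerGiB(seg))
	}
	fmt.Printf("storing that GiB for a month: $%.3f\n", storagePerGiBMo)
	fmt.Printf("one buffer flushing every 500ms: $%.3f per day\n", timerPutCostPerDay(0.5))
}
```

At 4MB, ingesting 1GB takes 256 PUTs, about $0.0013. That is roughly 6% of one month of storage for the same data, and 1MB segments would quadruple it. The timer case shows the other side of the trade-off. A buffer sealed every 500ms issues 172,800 PUTs a day (about $0.86), no matter how little data it holds.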

Produce flow

Write path

Produce: Client sends records over the Kafka protocol, and the proxy routes them to a broker
Batch: Broker validates the records, batches them, and assigns offsets
Flush: When the buffer reaches 4MB or has been open for 500ms, whichever comes first, the segment is sealed and uploaded to S3

Data is not acknowledged until the S3 upload completes, so every acknowledged record is already stored with S3’s 11 nines durability.

Fetch flow

Read path

Fetch: Consumer sends a Fetch request, which the proxy routes to a broker
Cache check: Broker looks up the segment in its LRU cache
S3 fetch: On a cache miss, the broker fetches the segment from S3
Populate: The fetched segment is cached for future requests
Return: Broker returns the records to the consumer

Component responsibilities

Component	Responsibilities
Proxy	Rewrites Metadata/FindCoordinator responses, routes requests to brokers, enables topology abstraction
Broker	Kafka protocol, batching, offset assignment, S3 read/write, caching
etcd	Topic metadata, consumer offsets, group assignments, leader election
S3	Durable segment storage, source of truth, lifecycle-based retention
Operator	CRD reconciliation, etcd snapshots, broker lifecycle management

Segment format summary

Segments are self-contained files with a header, Kafka-compatible record batches, and a footer.

Field	Size	Description
Magic	4 bytes	`0x4B414653` (“KAFS”)
Version	2 bytes	Format version (1)
Flags	2 bytes	Compression codec
Base Offset	8 bytes	First offset in segment
Message Count	4 bytes	Number of messages
Timestamp	8 bytes	Created (Unix ms)
Batches	variable	Kafka RecordBatch format
CRC32	4 bytes	Checksum
Footer Magic	4 bytes	`0x454E4421` (“END!”)
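The following sketch encodes and validates this layout in Go. Two details are assumptions rather than facts from the table:

- Fields are big-endian, which is consistent with the magic values spelling "KAFS" and "END!" byte by byte.
- The CRC32 uses the IEEE polynomial and covers the batch bytes only.

The placeholder batch bytes are not a real Kafka RecordBatch. `binary.BigEndian.AppendUint*` requires Go 1.19 or later.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const (
	segmentMagic = 0x4B414653            // "KAFS"
	footerMagic  = 0x454E4421            // "END!"
	headerSize   = 4 + 2 + 2 + 8 + 4 + 8 // 28 bytes
	footerSize   = 4 + 4                 // CRC32 + footer magic
)

// SegmentHeader mirrors the fixed header fields in the table.
type SegmentHeader struct {
	Version      uint16
	Flags        uint16 // compression codec
	BaseOffset   int64
	MessageCount uint32
	TimestampMs  int64 // creation time, Unix ms
}

// encodeSegment writes header, batches, CRC32 and footer magic.
// Assumptions: big-endian fields; IEEE CRC32 over the batch bytes only.
func encodeSegment(h SegmentHeader, batches []byte) []byte {
	be := binary.BigEndian
	buf := make([]byte, 0, headerSize+len(batches)+footerSize)
	buf = be.AppendUint32(buf, segmentMagic)
	buf = be.AppendUint16(buf, h.Version)
	buf = be.AppendUint16(buf, h.Flags)
	buf = be.AppendUint64(buf, uint64(h.BaseOffset))
	buf = be.AppendUint32(buf, h.MessageCount)
	buf = be.AppendUint64(buf, uint64(h.TimestampMs))
	buf = append(buf, batches...)
	buf = be.AppendUint32(buf, crc32.ChecksumIEEE(batches))
	return be.AppendUint32(buf, footerMagic)
}

// decodeSegment checks both magic values and the checksum, then returns
// the header and the raw record-batch bytes.
func decodeSegment(b []byte) (SegmentHeader, []byte, error) {
	var h SegmentHeader
	be := binary.BigEndian
	if len(b) < headerSize+footerSize {
		return h, nil, errors.New("segment too short")
	}
	if be.Uint32(b[0:4]) != segmentMagic {
		return h, nil, errors.New("bad header magic")
	}
	footer := b[len(b)-footerSize:]
	if be.Uint32(footer[4:8]) != footerMagic {
		return h, nil, errors.New("bad footer magic")
	}
	batches := b[headerSize : len(b)-footerSize]
	if crc32.ChecksumIEEE(batches) != be.Uint32(footer[0:4]) {
		return h, nil, errors.New("checksum mismatch")
	}
	h.Version = be.Uint16(b[4:6])
	h.Flags = be.Uint16(b[6:8])
	h.BaseOffset = int64(be.Uint64(b[8:16]))
	h.MessageCount = be.Uint32(b[16:20])
	h.TimestampMs = int64(be.Uint64(b[20:28]))
	return h, batches, nil
}

func main() {
	h := SegmentHeader{Version: 1, BaseOffset: 1000, MessageCount: 2, TimestampMs: 1700000000000}
	seg := encodeSegment(h, []byte("placeholder batches"))
	fmt.Printf("%q ... %q, %d bytes\n", seg[:4], seg[len(seg)-4:], len(seg))
	got, batches, err := decodeSegment(seg)
	fmt.Println(got, len(batches), err)
}
```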

See Storage Format for complete details on segment structure, indexes, and S3 key layout.

Next steps

Operations for proxy configuration and S3 health states
Storage Format for detailed segment and index layouts
Rationale for why we made these design choices