Architecture

KafScale brokers are stateless pods on Kubernetes. Metadata lives in etcd, while immutable log segments live in S3. Clients speak the Kafka protocol to brokers; brokers flush segments to S3 and serve reads with caching.

Platform overview

Figure: Architecture overview. Kafka clients (producers and consumers) connect on :9092 to stateless Go brokers (Broker 0, Broker 1, Broker 2, scaled by an HPA) running in a Kubernetes cluster. Brokers flush segments to the S3 data bucket and fetch them back with caching. A 3-node etcd cluster holds metadata (topics, offsets, assignments), and etcd snapshots are backed up to S3. S3 is the source of truth, brokers are stateless, and etcd handles coordination.

Produce flow

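Figure: Write path. (1) Produce: the producer (Kafka client) sends records to the broker, which validates them and assigns offsets. (2) Batch: the broker buffers batches in memory. (3) Flush: the buffer is written to S3 as a sealed segment.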

Producers send records to brokers. Brokers validate each batch, assign offsets, and buffer it in memory. When the buffer reaches 4 MB in size or 500 ms in age, whichever comes first, it is sealed and flushed to S3 as an immutable segment.
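The flush rule above can be sketched as a small buffer with a size threshold and an age threshold. This is only an illustration, not KafScale's implementation: the `WriteBuffer` class, its method names, and the choice to measure age from the first append after the previous flush are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_BYTES = 4 * 1024 * 1024   # 4 MB size threshold from the text
MAX_AGE_SECONDS = 0.5         # 500 ms age threshold from the text


@dataclass
class WriteBuffer:
    """Hypothetical in-memory buffer for one partition."""
    next_offset: int = 0
    batches: List[bytes] = field(default_factory=list)
    size_bytes: int = 0
    opened_at: Optional[float] = None  # time of first append since last flush

    def append(self, batch: bytes, record_count: int, now: float) -> int:
        """Buffer a batch and return the base offset assigned to it."""
        if self.opened_at is None:
            self.opened_at = now
        base_offset = self.next_offset
        self.next_offset += record_count
        self.batches.append(batch)
        self.size_bytes += len(batch)
        return base_offset

    def should_flush(self, now: float) -> bool:
        """True once the buffer is non-empty and either threshold is met."""
        if self.opened_at is None:
            return False
        too_big = self.size_bytes >= MAX_BYTES
        too_old = now - self.opened_at >= MAX_AGE_SECONDS
        return too_big or too_old

    def seal(self) -> bytes:
        """Return the buffered bytes as one payload and reset the buffer."""
        payload = b"".join(self.batches)
        self.batches = []
        self.size_bytes = 0
        self.opened_at = None
        return payload
```

In a real broker loop, `now` would come from a monotonic clock such as `time.monotonic()`, and the sealed payload would be wrapped in the segment format before being uploaded.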

Fetch flow

Figure: Read path. The consumer (Kafka client) sends a fetch request to the broker, which locates the segment and checks its LRU cache. A cache hit is served on the fast path; a cache miss is fetched from S3.

Consumers request data from brokers. For each fetch, the broker locates the segment that contains the requested offset, checks its LRU cache, and fetches the segment from S3 on a cache miss. Read-ahead prefetches segments that are likely to be needed next.
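A minimal sketch of the read path, assuming segments are identified by their base offset and that a broker knows the sorted list of base offsets for a partition. The names `locate_segment` and `SegmentCache` are hypothetical, the `fetch` callable stands in for an S3 GET, and read-ahead is left out for brevity.

```python
import bisect
from collections import OrderedDict
from typing import Callable, Sequence


def locate_segment(base_offsets: Sequence[int], offset: int) -> int:
    """Return the base offset of the segment that contains `offset`.

    `base_offsets` must be sorted in ascending order.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise KeyError(f"offset {offset} precedes the first segment")
    return base_offsets[i]


class SegmentCache:
    """LRU cache of segment bytes; `fetch` is called only on a miss."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch = fetch
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = self.fetch(key)                 # cache miss: go to object storage
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        return data
```

A fetch for offset 150 in a partition with segments starting at 0, 100, and 200 would resolve to the segment with base offset 100, and only the first request for that segment would reach object storage.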

Segment format

| Field | Size | Description |
| --- | --- | --- |
| Magic Number | 4 bytes | 0x4B414653 ("KAFS") |
| Version | 2 bytes | Format version (currently 1) |
| Flags | 2 bytes | Compression codec, etc. |
| Base Offset | 8 bytes | First offset in segment |
| Message Count | 4 bytes | Number of messages |
| Created Timestamp | 8 bytes | Unix milliseconds |
| Message Batches | variable | Kafka-compatible RecordBatch format |
| CRC32 | 4 bytes | Checksum of all batches |
| Footer Magic | 4 bytes | 0x454E4421 ("END!") |
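To make the layout concrete, here is a hedged sketch that packs and parses a segment with Python's `struct` module. Several details are assumptions, because the table does not state them: fields are laid out contiguously in table order, integers are big-endian, the base offset and timestamp are signed 64-bit values, and the checksum is the standard CRC-32 computed by `zlib.crc32`. The function names are illustrative only.

```python
import struct
import zlib

# Assumed encoding: big-endian, no padding, fields in table order.
HEADER = struct.Struct(">IHHqIq")  # magic, version, flags, base offset, count, created ms
FOOTER = struct.Struct(">II")      # CRC32 of the batches, footer magic
MAGIC = 0x4B414653                 # "KAFS"
FOOTER_MAGIC = 0x454E4421          # "END!"


def encode_segment(base_offset, message_count, created_ms, batches,
                   flags=0, version=1):
    """Wrap raw batch bytes in the header and footer from the table."""
    header = HEADER.pack(MAGIC, version, flags, base_offset,
                         message_count, created_ms)
    footer = FOOTER.pack(zlib.crc32(batches), FOOTER_MAGIC)
    return header + batches + footer


def decode_segment(data):
    """Validate both magic numbers and the checksum, then return the fields."""
    if len(data) < HEADER.size + FOOTER.size:
        raise ValueError("segment too short")
    magic, version, flags, base_offset, count, created_ms = HEADER.unpack_from(data, 0)
    crc, footer_magic = FOOTER.unpack_from(data, len(data) - FOOTER.size)
    if magic != MAGIC or footer_magic != FOOTER_MAGIC:
        raise ValueError("bad magic number")
    batches = data[HEADER.size:len(data) - FOOTER.size]
    if zlib.crc32(batches) != crc:
        raise ValueError("CRC32 mismatch")
    return {
        "version": version,
        "flags": flags,
        "base_offset": base_offset,
        "message_count": count,
        "created_ms": created_ms,
        "batches": batches,
    }
```

Under these assumptions the fixed overhead is 28 bytes of header plus 8 bytes of footer per segment, which is negligible next to a 4 MB payload.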

Key design decisions

| Decision | Rationale |
| --- | --- |
| S3 as source of truth | Eleven nines (99.999999999%) of durability, effectively unlimited capacity, $0.023/GB per month |
| Stateless brokers | Any pod serves any partition; HPA scales 0→N |
| etcd for metadata | Reuses the existing Kubernetes etcd or runs on a dedicated cluster |
| ~500ms latency | Acceptable trade-off for ETL, logs, and async events |
| No transactions | Simplifies the architecture for the 80% use case |
| 4MB segments | Balances S3 PUT costs against flush latency |
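The trade-off behind the segment size can be illustrated with rough arithmetic. The sketch below assumes one PUT per segment, that every flush is triggered by the size threshold rather than the 500 ms timer, and a request price of $0.005 per 1,000 PUTs. That price is an assumption for illustration and should be checked against current S3 pricing; `monthly_put_cost` is a hypothetical helper.

```python
SEGMENT_MB = 4
PUT_PRICE_PER_1000 = 0.005  # assumed USD price per 1,000 PUT requests


def monthly_put_cost(throughput_mb_per_s, segment_mb=SEGMENT_MB, days=30):
    """Estimate monthly S3 PUT spend for a steady ingest rate."""
    puts_per_second = throughput_mb_per_s / segment_mb
    puts_per_month = puts_per_second * 86_400 * days
    return puts_per_month / 1000 * PUT_PRICE_PER_1000


if __name__ == "__main__":
    for size_mb in (1, 4, 16):
        cost = monthly_put_cost(40, segment_mb=size_mb)
        print(f"{size_mb:>2} MB segments at 40 MB/s: ${cost:,.2f} per month in PUTs")
```

Under these assumptions, a steady 40 MB/s with 4 MB segments means 10 PUTs per second, or roughly $130 per month in request charges. Segments a quarter of that size would cost four times as much, while larger segments cost less but take longer to fill and therefore delay the flush.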