
FAQ

General

How does KafScale compare to WarpStream, Redpanda, or AutoMQ?

See Comparison for a detailed side-by-side analysis covering architecture, latency, licensing, and cost.

The short version: KafScale is the only S3-native, stateless Kafka-compatible platform under the Apache 2.0 license. WarpStream is now Confluent-owned (proprietary), AutoMQ uses BSL licensing, and Redpanda requires local disks.

Why would I use KafScale instead of Apache Kafka?

KafScale trades latency for operational simplicity and storage-native processing. If your workload can tolerate hundreds of milliseconds of latency, KafScale eliminates stateful brokers, partition rebalancing, and disk capacity planning.

For AI agent infrastructure: KafScale’s architecture aligns with what agentic systems actually need. AI agents reasoning over business context require completeness and replay capability, not sub-millisecond latency. The immutable log in S3 becomes the system of record that agents query, replay, and reason over. Processors convert that log to tables without competing with streaming workloads for broker resources.

Traditional stream processing optimizes for latency. Milliseconds matter for fraud detection or trading. But AI agents have different requirements: they need to understand what happened, in what order, and why the current state exists. Event sourcing research from the Apache Flink community (FLIP-531) and platforms like Akka confirms this pattern: agentic systems need reproducible state at any point in time.

| Choose KafScale | Choose Apache Kafka |
|---|---|
| ✓ ETL and data pipelines | ✓ Real-time trading systems |
| ✓ Log aggregation | ✓ Interactive applications |
| ✓ Async event processing | ✓ Exactly-once semantics (EOS) |
| ✓ AI agent infrastructure | ✓ Compacted topics |
| ✓ Event replay and audit | ✓ Complex stream processing |
| ✓ Teams without Kafka expertise | ✓ Sub-10ms latency required |
| ✓ Latency tolerance: 100-500ms | ✓ Stateful stream joins |

Is KafScale production ready?

KafScale is designed for production use, but comes with no warranties or guarantees. Review Operations and Security to align it with your requirements. Start with non-critical workloads and expand as you gain confidence.

What license is KafScale released under?

Apache 2.0. You can use it commercially, modify it, distribute it, and offer it as a service without restrictions. No BSL conversion periods, no usage fees, no control plane dependencies.


Architecture

Why does KafScale use native Kafka record format?

KafScale stores data in .kfs segments containing native Kafka V2 record batches, the same binary format Kafka uses internally. This is a deliberate choice:

Format stability: Kafka’s on-disk format is one of the most stable interfaces in data infrastructure. In 15+ years, there have been exactly three message format versions:

| Version | Introduced | Status |
|---|---|---|
| V0 | Original (2011) | Removed in Kafka 4.0 |
| V1 | Kafka 0.10.0 (2016) | Removed in Kafka 4.0 |
| V2 | Kafka 0.11.0 (June 2017) | Current standard |

V2 has been the only supported format for 8+ years. The entire Kafka ecosystem (Confluent, Redpanda, every client library, Flink, Spark, Debezium, MirrorMaker) depends on this stability. Changing it would break everything.

If Kafka ever changes: KafScale is fully open source under Apache 2.0. Any format updates can be implemented immediately by the community. Contrast this with proprietary alternatives where you’d wait for a vendor to prioritize the update.

No abstraction tax: Using native format means zero conversion overhead. Producers write Kafka records; we store Kafka records; consumers read Kafka records.

What about coupling processors to the storage format?

Processors read directly from S3, bypassing brokers entirely. This means they must understand the .kfs segment format and coordinate via etcd.

This is intentional coupling to a stable interface, not a liability:

  1. The format won’t change: Kafka V2 record batches are a de facto standard
  2. Read-replica brokers would have the same coupling: they’d also need to parse segments and query etcd
  3. The coupling is explicit and documented: not hidden inside a proprietary broker
  4. Open format means open tooling: anyone can build processors, analyzers, or integrations

The tradeoff: if KafScale’s internal segment layout evolves, processors need updates. In practice, we version the segment format and maintain backward compatibility.

Does KafScale work with clouds other than AWS?

Yes. KafScale works with any S3-compatible storage backend. See Storage Compatibility for configuration examples.

| Provider | Compatibility | Notes |
|---|---|---|
| AWS S3 | ✅ Native | Full support including IRSA |
| DigitalOcean Spaces | ✅ Native | Drop-in replacement |
| Cloudflare R2 | ✅ Native | Zero egress fees |
| Backblaze B2 | ✅ Native | S3-compatible API |
| MinIO | ✅ Native | Self-hosted, any infrastructure |
| Google Cloud Storage | ⚠️ Interop | Requires HMAC keys |
| Azure Blob Storage | ❌ Proxy | Requires MinIO Gateway |

Latency and Performance

What latency should I expect?

KafScale prioritizes durability and operational simplicity over sub-10ms latency. Typical latencies:

| Operation | p50 | p99 | Notes |
|---|---|---|---|
| Produce | 200-300ms | 400-500ms | Depends on flush interval and S3 region |
| Fetch (cache hit) | 1-5ms | 10ms | Hot segment cache |
| Fetch (cache miss) | 50-100ms | 150ms | S3 GetObject |
| Consumer group join | 100-200ms | 500ms | etcd coordination |

Can I reduce latency?

Several factors affect latency:

  1. S3 region proximity: Deploy brokers in the same region as your S3 bucket
  2. Flush interval: Lower KAFSCALE_FLUSH_INTERVAL_MS reduces produce latency but increases S3 requests
  3. Cache size: Larger KAFSCALE_CACHE_SIZE improves fetch hit rates
  4. Segment size: Smaller KAFSCALE_SEGMENT_BYTES flushes more frequently
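
As a rough sketch, these knobs can be set as environment variables on the broker pods (however you inject configuration into them); the values below are illustrative starting points rather than recommendations, and the unit of KAFSCALE_CACHE_SIZE is assumed to be bytes:

env:
  - name: KAFSCALE_FLUSH_INTERVAL_MS   # lower value = lower produce latency, more S3 PUT requests
    value: "100"
  - name: KAFSCALE_CACHE_SIZE          # larger cache improves fetch hit rates (unit assumed: bytes)
    value: "2147483648"
  - name: KAFSCALE_SEGMENT_BYTES       # smaller segments flush more frequently
    value: "8388608"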

The fundamental tradeoff is S3 round-trip time. If you need sub-50ms latency, KafScale is not the right choice.

How large can messages be?

There’s no configured maximum in KafScale. The theoretical limit is the 32-bit Kafka frame length (~2 GB), but the practical limit is broker memory since messages are fully buffered in RAM before flushing to S3.

Rule of thumb: max in-flight messages ≈ RAM / (2 × message size)

| Broker RAM | 10 MB messages | 50 MB messages | 100 MB messages |
|---|---|---|---|
| 16 GB | ~800 in-flight | ~160 in-flight | ~80 in-flight |

Large payloads (XML, JSON, binary blobs) work fine. They’re stored as standard Kafka record batches inside .kfs segments. Multiple messages are buffered until a flush threshold is reached, then uploaded as a single segment object.

Scaling for large messages: Since brokers are stateless, you can scale pods automatically based on memory pressure (e.g., HPA at 80% memory). One stable endpoint, automatic scaling behind the scenes.
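
A minimal sketch of the memory-based HPA mentioned above, assuming the broker Deployment is named demo-broker (as in the scaling example later in this FAQ); the replica bounds are illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-broker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-broker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80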

How does KafScale handle backpressure?

When S3 latency exceeds thresholds, brokers enter DEGRADED state. If S3 becomes unavailable, brokers enter UNAVAILABLE state and reject produce requests while continuing to serve cached fetch requests. Clients should implement retry logic with exponential backoff.
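
A client-side sketch of that retry logic using kafka-python (one of the clients listed under Kafka Compatibility below); the bootstrap address and topic are placeholders, and since kafka-python's built-in retry_backoff_ms is a fixed delay, the exponential backoff is implemented in application code:

import time

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(
    bootstrap_servers="demo-broker.kafscale.svc:9092",  # placeholder address
    acks="all",
)

def send_with_backoff(topic, value, attempts=6):
    # Retry produces that are rejected while brokers report UNAVAILABLE.
    for attempt in range(attempts):
        try:
            producer.send(topic, value).get(timeout=30)
            return
        except KafkaError:
            time.sleep(min(2 ** attempt, 30))  # 1s, 2s, 4s, ... capped at 30s
    raise RuntimeError(f"produce to {topic} failed after {attempts} attempts")

send_with_backoff("orders", b'{"id": 1}')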


Kafka Compatibility

Can I use existing Kafka clients?

Yes. KafScale implements the Kafka wire protocol for core APIs. Any client that speaks Kafka protocol works without modification.

Tested clients include kafka-python, franz-go, librdkafka, Sarama, and the official Java client.
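
For example, a minimal produce/consume round trip with kafka-python looks like the sketch below; the bootstrap address, topic, and group name are placeholders:

from kafka import KafkaProducer, KafkaConsumer

bootstrap = "demo-broker.kafscale.svc:9092"  # placeholder address

# Produce a record exactly as you would against Apache Kafka.
producer = KafkaProducer(bootstrap_servers=bootstrap)
producer.send("orders", b'{"id": 1}')
producer.flush()

# Consume with a consumer group; offsets are committed through the normal Kafka protocol.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=bootstrap,
    group_id="order-processors",
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.offset, record.value)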

Which Kafka APIs are supported?

KafScale supports 21 Kafka APIs covering produce, fetch, metadata, and consumer group operations. See Protocol for the complete compatibility matrix.

Not supported: transactions (exactly-once semantics), compacted topics, and the admin API for ACLs.

Can I migrate from Kafka to KafScale?

Yes, but it requires replaying data. KafScale uses a different storage layout (S3 segments) than Kafka (local log files), though the record format is identical. Migration options:

  1. Dual-write: Produce to both systems during transition
  2. MirrorMaker: Use Kafka MirrorMaker to replicate topics to KafScale (see the sketch after this list)
  3. Consumer replay: Consume from Kafka and produce to KafScale
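
For option 2, a minimal MirrorMaker 2 properties sketch might look like this; the cluster aliases and bootstrap addresses are placeholders:

# connect-mirror-maker.properties
clusters = source, kafscale
source.bootstrap.servers = kafka-broker:9092
kafscale.bootstrap.servers = demo-broker.kafscale.svc:9092
source->kafscale.enabled = true
source->kafscale.topics = .*

Run it with Kafka's bundled connect-mirror-maker.sh and verify topic contents on the KafScale side before cutting consumers over.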

Do consumer groups work?

Yes. KafScale implements the full consumer group protocol including JoinGroup, SyncGroup, Heartbeat, LeaveGroup, and OffsetCommit/Fetch. Consumer offsets are stored in etcd.


Processors

What are processors?

Processors are components that read directly from S3, bypassing brokers entirely. They enable analytical workloads without adding load to your streaming infrastructure. See Processors for details.

Why bypass brokers for analytics?

Traditional Kafka Connect runs through brokers, competing with real-time consumers for broker resources. KafScale processors read segments directly from S3:

  • No broker contention: Analytical queries don’t impact streaming latency
  • Horizontal scale: Add processors without broker capacity planning
  • Cost efficiency: S3 reads are cheap; broker CPU is expensive

This architecture is ideal for AI/ML pipelines where you need to replay large volumes of historical data without impacting production consumers.

Can I build custom processors?

Yes. The .kfs segment format is documented, and processors coordinate via etcd for offset tracking. See Building Processors for the SDK and examples.


Storage and Durability

How durable is my data?

S3 provides 99.999999999% (11 nines) durability. Once data is acknowledged to the producer, it exists in S3 with the same durability guarantees as any S3 object.

Durability flow: producer sends record → broker buffers it in memory → broker uploads segment + index to S3 → broker ACKs to producer (now 11-nines durable). Data is NOT acknowledged until the S3 upload completes.

What happens if S3 goes down?

Brokers monitor S3 health continuously. Based on error rates and latency:

| State | Condition | Behavior |
|---|---|---|
| Healthy | Error rate < 1%, latency < 500ms | Normal operation |
| Degraded | Error rate 1-5% or latency 500-2000ms | Accepts requests with warnings |
| Unavailable | Error rate > 5% or latency > 2000ms | Rejects produces, serves cached fetches |

Monitor kafscale_s3_health_state (0=healthy, 1=degraded, 2=unavailable) and implement client-side retries.
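
A Prometheus alerting-rule sketch built on that metric; the 5-minute window and severity label are illustrative:

groups:
  - name: kafscale-s3
    rules:
      - alert: KafScaleS3Degraded
        expr: kafscale_s3_health_state >= 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "KafScale broker reports S3 as degraded or unavailable"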

What happens if a broker crashes?

Nothing is lost. Brokers are stateless. All data lives in S3, all metadata lives in etcd. When a broker restarts (or a new pod schedules), it reads state from etcd and resumes serving requests. No partition rebalancing required.

How do I set retention?

KafScale uses S3 lifecycle policies for retention. Configure via AWS console, CLI, or Terraform:

{
  "Rules": [{
    "ID": "kafscale-retention",
    "Status": "Enabled",
    "Filter": { "Prefix": "kafscale/" },
    "Expiration": { "Days": 7 }
  }]
}

Per-topic retention is possible using prefix-based rules (e.g., kafscale/default/orders/).
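
For instance, a rule scoped to a single topic prefix might look like the sketch below (the 30-day value is illustrative); keep per-topic prefixes distinct from any bucket-wide rule so retention behavior stays unambiguous:

{
  "Rules": [{
    "ID": "orders-retention",
    "Status": "Enabled",
    "Filter": { "Prefix": "kafscale/default/orders/" },
    "Expiration": { "Days": 30 }
  }]
}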


Operations

How do I scale KafScale?

Horizontally. Add more broker replicas. Since brokers are stateless and S3 is the source of truth, there’s no partition rebalancing or data migration. New brokers immediately start serving requests.

kubectl scale deployment demo-broker --replicas=5

Or use HPA for automatic scaling based on CPU or custom metrics.

What do I need to back up?

Only etcd. Broker state is ephemeral. S3 data is durable by default. etcd stores topic metadata, consumer offsets, and cluster configuration.

The operator can automate etcd snapshots to S3:

spec:
  etcd:
    backup:
      enabled: true
      bucket: kafscale-backups
      interval: 1h
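
If you prefer manual snapshots, the standard etcdctl command works as well; the endpoint and output path are placeholders, and add whatever TLS/auth flags your etcd requires:

ETCDCTL_API=3 etcdctl --endpoints=https://kafscale-etcd:2379 snapshot save /backup/etcd-$(date +%F).db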

How do I monitor KafScale?

Brokers expose Prometheus metrics on port 9093. Key metrics:

  • kafscale_s3_health_state: S3 availability (0/1/2)
  • kafscale_s3_latency_ms_avg: S3 operation latency
  • kafscale_produce_rps: Produce throughput
  • kafscale_fetch_rps: Fetch throughput
  • kafscale_consumer_group_lag: Consumer lag by group

See Metrics for the complete reference.
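
A minimal Prometheus scrape-config sketch for the metrics endpoint above; the target address is a placeholder for however you expose the broker service:

scrape_configs:
  - job_name: kafscale
    static_configs:
      - targets: ["demo-broker.kafscale.svc:9093"]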

Can I run KafScale outside Kubernetes?

The operator and CRDs are Kubernetes-native, but the broker binary can run standalone. You’ll need to manage etcd and configuration yourself. See Development for running locally with Docker Compose.


Security

Does KafScale support TLS?

Yes. Configure TLS for client connections and inter-broker communication via the CRD:

spec:
  tls:
    enabled: true
    secretRef: kafscale-tls

The secret should contain tls.crt and tls.key.
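
Assuming the certificate and key files are in the current directory and the cluster runs in the kafscale namespace, a secret of that shape can be created with:

kubectl create secret tls kafscale-tls --cert=tls.crt --key=tls.key -n kafscale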

Does KafScale support authentication?

SASL/PLAIN and SASL/SCRAM are on the roadmap. Currently, network-level security (Kubernetes NetworkPolicies, service mesh) is recommended.
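
A NetworkPolicy sketch that limits broker ingress to labeled client namespaces; the pod label, namespace label, and client port (9092) are assumptions about your deployment:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kafscale-broker-clients
  namespace: kafscale
spec:
  podSelector:
    matchLabels:
      app: demo-broker        # placeholder; match your broker pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kafscale-client: "true"
      ports:
        - protocol: TCP
          port: 9092          # Kafka client port (assumed)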

Is data encrypted at rest?

Use S3 server-side encryption (SSE-S3 or SSE-KMS). KafScale writes standard S3 objects, so all S3 encryption options apply.
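
For example, default SSE-KMS can be enabled on the bucket with the AWS CLI; the bucket name and key alias are placeholders:

aws s3api put-bucket-encryption \
  --bucket kafscale-data \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/kafscale"}}]}'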


Troubleshooting

Brokers won’t start

Check etcd connectivity and S3 credentials:

kubectl logs -n kafscale deployment/demo-broker
kubectl get secret kafscale-s3 -o yaml

Common issues: wrong etcd endpoints, expired AWS credentials, S3 bucket doesn’t exist.

High produce latency

Check S3 latency and broker resources:

kubectl exec -n kafscale deployment/demo-broker -- curl localhost:9093/metrics | grep s3_latency
kubectl top pods -n kafscale

If S3 latency is high, verify the bucket is in the same region as your cluster.

Consumer group rebalancing constantly

Check session timeout and network stability:

kubectl logs -n kafscale deployment/demo-broker | grep -i rebalance

Increase session.timeout.ms on clients if pods are slow to respond to heartbeats.
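
With kafka-python, for example, the equivalent settings look like the sketch below (values are illustrative; keep heartbeat_interval_ms well below session_timeout_ms):

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="demo-broker.kafscale.svc:9092",  # placeholder address
    group_id="order-processors",
    session_timeout_ms=45000,      # kafka-python spelling of session.timeout.ms
    heartbeat_interval_ms=15000,
    max_poll_interval_ms=600000,   # raise if processing between polls is slow
)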


Contributing

How can I contribute?

See CONTRIBUTING.md in the repository. We welcome bug reports, feature requests, documentation improvements, and code contributions.

Where do I report bugs?

Open an issue on GitHub. Include KafScale version, Kubernetes version, and relevant logs.

Is there a community?

Join the discussion on GitHub Discussions or the #kafscale channel on the Kubernetes Slack.