
Benchmarking

This document captures repeatable benchmark scenarios for Kafscale. It focuses on end-to-end throughput and latency across the broker and S3-backed storage.

The local runner lives in docs/benchmarks/scripts/run_local.sh and reproduces the baseline numbers against the demo platform setup. A small Makefile wrapper is available at docs/benchmarks/Makefile.
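
A typical invocation is simply running the script from the repository root; the Makefile wrapper's targets are not documented here, so the make line below is an assumption:

./docs/benchmarks/scripts/run_local.sh
# Or via the wrapper, assuming its default target runs the same script:
make -C docs/benchmarks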

The runner uses dedicated topics (bench-hot, bench-s3, bench-multi) and creates them when kafka-topics.sh is available; otherwise it relies on the demo stack's topic auto-creation behavior.
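
To pre-create the topics yourself with the stock Kafka CLI, a sketch follows; the partition count of 3 is an assumption, and the broker address matches the one used in the runs below:

kafka-topics.sh --bootstrap-server 127.0.0.1:39092 --create --topic bench-hot --partitions 3 --replication-factor 1
kafka-topics.sh --bootstrap-server 127.0.0.1:39092 --create --topic bench-s3 --partitions 3 --replication-factor 1
kafka-topics.sh --bootstrap-server 127.0.0.1:39092 --create --topic bench-multi --partitions 3 --replication-factor 1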

Goals

  • Quantify produce/consume throughput on the broker path.
  • Quantify consume performance when reading from S3-backed segments.
  • Capture tail latency and error rates under steady load.

General Guidance

  • Stop the demo workload before benchmarking to avoid noise.
  • Keep topic/partition counts fixed for each run.
  • Record system resources (CPU, memory, disk, network) alongside metrics.
  • Use consistent message size and producer acks across runs.
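
One way to record system resources alongside a run is to start a lightweight sampler in the background before the benchmark and stop it afterwards (a minimal sketch; the log path is arbitrary):

# Sample CPU, memory, and I/O once per second for the duration of the run.
vmstat 1 > /tmp/bench-vmstat.log &
VMSTAT_PID=$!
# ... run the benchmark commands ...
kill $VMSTAT_PID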

Throughput by Scenario

Local demo on kind · 256B messages · kcat client

  • Produce (cross-partition): 7,500 msg/s
  • Produce (single topic): 4,115 msg/s
  • Consume (cross-partition): 1,629 msg/s
  • Consume (cold read, after broker restart): 1,388 msg/s
  • Consume (S3 backlog): 1,117 msg/s

Scenarios

1) Produce and Consume via Broker

This is the hot path: produce to the broker and consume immediately.

What to measure

  • Produce throughput (msg/s and MB/s).
  • Consume throughput (msg/s and MB/s).
  • p50/p95/p99 produce and fetch latency.
  • Fetch/produce error rates.

Notes

  • Use a single consumer group and fixed partition count.
  • Keep message size constant (for example 1 KB, 10 KB, 100 KB).
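
kcat does not compute latency percentiles, so for the p50/p95/p99 produce latencies listed above one option is the stock Kafka producer perf tool, which prints them directly (a sketch, assuming the tool is on your PATH and pointed at the same broker address used elsewhere in this document):

kafka-producer-perf-test.sh \
  --topic bench-hot \
  --num-records 100000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=127.0.0.1:39092 acks=all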

2) Produce via Broker, Consume from S3

This validates catch-up and deep reads from S3-backed segments.

What to measure

  • Catch-up rate when consuming from earliest offsets.
  • Steady-state fetch latency when reading older segments.
  • S3 request latency and error rate.

Notes

  • Preload data (produce a backlog), then consume from -o beginning.
  • Record the time to drain N records or M bytes.
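
A minimal way to time the drain while also counting bytes (a sketch; wc -c reports the byte total and time reports the wall clock):

time kcat -C -b 127.0.0.1:39092 -t bench-s3 -o beginning -e -q | wc -c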

Optional Scenarios

  • Backlog catch-up: produce a large backlog, then measure how quickly a new consumer group catches up to head.
  • Partition scale: repeat scenarios with 1, 3, 6, 12 partitions.
  • Message size sweep: 1 KB, 10 KB, 100 KB to characterize bandwidth.
  • Multi-consumer fan-out: multiple consumer groups reading the same topic.
  • Offset recovery: restart consumer and measure resume time.
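
For the multi-consumer fan-out scenario above, kcat's balanced consumer mode (-G) can stand in for multiple consumer groups; the group names here are arbitrary (a sketch):

# Two independent consumer groups reading the same topic in parallel;
# each runs until interrupted (Ctrl-C) or killed.
kcat -b 127.0.0.1:39092 -G bench-group-a bench-hot > /dev/null &
kcat -b 127.0.0.1:39092 -G bench-group-b bench-hot > /dev/null &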

Metrics to Capture

  • Broker metrics (/metrics): kafscale_produce_rps, kafscale_fetch_rps, kafscale_s3_latency_ms_avg, kafscale_s3_error_rate.
  • Client-side latency distribution (p50/p95/p99).
  • End-to-end throughput (msg/s and MB/s).

Reporting Template

Record each run with:

  • Scenario:
  • Topic/Partitions:
  • Message Size:
  • Producer Acks:
  • Produce Throughput:
  • Consume Throughput:
  • Latency p95/p99:
  • S3 Health:
  • Notes:

Local Run (2026-01-03)

Local demo setup on kind (demo workload stopped). Commands and results below document the baseline numbers we observed on a laptop.

Broker Produce + Consume (Hot Path)

Command

time env TOPIC=bench-hot N=2000 SIZE=256 sh -c 'kcat -C -b 127.0.0.1:39092 -t "$TOPIC" -o end -c "$N" -e -q >/tmp/bench-consume-broker.log & cons=$!; sleep 1; python3 - <<"PY" | kcat -P -b 127.0.0.1:39092 -t "$TOPIC" -q
import os, sys
n = int(os.environ["N"])
size = int(os.environ["SIZE"])
line = ("x" * (size - 1)) + "\n"
for _ in range(n):
    sys.stdout.write(line)
PY
wait $cons'

Output

  • 2000 messages, 256B payload, total wall time ~3.617s
  • End-to-end throughput: ~553 msg/s

Produce Backlog, Consume from S3

Produce backlog

time env TOPIC=bench-s3 N=2000 SIZE=256 sh -c 'python3 - <<"PY" | kcat -P -b 127.0.0.1:39092 -t "$TOPIC" -q
import os, sys
n = int(os.environ["N"])
size = int(os.environ["SIZE"])
line = ("x" * (size - 1)) + "\n"
for _ in range(n):
    sys.stdout.write(line)
PY'

Consume from beginning

time kcat -C -b 127.0.0.1:39092 -t bench-s3 -o beginning -c 2000 -e -q

Output

  • Produce backlog: ~0.486s for 2000 messages (~4115 msg/s)
  • Consume from beginning: ~1.79s for 2000 messages (~1117 msg/s)

Metrics Snapshot

Record a metrics snapshot during each run:

curl -s http://127.0.0.1:9093/metrics | rg 'kafscale_(produce|fetch)_rps|kafscale_s3_latency_ms_avg|kafscale_s3_error_rate'

Cross-Partition Fan-Out (All Partitions)

Produce backlog

time env TOPIC=bench-hot N=4000 SIZE=256 sh -c 'python3 - <<"PY" | kcat -P -b 127.0.0.1:39092 -t "$TOPIC" -q
import os, sys
n = int(os.environ["N"])
size = int(os.environ["SIZE"])
line = ("x" * (size - 1)) + "\n"
for _ in range(n):
    sys.stdout.write(line)
PY'

Consume from beginning

time kcat -C -b 127.0.0.1:39092 -t bench-hot -o beginning -c 4000 -e -q

Output

  • Produce backlog: ~0.533s for 4000 messages (~7500 msg/s)
  • Consume from beginning: ~2.455s for 4000 messages (~1629 msg/s)

S3 Cold Read (After Broker Restart)

Produce backlog

time env TOPIC=bench-s3 N=2000 SIZE=256 sh -c 'python3 - <<"PY" | kcat -P -b 127.0.0.1:39092 -t "$TOPIC" -q
import os, sys
n = int(os.environ["N"])
size = int(os.environ["SIZE"])
line = ("x" * (size - 1)) + "\n"
for _ in range(n):
    sys.stdout.write(line)
PY'

Restart brokers

kubectl -n kafscale-demo rollout restart statefulset/kafscale-broker
kubectl -n kafscale-demo rollout status statefulset/kafscale-broker --timeout=120s

Consume from beginning

time kcat -C -b 127.0.0.1:39092 -t bench-s3 -o beginning -c 2000 -e -q

Output

  • Produce backlog: ~0.929s for 2000 messages (~2152 msg/s)
  • Cold consume: ~1.441s for 2000 messages (~1388 msg/s)

Large Records (Attempted)

Status: open (no results recorded yet).

Multi-Topic Mixed Load (Attempted)

Status: open (no results recorded yet).