Storage Format

KafScale stores all message data in S3 as immutable segment files. This page covers the binary formats, caching strategy, and retention configuration.

S3 key layout

s3://{bucket}/{namespace}/{topic}/{partition}/segment-{base_offset}.kfs
s3://{bucket}/{namespace}/{topic}/{partition}/segment-{base_offset}.index

Example:

s3://kafscale-data/production/orders/0/segment-00000000000000000000.kfs
s3://kafscale-data/production/orders/0/segment-00000000000000000000.index

The 20-digit zero-padded offset ensures lexicographic sorting matches offset order.
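For illustration, here is a minimal Go sketch of building such a key; the segmentKey helper is hypothetical, not part of KafScale's API:

package main

import "fmt"

// segmentKey builds the object key for a segment file within the bucket.
// %020d zero-pads the base offset to 20 digits so that lexicographic
// key order matches numeric offset order.
func segmentKey(namespace, topic string, partition int, baseOffset int64) string {
	return fmt.Sprintf("%s/%s/%d/segment-%020d.kfs", namespace, topic, partition, baseOffset)
}

func main() {
	fmt.Println(segmentKey("production", "orders", 0, 0))
	// Output: production/orders/0/segment-00000000000000000000.kfs
}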

Segment file format

Each .kfs segment is a self-contained file with a header, message batches, and a footer.

Segment header (32 bytes)

Field              Size     Description
Magic number       4 bytes  0x4B414653 ("KAFS")
Version            2 bytes  Format version (1)
Flags              2 bytes  Compression codec, etc.
Base offset        8 bytes  First offset in segment
Message count      4 bytes  Number of messages
Created timestamp  8 bytes  Unix milliseconds
Reserved           4 bytes  Future use
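
As a sketch of how a reader might decode this header, assuming big-endian field encoding as in the Kafka wire protocol (the type and function names here are illustrative):

package kafs

import (
	"encoding/binary"
	"fmt"
)

const segmentMagic = 0x4B414653 // "KAFS"

// SegmentHeader mirrors the 32-byte layout above.
type SegmentHeader struct {
	Version      uint16
	Flags        uint16
	BaseOffset   int64
	MessageCount uint32
	CreatedMs    int64
}

// parseSegmentHeader decodes the fixed 32-byte header, assuming
// big-endian encoding as in the Kafka wire protocol.
func parseSegmentHeader(b []byte) (SegmentHeader, error) {
	if len(b) < 32 {
		return SegmentHeader{}, fmt.Errorf("header too short: %d bytes", len(b))
	}
	if magic := binary.BigEndian.Uint32(b[0:4]); magic != segmentMagic {
		return SegmentHeader{}, fmt.Errorf("bad magic 0x%08X", magic)
	}
	return SegmentHeader{
		Version:      binary.BigEndian.Uint16(b[4:6]),
		Flags:        binary.BigEndian.Uint16(b[6:8]),
		BaseOffset:   int64(binary.BigEndian.Uint64(b[8:16])),
		MessageCount: binary.BigEndian.Uint32(b[16:20]),
		CreatedMs:    int64(binary.BigEndian.Uint64(b[20:28])),
		// b[28:32] is reserved
	}, nil
}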

Segment body (variable)

Field            Size      Description
Message batch 1  variable  Kafka RecordBatch format
Message batch 2  variable  Kafka RecordBatch format
...              variable  More batches until segment sealed

Segment footer (16 bytes)

Field         Size     Description
CRC32         4 bytes  Checksum of all batches
Last offset   8 bytes  Last offset in segment
Footer magic  4 bytes  0x454E4421 ("END!")

Message batch format

Batches are Kafka-compatible (magic byte 2) for client interoperability.

Batch header (61 bytes)

Field                   Size     Description
Base offset             8 bytes  First offset in batch
Batch length            4 bytes  Total bytes in batch
Partition leader epoch  4 bytes  Leader epoch
Magic                   1 byte   2 (Kafka v2 format)
CRC32                   4 bytes  Checksum of batch
Attributes              2 bytes  Compression, timestamp type
Last offset delta       4 bytes  Last record offset - base
First timestamp         8 bytes  Timestamp of first record
Max timestamp           8 bytes  Max timestamp in batch
Producer ID             8 bytes  -1 (no idempotence)
Producer epoch          2 bytes  -1
Base sequence           4 bytes  -1
Record count            4 bytes  Number of records

Individual record format

Each record within a batch uses varint encoding for compactness.

Field            Size    Description
Length           varint  Total record size
Attributes       1 byte  Unused (0)
Timestamp delta  varint  Delta from batch first timestamp
Offset delta     varint  Delta from batch base offset
Key length       varint  -1 for null, else byte count
Key              bytes   Message key (optional)
Value length     varint  Message value byte count
Value            bytes   Message payload
Headers count    varint  Number of headers
Headers          bytes   Key-value header pairs
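
Because these are zig-zag varints, as in the Kafka v2 record format, Go's binary.Varint decodes them directly. A minimal sketch with illustrative names and only light bounds checking; headers are skipped:

package kafs

import (
	"encoding/binary"
	"fmt"
)

// readRecord decodes the key, value, and offset delta of a single
// record. Kafka record varints are zig-zag encoded, which is the
// encoding binary.Varint expects. Headers are skipped for brevity.
func readRecord(b []byte) (key, value []byte, offsetDelta int64, err error) {
	pos := 0
	next := func() int64 {
		v, n := binary.Varint(b[pos:])
		if n <= 0 {
			err = fmt.Errorf("truncated varint at byte %d", pos)
			return 0
		}
		pos += n
		return v
	}
	_ = next()           // record length
	pos++                // attributes byte (unused, always 0)
	_ = next()           // timestamp delta
	offsetDelta = next() // offset delta
	keyLen := next()
	if err != nil {
		return nil, nil, 0, err
	}
	if keyLen >= 0 { // -1 means a null key
		key = b[pos : pos+int(keyLen)]
		pos += int(keyLen)
	}
	valueLen := next()
	if err != nil {
		return nil, nil, 0, err
	}
	if valueLen >= 0 { // -1 means a null value (tombstone)
		value = b[pos : pos+int(valueLen)]
	}
	return key, value, offsetDelta, nil
}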

Index file format

The index is sparse, for fast offset-to-position lookups: one entry is written per N messages, where N is the Interval field in the header.

Index header (16 bytes)

Field        Size     Description
Magic        4 bytes  0x494458 ("IDX")
Version      2 bytes  1
Entry count  4 bytes  Number of index entries
Interval     4 bytes  Messages between entries
Reserved     2 bytes  Future use

Index entries (12 bytes each)

Field     Size     Description
Offset    8 bytes  Message offset
Position  4 bytes  Byte position in segment file

To locate offset N: binary-search the index for the last entry with offset <= N, then scan forward through the segment from that entry's byte position.
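
A sketch of that lookup in Go, assuming the entries are already decoded into a slice (IndexEntry and findPosition are illustrative names):

package main

import (
	"fmt"
	"sort"
)

// IndexEntry is one decoded 12-byte index record.
type IndexEntry struct {
	Offset   int64  // message offset
	Position uint32 // byte position in the segment file
}

// findPosition returns the byte position to start scanning from:
// the position of the last index entry with Offset <= target.
func findPosition(entries []IndexEntry, target int64) uint32 {
	// sort.Search finds the first entry with Offset > target;
	// the entry just before it is the nearest preceding index point.
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].Offset > target
	})
	if i == 0 {
		return 0 // target precedes the first index entry
	}
	return entries[i-1].Position
}

func main() {
	idx := []IndexEntry{{0, 0}, {100, 8192}, {200, 16500}}
	fmt.Println(findPosition(idx, 150)) // 8192; scan forward from there
}

Because the index is sparse, the forward scan after the seek touches at most Interval messages.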

Cache architecture

Multi-layer cache
L1: hot segment cache. Holds the last N segments per partition; LRU eviction; 1-4 GB; <1 ms latency.
L2: index cache. Holds all indexes for assigned partitions; refreshed on segment roll; 100-500 MB; <1 ms latency.
S3: source of truth. Unbounded capacity; 50-100 ms latency.

Read path: check L1; on a miss, check L2; on a miss, fetch from S3, populate the caches, and return to the client.
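
In code, the read-through flow could look roughly like the following sketch; the interfaces are illustrative, not KafScale's actual API, and the L2 index lookup is omitted for brevity:

package kafs

// segmentCache is an illustrative stand-in for the L1 hot cache.
type segmentCache interface {
	Get(key string) ([]byte, bool)
	Put(key string, data []byte)
}

// objectStore is an illustrative stand-in for the S3 client.
type objectStore interface {
	Fetch(key string) ([]byte, error) // 50-100 ms round trip
}

// readSegment checks the L1 hot cache, falling back to S3 and
// populating the cache on the way back (read-through caching).
func readSegment(l1 segmentCache, s3 objectStore, key string) ([]byte, error) {
	if data, ok := l1.Get(key); ok {
		return data, nil // L1 hit: <1 ms
	}
	data, err := s3.Fetch(key) // miss: S3 is the source of truth
	if err != nil {
		return nil, err
	}
	l1.Put(key, data) // populate so the next read is a hit
	return data, nil
}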

Cache configuration

Variable                     Default  Description
KAFSCALE_CACHE_SIZE          1GB      L1 hot segment cache size
KAFSCALE_INDEX_CACHE_SIZE    256MB    L2 index cache size
KAFSCALE_READAHEAD_SEGMENTS  2        Segments to prefetch

Flush triggers

Segments are sealed and flushed to S3 when any condition is met:

Trigger                 Default  Variable
Buffer size threshold   4 MB     KAFSCALE_SEGMENT_BYTES
Time since last flush   500 ms   KAFSCALE_FLUSH_INTERVAL_MS
Explicit flush request  n/a      Admin API or graceful shutdown

Flush sequence

  1. Seal current buffer (no more writes accepted)
  2. Compress batches (Snappy by default)
  3. Build sparse index file
  4. Upload segment + index to S3 (both must succeed)
  5. Update etcd with new segment metadata
  6. Ack waiting producers (if acks=all)
  7. Clear flushed data from buffer
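
A hedged Go sketch of steps 2 through 5; every type and name here is illustrative, and compression, index building, and acking are stubbed out:

package kafs

import "context"

// buffer is an illustrative stand-in for a sealed in-memory segment.
type buffer struct {
	BaseOffset, LastOffset int64
}

type objectStore interface {
	Put(ctx context.Context, key string, data []byte) error
}

type metaStore interface {
	PutSegmentMeta(ctx context.Context, base, last int64) error
}

// flushSealed uploads a sealed buffer as a segment plus index, then
// commits metadata. The buffer must already be sealed (step 1).
func flushSealed(ctx context.Context, s3 objectStore, etcd metaStore,
	buf *buffer, segKey, idxKey string) error {
	data := compressBatches(buf)   // 2. Snappy by default
	index := buildSparseIndex(buf) // 3. one entry per N messages
	// 4. both uploads must succeed before metadata is touched
	if err := s3.Put(ctx, segKey, data); err != nil {
		return err
	}
	if err := s3.Put(ctx, idxKey, index); err != nil {
		return err
	}
	// 5. the segment becomes visible only once etcd commits
	return etcd.PutSegmentMeta(ctx, buf.BaseOffset, buf.LastOffset)
}

func compressBatches(buf *buffer) []byte  { return nil } // elided
func buildSparseIndex(buf *buffer) []byte { return nil } // elided

The ordering matters: because etcd is updated only after both uploads succeed, a crash mid-flush leaves at worst an orphaned object in S3, never metadata pointing at a missing segment.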

S3 lifecycle configuration

Use bucket lifecycle rules to automatically expire old segments. Align with your topic retention settings.

Example: 7-day retention

{
  "Rules": [
    {
      "ID": "kafscale-retention-7d",
      "Filter": {
        "Prefix": "production/"
      },
      "Status": "Enabled",
      "Expiration": {
        "Days": 7
      }
    }
  ]
}

AWS CLI setup

aws s3api put-bucket-lifecycle-configuration \
  --bucket kafscale-data \
  --lifecycle-configuration file://lifecycle.json

Terraform example

resource "aws_s3_bucket_lifecycle_configuration" "kafscale" {
  bucket = aws_s3_bucket.kafscale_data.id

  rule {
    id     = "kafscale-retention"
    status = "Enabled"

    filter {
      prefix = "production/"
    }

    expiration {
      days = 7
    }
  }
}

Per-topic retention

For different retention per topic, use prefix-based rules:

{
  "Rules": [
    {
      "ID": "logs-1d",
      "Filter": { "Prefix": "production/logs/" },
      "Status": "Enabled",
      "Expiration": { "Days": 1 }
    },
    {
      "ID": "events-30d",
      "Filter": { "Prefix": "production/events/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    },
    {
      "ID": "default-7d",
      "Filter": { "Prefix": "production/" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 }
    }
  ]
}

Note: when multiple expiration rules match an object, S3 applies the rule that expires it earliest, not the one with the most specific prefix. In the example above, the default-7d rule also matches production/events/ and would expire events after 7 days. To keep a topic longer than the default, drop the catch-all rule and give every topic prefix its own rule, or scope rules with tag-based filters.

Compression

KafScale supports batch-level compression using Kafka-compatible codecs.

Codec   ID  Notes
None    0   No compression
Snappy  2   Default: fast, moderate ratio
LZ4     3   Faster decompression
ZSTD    4   Best ratio, slower

Set the codec globally via KAFSCALE_COMPRESSION_CODEC, or per topic in the KafscaleTopic CRD:

apiVersion: kafscale.io/v1alpha1
kind: KafscaleTopic
metadata:
  name: logs
spec:
  partitions: 6
  retention: 24h
  compression: zstd  # Better ratio for logs