Performance
Comprehensive performance analysis, benchmarks, and optimization strategies for the Zentalk protocol.
Performance Philosophy
Zentalk prioritizes security and privacy over raw performance, but extensive optimizations ensure the system remains highly responsive for end users.
Design Goals
| Priority | Goal | Tradeoff |
|---|---|---|
| 1 | Security | Never compromise cryptographic operations for speed |
| 2 | Privacy | Metadata protection worth additional latency |
| 3 | Reliability | Redundancy over minimal resource usage |
| 4 | Responsiveness | User-perceived latency optimization |
| 5 | Efficiency | Resource optimization where possible |
Performance vs Privacy Tradeoffs
| Feature | Performance Impact | Privacy Benefit |
|---|---|---|
| 3-hop relay routing | +150-400ms latency | Hides sender/recipient |
| Traffic padding | +20% bandwidth | Prevents traffic analysis |
| Stealth addresses | Scanning overhead | Unlinkable payments |
| Double Ratchet | Per-message crypto ops | Forward secrecy |
| DHT lookup | O(log n) hops | Decentralized, censorship resistant |
Latency Characteristics
End-to-End Message Latency
Measured latency from send button press to recipient notification:
| Percentile | Cold Start | Warm (Circuit Ready) | Notes |
|---|---|---|---|
| p50 | 1.2s | 350ms | Typical conditions |
| p75 | 1.8s | 520ms | Moderate network |
| p90 | 2.5s | 850ms | Congested network |
| p95 | 3.2s | 1.1s | Poor conditions |
| p99 | 5.5s | 2.2s | Worst case |
Cold start includes circuit building. Warm assumes established circuit.
Latency Breakdown by Component
| Component | Duration (p50) | Duration (p99) | Notes |
|---|---|---|---|
| Client-Side Encryption | | | |
| Double Ratchet key derivation | 0.5ms | 2ms | HKDF operations |
| AES-256-GCM encryption | 0.3ms | 1ms | Per message |
| Ed25519 signature | 0.1ms | 0.3ms | Message authentication |
| Network Transport | | | |
| Circuit build (if needed) | 800ms | 2.5s | 3 TLS handshakes |
| 3-hop relay routing | 150ms | 500ms | Sequential relay |
| DHT lookup | 200ms | 800ms | O(log n) hops |
| Mesh storage | 50ms | 200ms | Replication factor 3 |
| Recipient Side | | | |
| Message retrieval | 100ms | 400ms | From nearest replica |
| Decryption + verification | 1ms | 5ms | Reverse of send |
| Total (warm) | 350ms | 2.2s | Typical path |
Totals are measured end-to-end, not summed from components: stages can overlap, cached lookups skip some steps on the warm path, and the p50 of a sum is not the sum of component p50s.
Factors Affecting Latency
| Factor | Impact | Mitigation |
|---|---|---|
| Geographic distance | +50-200ms per hop | Regional relay selection |
| Network congestion | +100-500ms | Adaptive timeout, parallel queries |
| DHT network size | O(log n) lookup | Caching, shortcut routing |
| Circuit health | Variable | Proactive circuit rotation |
| Time of day | +20-50% peak hours | Load balancing |
| Mobile network | +100-300ms | Optimized packet sizes |
Latency Optimization Techniques
Circuit Pooling:
Maintain pool of ready circuits:
- 3 guard circuits (persistent)
- 5 general circuits (rotating)
- 2 backup circuits (warm standby)
Result: Eliminate circuit build latency for 95%+ of messages.

Predictive Prefetching:
On app foreground:
1. Refresh DHT routing table
2. Pre-build circuits to frequent contacts
3. Prefetch unread message pointers
Result: First message send is always "warm".

Throughput Metrics
Messages Per Second Per Node
| Node Type | Messages/Second | Limiting Factor |
|---|---|---|
| Relay node (guard) | 5,000 | TLS termination |
| Relay node (middle) | 8,000 | Packet forwarding |
| Relay node (exit) | 4,000 | DHT operations |
| Storage node | 2,000 writes | Disk I/O |
| Storage node | 10,000 reads | Memory cache |
| Client (mobile) | 50 | Battery/CPU limits |
| Client (desktop) | 200 | Crypto operations |
Maximum Concurrent Connections
| Component | Connections | Memory Per Connection |
|---|---|---|
| Client WebSocket | 1-3 | ~50KB |
| Relay node | 10,000 | ~2KB |
| Storage node | 5,000 | ~5KB |
| DHT node | 500 peers | ~1KB |
Bandwidth Consumption
| Activity | Bandwidth | Notes |
|---|---|---|
| Idle (connected) | 1-2 KB/s | Keepalive + padding |
| Active chat | 5-20 KB/s | Depends on message rate |
| Media send (photo) | 100-500 KB/s | Chunked upload |
| Voice call | 30-50 KB/s | Opus codec |
| Video call (720p) | 500-1500 KB/s | VP9 codec |
| Background sync | 0.5-1 KB/s | Heartbeat only |
Traffic Padding Overhead:
| Mode | Overhead | Privacy Level |
|---|---|---|
| Minimal | +5% | Basic |
| Standard | +20% | Good |
| Maximum | +50% | High |
| Constant-rate | +100-300% | Maximum |
Cryptographic Performance
All benchmarks measured on reference hardware (Apple M2, Intel i7-12700).
X25519 Key Exchange
| Operation | M2 (ARM64) | i7-12700 (x86) | WebCrypto |
|---|---|---|---|
| Key generation | 25us | 32us | 45us |
| DH computation | 28us | 35us | 50us |
| Throughput | 35,000/s | 28,000/s | 20,000/s |
X3DH Full Exchange (4 DH operations):
| Operation | Duration | Notes |
|---|---|---|
| Fetch key bundle | 200-500ms | Network bound |
| 4x DH computation | 0.1ms | CPU bound |
| HKDF derivation | 0.02ms | Fast |
| Total (network) | 200-500ms | Dominated by fetch |
| Total (local only) | 0.15ms | If keys cached |
AES-256-GCM Encryption/Decryption
| Message Size | Encrypt | Decrypt | Throughput |
|---|---|---|---|
| 256 bytes | 0.02ms | 0.02ms | 12 MB/s |
| 1 KB | 0.03ms | 0.03ms | 30 MB/s |
| 4 KB | 0.05ms | 0.05ms | 75 MB/s |
| 64 KB | 0.3ms | 0.3ms | 200 MB/s |
| 1 MB | 4ms | 4ms | 250 MB/s |
Hardware AES acceleration enabled (AES-NI on x86, ARMv8 Crypto Extensions on Apple M2). WebCrypto is typically 2-3x slower.
Ed25519 Signing and Verification
| Operation | Duration | Throughput |
|---|---|---|
| Key generation | 30us | 33,000/s |
| Sign (64 bytes) | 35us | 28,000/s |
| Verify (64 bytes) | 70us | 14,000/s |
| Batch verify (100) | 4ms | 25,000/s |
Signature size: 64 bytes (constant)
Kyber-768 (Post-Quantum) Operations
| Operation | Duration | Notes |
|---|---|---|
| Key generation | 50us | Generate keypair |
| Encapsulation | 60us | Sender operation |
| Decapsulation | 55us | Recipient operation |
| Hybrid X3DH total | 0.3ms | X25519 + Kyber combined |
Key/Ciphertext Sizes:
| Component | Size |
|---|---|
| Kyber public key | 1,184 bytes |
| Kyber private key | 2,400 bytes |
| Kyber ciphertext | 1,088 bytes |
| Shared secret | 32 bytes |
| Overhead vs X25519-only | +2,272 bytes per session init |
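The "+2,272 bytes" figure in the last row is just the extra Kyber material a hybrid session init carries; a quick arithmetic check (which side ships which component depends on the handshake direction):

```python
# Sizes from the table above, in bytes.
KYBER768_PUBLIC_KEY = 1184
KYBER768_CIPHERTEXT = 1088

# A hybrid X3DH init carries the Kyber public key one way and the
# encapsulated ciphertext back, on top of the X25519 material.
hybrid_overhead = KYBER768_PUBLIC_KEY + KYBER768_CIPHERTEXT
print(hybrid_overhead)  # 2272
```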
Double Ratchet Operations
| Operation | Duration | Frequency |
|---|---|---|
| Symmetric ratchet (per message) | 0.05ms | Every message |
| DH ratchet (key rotation) | 0.1ms | Every reply |
| Full ratchet state serialization | 0.5ms | On persist |
| State deserialization | 0.3ms | On load |
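The per-message symmetric ratchet step is just two HMAC invocations (the KDF_CK construction recommended by the Signal Double Ratchet specification), which is why it costs well under a millisecond. A minimal sketch, assuming HMAC-SHA256 and 32-byte chain keys:

```python
import hashlib
import hmac

def symmetric_ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance a sending/receiving chain by one message.

    Returns (next_chain_key, message_key). The 0x02/0x01 inputs follow
    the KDF_CK construction recommended by the Double Ratchet spec.
    """
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    return next_chain_key, message_key

ck = bytes(32)  # placeholder; in practice derived from the root key via the DH ratchet
ck, mk = symmetric_ratchet_step(ck)
assert len(ck) == 32 and len(mk) == 32 and ck != mk
```

Because the old chain key is discarded after each step, a compromise of the current state cannot decrypt earlier messages, which is the forward-secrecy property the tradeoff table pays for.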
Memory per Session:
| Component | Size |
|---|---|
| Ratchet state | ~500 bytes |
| Skipped message keys (max) | ~40 KB |
| Total per active session | ~1-50 KB |
Cryptographic Operation Summary
| Operation | Time | Operations/Second |
|---|---|---|
| Send message (warm session) | 0.5ms | 2,000 |
| Receive message | 0.6ms | 1,600 |
| New session (X3DH) | 0.15ms | 6,500 |
| New session (X3DH + Kyber) | 0.3ms | 3,300 |
| Group message (Sender Keys) | 0.3ms | 3,300 |
Scanning Performance
Stealth Address Scanning Rate
Scanning requires one ECDH computation per announcement to check ownership.
| Metric | Value | Notes |
|---|---|---|
| ECDH per announcement | 0.3ms | Core operation |
| Raw scan rate | 3,300/s | Single-threaded |
| With view tag optimization | 99% skip rate | First byte check |
| Effective scan rate | 330,000/s | After view tag filter |
View Tag Optimization Impact
| Announcements/Day | Without View Tag | With View Tag | Speedup |
|---|---|---|---|
| 10,000 | 3 seconds | 0.03 seconds | 100x |
| 100,000 | 30 seconds | 0.3 seconds | 100x |
| 1,000,000 | 5 minutes | 3 seconds | 100x |
| 10,000,000 | 50 minutes | 30 seconds | 100x |
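The 100x speedup comes from a one-byte precheck rejecting ~99% of announcements before the expensive per-announcement crypto runs. The sketch below shows only the filtering structure: the hash-based tag and ownership check are stand-ins, since the real derivation is protocol-specific:

```python
import hashlib

expensive_checks = 0  # counts how often the ~0.3 ms path actually runs

def view_tag(announcement: bytes) -> int:
    # One-byte tag published alongside each announcement (hash stand-in).
    return hashlib.sha256(b"tag" + announcement).digest()[0]

def full_ownership_check(announcement: bytes, view_key: bytes) -> bool:
    # Stand-in for the real ECDH + key derivation (~0.3 ms each).
    global expensive_checks
    expensive_checks += 1
    return hashlib.sha256(view_key + announcement).digest()[-1] == 0

def scan(announcements, view_key: bytes, my_tag: int) -> list:
    hits = []
    for ann in announcements:
        if view_tag(ann) != my_tag:   # cheap filter: skips ~255/256
            continue
        if full_ownership_check(ann, view_key):
            hits.append(ann)
    return hits

anns = [i.to_bytes(4, "big") for i in range(10_000)]
scan(anns, b"view-key", my_tag=0)
# With a uniform one-byte tag, only ~1/256 of the 10,000 announcements
# (about 40) ever reach the expensive check.
```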
Bloom Filter Efficiency
| Parameter | Value |
|---|---|
| Expected elements | 1,000,000 |
| False positive rate | 1% |
| Bits per element | 9.6 |
| Total filter size | 1.2 MB |
| Hash functions | 7 |
| Lookup time | 0.001ms |
Multi-stage Cascade:
| Stage | False Positive Rate | Cumulative |
|---|---|---|
| Stage 1 (coarse) | 10% | 10% |
| Stage 2 (fine) | 1% | 0.1% |
| Stage 3 (exact) | 0% | 0% |
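These parameters follow from the standard Bloom filter sizing formulas, m/n = -ln(p) / (ln 2)^2 and k = (m/n) * ln 2, and the cascade's cumulative rate is just the product of the stage rates. A quick check of the table's numbers:

```python
import math

def bloom_params(n: int, p: float):
    """Return (bits_per_element, hash_functions, total_size_mb) for
    n expected elements at false-positive rate p."""
    bits_per_elem = -math.log(p) / math.log(2) ** 2
    hash_functions = round(bits_per_elem * math.log(2))
    total_mb = n * bits_per_elem / 8 / 1e6
    return bits_per_elem, hash_functions, total_mb

bpe, k, mb = bloom_params(1_000_000, 0.01)
print(round(bpe, 1), k, round(mb, 1))  # 9.6 7 1.2

# Cascade: stage false-positive rates multiply.
cumulative = 0.10 * 0.01  # ≈ 0.1%, matching the cascade table
```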
Batch Scanning Benchmarks
| Batch Size | Duration | Announcements/Second |
|---|---|---|
| 1,000 | 0.003s | 333,000 |
| 10,000 | 0.03s | 333,000 |
| 100,000 | 0.3s | 333,000 |
| 1,000,000 | 3s | 333,000 |
With view tag optimization enabled, 4 worker threads.
Parallel Scanning Architecture
| Workers | Throughput | CPU Usage |
|---|---|---|
| 1 | 100,000/s | 25% |
| 2 | 190,000/s | 50% |
| 4 | 350,000/s | 90% |
| 8 | 400,000/s | 95% |
Diminishing returns beyond 4 workers due to memory bandwidth.
DHT Operations
Lookup Latency
| Network Size | Hops (O(log n)) | Latency (p50) | Latency (p99) |
|---|---|---|---|
| 1,000 nodes | 10 | 300ms | 800ms |
| 10,000 nodes | 14 | 400ms | 1.1s |
| 100,000 nodes | 17 | 500ms | 1.4s |
| 1,000,000 nodes | 20 | 600ms | 1.8s |
With alpha=3 parallel queries.
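The hop counts in the table follow from Kademlia halving the remaining keyspace distance on each hop, giving roughly ceil(log2 n) hops in the worst case:

```python
import math

def expected_hops(network_size: int) -> int:
    # Each Kademlia hop fixes at least one more prefix bit of the
    # target ID, so a lookup converges in about log2(n) hops.
    return math.ceil(math.log2(network_size))

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} nodes -> {expected_hops(n)} hops")
# 10, 14, 17, and 20 hops — matching the table above
```

In practice caching and the alpha=3 parallel queries shave hops off this bound, which is why the latency columns grow slower than the raw hop count suggests.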
Storage Operations
| Operation | Latency (p50) | Latency (p99) | Notes |
|---|---|---|---|
| STORE | 150ms | 500ms | Write to k=3 nodes |
| FIND_VALUE (hit) | 100ms | 400ms | First node with data |
| FIND_VALUE (miss) | 300ms | 1s | Full lookup |
| FIND_NODE | 200ms | 600ms | Routing lookup |
Replication Overhead
| Replication Factor | Write Amplification | Storage Overhead | Availability |
|---|---|---|---|
| k=1 | 1x | 1x | 90% |
| k=3 | 3x | 3x | 99.9% |
| k=5 | 5x | 5x | 99.99% |
| k=7 | 7x | 7x | 99.999% |
Zentalk default: k=3 (optimal tradeoff)
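Under a simple independent-failure model with ~90% per-node uptime, data is unavailable only when all k replicas are down at once. This reproduces the k=1 and k=3 rows exactly; the table's figures for larger k are more conservative, presumably to account for correlated failures:

```python
def availability(node_uptime: float, k: int) -> float:
    # Data is unavailable only if all k replicas are down at once,
    # assuming independent node failures.
    return 1 - (1 - node_uptime) ** k

for k in (1, 3, 5, 7):
    print(f"k={k}: {availability(0.90, k):.5%}")
```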
DHT Memory Usage
| Component | Per Node | Notes |
|---|---|---|
| Routing table | 256 KB | 256 buckets x 20 contacts x 50 bytes |
| Stored data | Variable | Depends on node role |
| Message cache | 10 MB | Recent announcements |
| Connection state | 1 KB/peer | Active connections |
Group Chat Scaling
Performance vs Group Size
| Group Size | Key Distribution | Message Encrypt | Message Decrypt |
|---|---|---|---|
| 10 | 9 E2EE sends | 0.3ms | 0.3ms |
| 50 | 49 E2EE sends | 0.3ms | 0.3ms |
| 100 | 99 E2EE sends | 0.3ms | 0.3ms |
| 500 | 499 E2EE sends | 0.3ms | 0.3ms |
| 1000 | 999 E2EE sends | 0.3ms | 0.3ms |
Key insight: Message encryption is O(1) regardless of group size due to Sender Keys.
Sender Keys Efficiency
| Metric | Pairwise | Sender Keys | Improvement |
|---|---|---|---|
| Encrypt ops per message (100 members) | 100 | 1 | 100x |
| Key material per member | O(n) | O(n) | Same |
| Message size | O(n) | O(1) | n× |
| Bandwidth per message | O(n) | O(1) | n× |
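The O(n) → O(1) shift above is the whole point of Sender Keys: the n-1 pairwise sends happen once at key distribution, after which each message is a single symmetric encryption broadcast to the group. A toy sketch (the XOR-with-hash cipher is a stand-in for AES-256-GCM and only secures one short block):

```python
import hashlib

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Stand-in for AES-256-GCM; only valid for plaintexts <= 32 bytes.
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(plaintext, pad))

class SenderKeyGroup:
    def __init__(self, members: list[str]):
        self.members = members
        self.sender_key = hashlib.sha256(b"sender-chain").digest()
        # One-time cost: the sender key travels over n-1 pairwise E2EE sessions.
        self.distribution_sends = len(members) - 1

    def send(self, plaintext: bytes) -> bytes:
        # Per-message cost: ONE encryption, regardless of group size.
        return toy_encrypt(self.sender_key, plaintext)

group = SenderKeyGroup([f"member-{i}" for i in range(100)])
ct = group.send(b"hello group")
assert group.distribution_sends == 99                       # paid once, at setup
assert toy_encrypt(group.sender_key, ct) == b"hello group"  # XOR is symmetric
```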
Key Rotation Overhead
| Event | Operations | Latency | Notes |
|---|---|---|---|
| Member join | n key sends | 2-5s | Background |
| Member leave | n key regenerations | 3-10s | All rotate |
| Periodic rotation | n key sends | 2-5s | Every 1000 msgs |
| Device compromise | 1 key regeneration | 1s | Affected only |
Group Chat Memory Footprint
| Group Size | Sender Keys Storage | Message Cache | Total |
|---|---|---|---|
| 10 | 3.2 KB | 100 KB | ~103 KB |
| 50 | 16 KB | 100 KB | ~116 KB |
| 100 | 32 KB | 100 KB | ~132 KB |
| 500 | 160 KB | 100 KB | ~260 KB |
| 1000 | 320 KB | 100 KB | ~420 KB |
Client Resource Usage
Memory Footprint
| Platform | Idle | Active Chat | Peak |
|---|---|---|---|
| iOS | 45 MB | 80 MB | 150 MB |
| Android | 50 MB | 90 MB | 180 MB |
| Desktop (Electron) | 120 MB | 200 MB | 400 MB |
| Web (Chrome) | 60 MB | 100 MB | 200 MB |
Memory Breakdown (Active):
| Component | Mobile | Desktop |
|---|---|---|
| WebAssembly runtime | 15 MB | 20 MB |
| Crypto state | 5 MB | 10 MB |
| Session cache | 10 MB | 30 MB |
| Message buffer | 20 MB | 50 MB |
| UI framework | 30 MB | 90 MB |
CPU Usage Patterns
| Activity | Mobile CPU | Desktop CPU | Duration |
|---|---|---|---|
| Idle | <1% | <1% | Continuous |
| Receiving message | 5-10% | 2-5% | 100ms |
| Sending message | 10-15% | 5-8% | 200ms |
| Voice call | 15-25% | 8-12% | Continuous |
| Video call | 30-50% | 15-25% | Continuous |
| Background sync | 2-5% | 1-3% | Periodic (1s) |
| Scanning (batch) | 50-80% | 30-50% | Depends on backlog |
Battery Impact (Mobile)
| Activity | mAh/hour | Relative to Idle |
|---|---|---|
| Idle (screen off) | 5-10 | 1x |
| Connected (foreground) | 50-80 | 8x |
| Active messaging | 100-150 | 15x |
| Voice call | 200-300 | 30x |
| Video call | 400-600 | 50x |
Optimization Strategies:
| Strategy | Battery Savings | Tradeoff |
|---|---|---|
| Push notifications | 80% idle reduction | Slight delay |
| Batch message fetch | 40% active reduction | Grouping delay |
| Adaptive polling | 30% reduction | Variable latency |
| Codec selection | 20% call reduction | Quality tradeoff |
Storage Requirements
| Data Type | Per Item | Typical Total | Notes |
|---|---|---|---|
| Message (text) | 0.5-2 KB | 50-200 MB | 100K messages |
| Message (with media ref) | 1-3 KB | 100-300 MB | Media stored separately |
| Media (photo) | 50-500 KB | 1-10 GB | Original quality |
| Media (thumbnail) | 5-20 KB | 50-200 MB | Preview cache |
| Session state | 1-50 KB | 1-10 MB | Per contact |
| DHT cache | N/A | 10-50 MB | Routing tables |
| Crypto keys | N/A | 1-5 MB | All key material |
Total Storage (Typical User):
| Usage Pattern | Storage | Notes |
|---|---|---|
| Light (text only) | 50-100 MB | Few contacts |
| Moderate | 200-500 MB | Regular usage |
| Heavy | 1-5 GB | Many groups, media |
| Power user | 5-20 GB | Full history, HD media |
Network Scalability
Nodes vs Throughput
| Network Size | Total Throughput | Per-Node Load | DHT Latency |
|---|---|---|---|
| 100 nodes | 50K msg/s | 500/s | 200ms |
| 1,000 nodes | 500K msg/s | 500/s | 300ms |
| 10,000 nodes | 5M msg/s | 500/s | 400ms |
| 100,000 nodes | 50M msg/s | 500/s | 500ms |
| 1,000,000 nodes | 500M msg/s | 500/s | 600ms |
Key property: Throughput scales linearly with network size.
Geographic Distribution Impact
| Configuration | Latency Impact | Availability |
|---|---|---|
| Single region | Baseline | 99% |
| Multi-region (3) | +20-50ms | 99.9% |
| Global (6+ regions) | +50-150ms | 99.99% |
Regional Relay Selection:
| User Location | Preferred Guard | Latency Benefit |
|---|---|---|
| EU | EU guard | -100ms vs US |
| US-East | US-East guard | -50ms vs US-West |
| Asia | Asia guard | -200ms vs EU |
Bottleneck Analysis
| Component | Bottleneck Type | Limit | Mitigation |
|---|---|---|---|
| DHT lookup | Latency | O(log n) | Caching, shortcuts |
| Relay bandwidth | Throughput | 10 Gbps typical | More relays |
| Storage nodes | IOPS | 10K writes/s | SSD, caching |
| Bootstrap | Connection | 10K concurrent | Multiple bootstraps |
| Key server | Queries | 100K/s | DHT distribution |
Horizontal Scaling Strategy
Scaling approach for each component:
DHT Layer:
- Add nodes → automatic load distribution
- No coordination needed
- Linear scaling
Relay Layer:
- Add relays → more circuit capacity
- Geographic distribution for latency
- Linear scaling
Storage Layer:
- Add storage nodes → more capacity
- Replication handles hot data
- Linear scaling
Result: All layers scale horizontally without bottlenecks.

Optimization Strategies
Caching Strategies
| Cache Type | Hit Rate | Latency Savings | Memory |
|---|---|---|---|
| DHT routing table | 60% | 200-500ms | 256 KB |
| Key bundle cache | 90% | 200-500ms | 10 MB |
| Circuit cache | 95% | 500-2000ms | 1 MB |
| Message dedup cache | 99.9% | N/A | 10 MB |
| Session cache | 99% | 1-5ms | 5 MB |
Cache Invalidation:
| Cache | TTL | Invalidation Trigger |
|---|---|---|
| Key bundle | 1 hour | Safety number change |
| DHT entry | 24 hours | Republish |
| Circuit | 10 minutes | Error or rotation |
| Message | 7 days | LRU eviction |
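The invalidation table combines two mechanisms per cache: a TTL and an explicit trigger (e.g. a key bundle evicted on a safety number change). A minimal sketch of that policy, with illustrative names:

```python
import time

class TTLCache:
    """Cache whose entries expire after `ttl` seconds or when
    explicitly invalidated by a trigger event."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]  # lazy expiry on read
            return None
        return value

    def invalidate(self, key):
        # Trigger-based eviction, e.g. on a safety number change.
        self._store.pop(key, None)

bundles = TTLCache(ttl=3600)  # key bundles: 1 hour, per the table
bundles.put("contact-a", "key-bundle-bytes")
assert bundles.get("contact-a") == "key-bundle-bytes"
bundles.invalidate("contact-a")
assert bundles.get("contact-a") is None
```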
Batch Processing
| Operation | Individual | Batched | Improvement |
|---|---|---|---|
| DHT store (10 items) | 1.5s | 0.3s | 5x |
| Message fetch (100) | 10s | 1s | 10x |
| Signature verify (100) | 7ms | 4ms | 1.75x |
| Key bundle fetch (10) | 5s | 0.8s | 6x |
Batch Sizes:
| Operation | Optimal Batch | Max Batch |
|---|---|---|
| Message send | 10 | 50 |
| DHT query | 5 | 20 |
| Signature verify | 100 | 1000 |
| Scanning | 10,000 | 100,000 |
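Most of the batching wins above come from amortizing network round trips rather than from faster crypto: fetching 100 messages in batches of 10 costs 10 round trips instead of 100. A sketch, with an illustrative per-request latency:

```python
def batches(items, batch_size: int):
    """Group work so one network round trip covers many items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

ROUND_TRIP_MS = 100  # illustrative per-request latency
messages = list(range(100))

individual_ms = len(messages) * ROUND_TRIP_MS                  # 100 round trips
batched_ms = len(list(batches(messages, 10))) * ROUND_TRIP_MS  # 10 round trips
print(individual_ms, batched_ms)  # 10000 1000 -> the 10x in the table
```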
Lazy Loading
| Resource | Load Trigger | Memory Savings |
|---|---|---|
| Message history | Scroll to date | 80% |
| Media thumbnails | Viewport entry | 60% |
| Contact avatars | First view | 40% |
| Session state | First message | 70% |
| Group metadata | Group open | 50% |
Connection Pooling
| Pool Type | Size | Reuse Rate | Latency Savings |
|---|---|---|---|
| WebSocket | 3 | 99% | 100-300ms |
| Circuit | 8 | 95% | 500-2000ms |
| DHT peer | 50 | 80% | 50-200ms |
| Storage node | 10 | 90% | 100-300ms |
Pool Management:
Connection Pool Strategy:
- Minimum connections: 3 (always ready)
- Maximum connections: 20 (prevent resource exhaustion)
- Idle timeout: 60s (balance freshness vs overhead)
- Health check: 10s (detect failures)
- Warm-up: On app start, build minimum pool

Benchmarking Methodology
Test Environment
Reference Hardware:
| Component | Specification |
|---|---|
| CPU (ARM) | Apple M2, 8 cores |
| CPU (x86) | Intel i7-12700, 12 cores |
| Memory | 16 GB |
| Storage | NVMe SSD |
| Network | 1 Gbps symmetric |
Mobile Reference:
| Device | Specification |
|---|---|
| iOS | iPhone 14, A15 Bionic |
| Android | Pixel 7, Tensor G2 |
Measurement Tools
| Tool | Purpose | Metrics |
|---|---|---|
| perf / Instruments | CPU profiling | Cycles, cache misses |
| Heaptrack | Memory profiling | Allocations, leaks |
| Wireshark | Network analysis | Packets, latency |
| Custom harness | End-to-end timing | User-perceived latency |
| Prometheus | Production metrics | Real-world performance |
Benchmark Categories
| Category | What It Measures | How Often |
|---|---|---|
| Microbenchmarks | Individual operations | Every commit |
| Integration benchmarks | Component interactions | Daily |
| System benchmarks | Full message flow | Weekly |
| Load tests | Scalability limits | Pre-release |
| Chaos tests | Failure resilience | Monthly |
Reproducibility Requirements
| Requirement | Implementation |
|---|---|
| Isolated environment | Docker containers |
| Controlled network | tc (traffic control) |
| Fixed random seeds | Deterministic tests |
| Warm-up period | Discard first 1000 ops |
| Statistical significance | 10,000+ iterations |
| Multiple runs | 5 runs, report median |
Benchmark Caveats
| Caveat | Impact | Mitigation |
|---|---|---|
| Network variability | +/- 50% latency | Controlled test network |
| CPU frequency scaling | +/- 20% throughput | Fixed frequency |
| GC pauses | +/- 30% p99 | Report percentiles |
| JIT warm-up | First run slower | Warm-up iterations |
| Real-world load | Different patterns | Production monitoring |
Performance Monitoring
Key Metrics to Track
| Metric | Target | Alert Threshold |
|---|---|---|
| Message latency (p50) | <500ms | >1s |
| Message latency (p99) | <3s | >5s |
| Circuit build time | <2s | >5s |
| DHT lookup time | <1s | >3s |
| Encryption throughput | >1000/s | <500/s |
| Memory usage (mobile) | <100MB | >200MB |
| Battery drain (idle) | <5mAh/h | >20mAh/h |
Performance Regression Detection
Regression Detection Pipeline:
1. Run benchmark suite on every PR
2. Compare against baseline (main branch)
3. Flag if:
- p50 regresses by >10%
- p99 regresses by >20%
- Memory increases by >15%
- Any metric exceeds absolute threshold
4. Require explicit approval for performance regressions

Production Monitoring
| Data Point | Collection Method | Retention |
|---|---|---|
| Client-side latency | In-app telemetry | 30 days |
| Server-side metrics | Prometheus | 90 days |
| Error rates | Sentry | 90 days |
| Network topology | DHT crawl | 7 days |
| Bandwidth usage | Flow logs | 30 days |
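The thresholds in the regression-detection pipeline above can be expressed as a short comparison step; metric names here are illustrative, and the absolute-threshold checks are omitted for brevity:

```python
# Relative regression limits from the pipeline: >10% on p50, >20% on p99,
# >15% on memory.
THRESHOLDS = {"latency_p50_ms": 0.10, "latency_p99_ms": 0.20, "memory_mb": 0.15}

def regressions(baseline: dict, current: dict) -> list[str]:
    """Return the metrics whose relative increase exceeds their limit."""
    flagged = []
    for metric, allowed in THRESHOLDS.items():
        change = (current[metric] - baseline[metric]) / baseline[metric]
        if change > allowed:
            flagged.append(f"{metric}: +{change:.0%} exceeds +{allowed:.0%}")
    return flagged

baseline = {"latency_p50_ms": 500, "latency_p99_ms": 2000, "memory_mb": 100}
current  = {"latency_p50_ms": 600, "latency_p99_ms": 2100, "memory_mb": 105}
print(regressions(baseline, current))  # only p50 (+20%) is flagged
```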
Related Documentation
- Protocol Specification - Cryptographic protocol details
- Cryptography Fundamentals - Algorithm specifications
- DHT and Kademlia - Distributed lookup system
- Message Delivery - Delivery pipeline
- Scanning Protocol - Stealth address scanning
- Group Chat Protocol - Sender Keys efficiency