Performance
Comprehensive performance analysis, benchmarks, and optimization strategies for the Zentalk protocol.
Performance Philosophy
Zentalk prioritizes security and privacy over raw performance, but extensive optimizations ensure the system remains highly responsive for end users.
Design Goals
| Priority | Goal | Tradeoff |
|---|---|---|
| 1 | Security | Never compromise cryptographic operations for speed |
| 2 | Privacy | Metadata protection worth additional latency |
| 3 | Reliability | Redundancy over minimal resource usage |
| 4 | Responsiveness | User-perceived latency optimization |
| 5 | Efficiency | Resource optimization where possible |
Performance vs Privacy Tradeoffs
| Feature | Performance Impact | Privacy Benefit |
|---|---|---|
| 3-hop relay routing | +150-400ms latency | Hides sender/recipient |
| Traffic padding | +20% bandwidth | Prevents traffic analysis |
| Stealth addresses | Scanning overhead | Unlinkable payments |
| Double Ratchet | Per-message crypto ops | Forward secrecy |
| DHT lookup | O(log n) hops | Decentralized, censorship resistant |
Latency Characteristics
End-to-End Message Latency
Measured latency from send button press to recipient notification:
| Percentile | Cold Start | Warm (Circuit Ready) | Notes |
|---|---|---|---|
| p50 | 1.2s | 350ms | Typical conditions |
| p75 | 1.8s | 520ms | Moderate network |
| p90 | 2.5s | 850ms | Congested network |
| p95 | 3.2s | 1.1s | Poor conditions |
| p99 | 5.5s | 2.2s | Worst case |
Cold start includes circuit building. Warm assumes established circuit.
Latency Breakdown by Component
| Component | Duration (p50) | Duration (p99) | Notes |
|---|---|---|---|
| Client-Side Encryption | | | |
| Double Ratchet key derivation | 0.5ms | 2ms | HKDF operations |
| AES-256-GCM encryption | 0.3ms | 1ms | Per message |
| Ed25519 signature | 0.1ms | 0.3ms | Message authentication |
| Network Transport | | | |
| Circuit build (if needed) | 800ms | 2.5s | 3 TLS handshakes |
| 3-hop relay routing | 150ms | 500ms | Sequential relay |
| DHT lookup | 200ms | 800ms | O(log n) hops |
| Mesh storage | 50ms | 200ms | Replication factor 3 |
| Recipient Side | | | |
| Message retrieval | 100ms | 400ms | From nearest replica |
| Decryption + verification | 1ms | 5ms | Reverse of send |
| Total (warm) | 350ms | 2.2s | Typical path |
Totals are measured end-to-end, not summed from components: stages can overlap, cached lookups skip some steps on the warm path, and the p50 of a sum is not the sum of component p50s.
Factors Affecting Latency
| Factor | Impact | Mitigation |
|---|---|---|
| Geographic distance | +50-200ms per hop | Regional relay selection |
| Network congestion | +100-500ms | Adaptive timeout, parallel queries |
| DHT network size | O(log n) lookup | Caching, shortcut routing |
| Circuit health | Variable | Proactive circuit rotation |
| Time of day | +20-50% peak hours | Load balancing |
| Mobile network | +100-300ms | Optimized packet sizes |
Latency Optimization Techniques
Circuit Pooling:
Maintain pool of ready circuits:
- 3 guard circuits (persistent)
- 5 general circuits (rotating)
- 2 backup circuits (warm standby)
Result: Eliminate circuit build latency for 95%+ of messages.

Predictive Prefetching:
On app foreground:
1. Refresh DHT routing table
2. Pre-build circuits to frequent contacts
3. Prefetch unread message pointers
Result: First message send is always "warm".

Throughput Metrics
Messages Per Second Per Node
| Node Type | Messages/Second | Limiting Factor |
|---|---|---|
| Relay node (guard) | 5,000 | TLS termination |
| Relay node (middle) | 8,000 | Packet forwarding |
| Relay node (exit) | 4,000 | DHT operations |
| Storage node | 2,000 writes | Disk I/O |
| Storage node | 10,000 reads | Memory cache |
| Client (mobile) | 50 | Battery/CPU limits |
| Client (desktop) | 200 | Crypto operations |
Maximum Concurrent Connections
| Component | Connections | Memory Per Connection |
|---|---|---|
| Client WebSocket | 1-3 | ~50KB |
| Relay node | 10,000 | ~2KB |
| Storage node | 5,000 | ~5KB |
| DHT node | 500 peers | ~1KB |
Bandwidth Consumption
| Activity | Bandwidth | Notes |
|---|---|---|
| Idle (connected) | 1-2 KB/s | Keepalive + padding |
| Active chat | 5-20 KB/s | Depends on message rate |
| Media send (photo) | 100-500 KB/s | Chunked upload |
| Voice call | 30-50 KB/s | Opus codec |
| Video call (720p) | 500-1500 KB/s | VP9 codec |
| Background sync | 0.5-1 KB/s | Heartbeat only |
Traffic Padding Overhead:
| Mode | Overhead | Privacy Level |
|---|---|---|
| Minimal | +5% | Basic |
| Standard | +20% | Good |
| Maximum | +50% | High |
| Constant-rate | +100-300% | Maximum |
Cryptographic Performance
All benchmarks measured on reference hardware (Apple M2, Intel i7-12700).
X25519 Key Exchange
| Operation | M2 (ARM64) | i7-12700 (x86) | WebCrypto |
|---|---|---|---|
| Key generation | 25us | 32us | 45us |
| DH computation | 28us | 35us | 50us |
| Throughput | 35,000/s | 28,000/s | 20,000/s |
X3DH Full Exchange (4 DH operations):
| Operation | Duration | Notes |
|---|---|---|
| Fetch key bundle | 200-500ms | Network bound |
| 4x DH computation | 0.1ms | CPU bound |
| HKDF derivation | 0.02ms | Fast |
| Total (network) | 200-500ms | Dominated by fetch |
| Total (local only) | 0.15ms | If keys cached |
AES-256-GCM Encryption/Decryption
| Message Size | Encrypt | Decrypt | Throughput |
|---|---|---|---|
| 256 bytes | 0.02ms | 0.02ms | 12 MB/s |
| 1 KB | 0.03ms | 0.03ms | 30 MB/s |
| 4 KB | 0.05ms | 0.05ms | 75 MB/s |
| 64 KB | 0.3ms | 0.3ms | 200 MB/s |
| 1 MB | 4ms | 4ms | 250 MB/s |
Hardware AES acceleration enabled (AES-NI on x86, ARMv8 Crypto Extensions on Apple M2). WebCrypto is typically 2-3x slower.
Ed25519 Signing and Verification
| Operation | Duration | Throughput |
|---|---|---|
| Key generation | 30us | 33,000/s |
| Sign (64 bytes) | 35us | 28,000/s |
| Verify (64 bytes) | 70us | 14,000/s |
| Batch verify (100) | 4ms | 25,000/s |
Signature size: 64 bytes (constant)
Kyber-768 (Post-Quantum) Operations
| Operation | Duration | Notes |
|---|---|---|
| Key generation | 50us | Generate keypair |
| Encapsulation | 60us | Sender operation |
| Decapsulation | 55us | Recipient operation |
| Hybrid X3DH total | 0.3ms | X25519 + Kyber combined |
Key/Ciphertext Sizes:
| Component | Size |
|---|---|
| Kyber public key | 1,184 bytes |
| Kyber private key | 2,400 bytes |
| Kyber ciphertext | 1,088 bytes |
| Shared secret | 32 bytes |
| Overhead vs X25519-only | +2,272 bytes per session init |
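The "+2,272 bytes" figure in the last row is just the extra Kyber material a hybrid session init carries; a quick arithmetic check (which side ships which component depends on the handshake direction):

```python
# Sizes from the table above, in bytes.
KYBER768_PUBLIC_KEY = 1184
KYBER768_CIPHERTEXT = 1088

# A hybrid X3DH init carries the Kyber public key one way and the
# encapsulated ciphertext back, on top of the X25519 material.
hybrid_overhead = KYBER768_PUBLIC_KEY + KYBER768_CIPHERTEXT
print(hybrid_overhead)  # 2272
```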
Double Ratchet Operations
| Operation | Duration | Frequency |
|---|---|---|
| Symmetric ratchet (per message) | 0.05ms | Every message |
| DH ratchet (key rotation) | 0.1ms | Every reply |
| Full ratchet state serialization | 0.5ms | On persist |
| State deserialization | 0.3ms | On load |
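The per-message symmetric ratchet step is just two HMAC invocations (the KDF_CK construction recommended by the Signal Double Ratchet specification), which is why it costs well under a millisecond. A minimal sketch, assuming HMAC-SHA256 and 32-byte chain keys:

```python
import hashlib
import hmac

def symmetric_ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance a sending/receiving chain by one message.

    Returns (next_chain_key, message_key). The 0x02/0x01 inputs follow
    the KDF_CK construction recommended by the Double Ratchet spec.
    """
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    return next_chain_key, message_key

ck = bytes(32)  # placeholder; in practice derived from the root key via the DH ratchet
ck, mk = symmetric_ratchet_step(ck)
assert len(ck) == 32 and len(mk) == 32 and ck != mk
```

Because the old chain key is discarded after each step, a compromise of the current state cannot decrypt earlier messages, which is the forward-secrecy property the tradeoff table pays for.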
Memory per Session:
| Component | Size |
|---|---|
| Ratchet state | ~500 bytes |
| Skipped message keys (max) | ~40 KB |
| Total per active session | ~1-50 KB |
Cryptographic Operation Summary
| Operation | Time | Operations/Second |
|---|---|---|
| Send message (warm session) | 0.5ms | 2,000 |
| Receive message | 0.6ms | 1,600 |
| New session (X3DH) | 0.15ms | 6,500 |
| New session (X3DH + Kyber) | 0.3ms | 3,300 |
| Group message (Sender Keys) | 0.3ms | 3,300 |
Scanning Performance
Stealth Address Scanning Rate
Scanning requires one ECDH computation per announcement to check ownership.
| Metric | Value | Notes |
|---|---|---|
| ECDH per announcement | 0.3ms | Core operation |
| Raw scan rate | 3,300/s | Single-threaded |
| With view tag optimization | 99% skip rate | First byte check |
| Effective scan rate | 330,000/s | After view tag filter |
View Tag Optimization Impact
| Announcements/Day | Without View Tag | With View Tag | Speedup |
|---|---|---|---|
| 10,000 | 3 seconds | 0.03 seconds | 100x |
| 100,000 | 30 seconds | 0.3 seconds | 100x |
| 1,000,000 | 5 minutes | 3 seconds | 100x |
| 10,000,000 | 50 minutes | 30 seconds | 100x |
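The 100x speedup comes from a one-byte precheck rejecting ~99% of announcements before the expensive per-announcement crypto runs. The sketch below shows only the filtering structure: the hash-based tag and ownership check are stand-ins, since the real derivation is protocol-specific:

```python
import hashlib

expensive_checks = 0  # counts how often the ~0.3 ms path actually runs

def view_tag(announcement: bytes) -> int:
    # One-byte tag published alongside each announcement (hash stand-in).
    return hashlib.sha256(b"tag" + announcement).digest()[0]

def full_ownership_check(announcement: bytes, view_key: bytes) -> bool:
    # Stand-in for the real ECDH + key derivation (~0.3 ms each).
    global expensive_checks
    expensive_checks += 1
    return hashlib.sha256(view_key + announcement).digest()[-1] == 0

def scan(announcements, view_key: bytes, my_tag: int) -> list:
    hits = []
    for ann in announcements:
        if view_tag(ann) != my_tag:   # cheap filter: skips ~255/256
            continue
        if full_ownership_check(ann, view_key):
            hits.append(ann)
    return hits

anns = [i.to_bytes(4, "big") for i in range(10_000)]
scan(anns, b"view-key", my_tag=0)
# With a uniform one-byte tag, only ~1/256 of the 10,000 announcements
# (about 40) ever reach the expensive check.
```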
Bloom Filter Efficiency
| Parameter | Value |
|---|---|
| Expected elements | 1,000,000 |
| False positive rate | 1% |
| Bits per element | 9.6 |
| Total filter size | 1.2 MB |
| Hash functions | 7 |
| Lookup time | 0.001ms |
Multi-stage Cascade:
| Stage | False Positive Rate | Cumulative |
|---|---|---|
| Stage 1 (coarse) | 10% | 10% |
| Stage 2 (fine) | 1% | 0.1% |
| Stage 3 (exact) | 0% | 0% |
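These parameters follow from the standard Bloom filter sizing formulas, m/n = -ln(p) / (ln 2)^2 and k = (m/n) * ln 2, and the cascade's cumulative rate is just the product of the stage rates. A quick check of the table's numbers:

```python
import math

def bloom_params(n: int, p: float):
    """Return (bits_per_element, hash_functions, total_size_mb) for
    n expected elements at false-positive rate p."""
    bits_per_elem = -math.log(p) / math.log(2) ** 2
    hash_functions = round(bits_per_elem * math.log(2))
    total_mb = n * bits_per_elem / 8 / 1e6
    return bits_per_elem, hash_functions, total_mb

bpe, k, mb = bloom_params(1_000_000, 0.01)
print(round(bpe, 1), k, round(mb, 1))  # 9.6 7 1.2

# Cascade: stage false-positive rates multiply.
cumulative = 0.10 * 0.01  # ≈ 0.1%, matching the cascade table
```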
Batch Scanning Benchmarks
| Batch Size | Duration | Announcements/Second |
|---|---|---|
| 1,000 | 0.003s | 333,000 |
| 10,000 | 0.03s | 333,000 |
| 100,000 | 0.3s | 333,000 |
| 1,000,000 | 3s | 333,000 |
With view tag optimization enabled, 4 worker threads.
Parallel Scanning Architecture
| Workers | Throughput | CPU Usage |
|---|---|---|
| 1 | 100,000/s | 25% |
| 2 | 190,000/s | 50% |
| 4 | 350,000/s | 90% |
| 8 | 400,000/s | 95% |
Diminishing returns beyond 4 workers due to memory bandwidth.
DHT Operations
Lookup Latency
| Network Size | Hops (O(log n)) | Latency (p50) | Latency (p99) |
|---|---|---|---|
| 1,000 nodes | 10 | 300ms | 800ms |
| 10,000 nodes | 14 | 400ms | 1.1s |
| 100,000 nodes | 17 | 500ms | 1.4s |
| 1,000,000 nodes | 20 | 600ms | 1.8s |
With alpha=3 parallel queries.
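The hop counts in the table follow from Kademlia halving the remaining keyspace distance on each hop, giving roughly ceil(log2 n) hops in the worst case:

```python
import math

def expected_hops(network_size: int) -> int:
    # Each Kademlia hop fixes at least one more prefix bit of the
    # target ID, so a lookup converges in about log2(n) hops.
    return math.ceil(math.log2(network_size))

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} nodes -> {expected_hops(n)} hops")
# 10, 14, 17, and 20 hops — matching the table above
```

In practice caching and the alpha=3 parallel queries shave hops off this bound, which is why the latency columns grow slower than the raw hop count suggests.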
Storage Operations
| Operation | Latency (p50) | Latency (p99) | Notes |
|---|---|---|---|
| STORE | 150ms | 500ms | Write to k=3 nodes |
| FIND_VALUE (hit) | 100ms | 400ms | First node with data |
| FIND_VALUE (miss) | 300ms | 1s | Full lookup |
| FIND_NODE | 200ms | 600ms | Routing lookup |
Replication Overhead
| Replication Factor | Write Amplification | Storage Overhead | Availability |
|---|---|---|---|
| k=1 | 1x | 1x | 90% |
| k=3 | 3x | 3x | 99.9% |
| k=5 | 5x | 5x | 99.99% |
| k=7 | 7x | 7x | 99.999% |
Zentalk default: k=3 (optimal tradeoff)
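Under a simple independent-failure model with ~90% per-node uptime, data is unavailable only when all k replicas are down at once. This reproduces the k=1 and k=3 rows exactly; the table's figures for larger k are more conservative, presumably to account for correlated failures:

```python
def availability(node_uptime: float, k: int) -> float:
    # Data is unavailable only if all k replicas are down at once,
    # assuming independent node failures.
    return 1 - (1 - node_uptime) ** k

for k in (1, 3, 5, 7):
    print(f"k={k}: {availability(0.90, k):.5%}")
```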
DHT Memory Usage
| Component | Per Node | Notes |
|---|---|---|
| Routing table | 256 KB | 256 buckets x 20 contacts x 50 bytes |
| Stored data | Variable | Depends on node role |
| Message cache | 10 MB | Recent announcements |
| Connection state | 1 KB/peer | Active connections |
Group Chat Scaling
Performance vs Group Size
| Group Size | Key Distribution | Message Encrypt | Message Decrypt |
|---|---|---|---|
| 10 | 9 E2EE sends | 0.3ms | 0.3ms |
| 50 | 49 E2EE sends | 0.3ms | 0.3ms |
| 100 | 99 E2EE sends | 0.3ms | 0.3ms |
| 500 | 499 E2EE sends | 0.3ms | 0.3ms |
| 1000 | 999 E2EE sends | 0.3ms | 0.3ms |
Key insight: Message encryption is O(1) regardless of group size due to Sender Keys.
Sender Keys Efficiency
| Metric | Pairwise | Sender Keys | Improvement |
|---|---|---|---|
| Encrypt ops per message (100 members) | 100 | 1 | 100x |
| Key material per member | O(n) | O(n) | Same |
| Message size | O(n) | O(1) | n× |
| Bandwidth per message | O(n) | O(1) | n× |
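The O(n) → O(1) shift above is the whole point of Sender Keys: the n-1 pairwise sends happen once at key distribution, after which each message is a single symmetric encryption broadcast to the group. A toy sketch (the XOR-with-hash cipher is a stand-in for AES-256-GCM and only secures one short block):

```python
import hashlib

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Stand-in for AES-256-GCM; only valid for plaintexts <= 32 bytes.
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(plaintext, pad))

class SenderKeyGroup:
    def __init__(self, members: list[str]):
        self.members = members
        self.sender_key = hashlib.sha256(b"sender-chain").digest()
        # One-time cost: the sender key travels over n-1 pairwise E2EE sessions.
        self.distribution_sends = len(members) - 1

    def send(self, plaintext: bytes) -> bytes:
        # Per-message cost: ONE encryption, regardless of group size.
        return toy_encrypt(self.sender_key, plaintext)

group = SenderKeyGroup([f"member-{i}" for i in range(100)])
ct = group.send(b"hello group")
assert group.distribution_sends == 99                       # paid once, at setup
assert toy_encrypt(group.sender_key, ct) == b"hello group"  # XOR is symmetric
```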
Key Rotation Overhead
| Event | Operations | Latency | Notes |
|---|---|---|---|
| Member join | n key sends | 2-5s | Background |
| Member leave | n key regenerations | 3-10s | All rotate |
| Periodic rotation | n key sends | 2-5s | Every 1000 msgs |
| Device compromise | 1 key regeneration | 1s | Affected only |
Group Chat Memory Footprint
| Group Size | Sender Keys Storage | Message Cache | Total |
|---|---|---|---|
| 10 | 3.2 KB | 100 KB | ~103 KB |
| 50 | 16 KB | 100 KB | ~116 KB |
| 100 | 32 KB | 100 KB | ~132 KB |
| 500 | 160 KB | 100 KB | ~260 KB |
| 1000 | 320 KB | 100 KB | ~420 KB |
Client Resource Usage
Memory Footprint
| Platform | Idle | Active Chat | Peak |
|---|---|---|---|
| iOS | 45 MB | 80 MB | 150 MB |
| Android | 50 MB | 90 MB | 180 MB |
| Desktop (Electron) | 120 MB | 200 MB | 400 MB |
| Web (Chrome) | 60 MB | 100 MB | 200 MB |
Memory Breakdown (Active):
| Component | Mobile | Desktop |
|---|---|---|
| WebAssembly runtime | 15 MB | 20 MB |
| Crypto state | 5 MB | 10 MB |
| Session cache | 10 MB | 30 MB |
| Message buffer | 20 MB | 50 MB |
| UI framework | 30 MB | 90 MB |
CPU Usage Patterns
| Activity | Mobile CPU | Desktop CPU | Duration |
|---|---|---|---|
| Idle | <1% | <1% | Continuous |
| Receiving message | 5-10% | 2-5% | 100ms |
| Sending message | 10-15% | 5-8% | 200ms |
| Voice call | 15-25% | 8-12% | Continuous |
| Video call | 30-50% | 15-25% | Continuous |
| Background sync | 2-5% | 1-3% | Periodic (1s) |
| Scanning (batch) | 50-80% | 30-50% | Depends on backlog |
Battery Impact (Mobile)
| Activity | mAh/hour | Relative to Idle |
|---|---|---|
| Idle (screen off) | 5-10 | 1x |
| Connected (foreground) | 50-80 | 8x |
| Active messaging | 100-150 | 15x |
| Voice call | 200-300 | 30x |
| Video call | 400-600 | 50x |
Optimization Strategies:
| Strategy | Battery Savings | Tradeoff |
|---|---|---|
| Push notifications | 80% idle reduction | Slight delay |
| Batch message fetch | 40% active reduction | Grouping delay |
| Adaptive polling | 30% reduction | Variable latency |
| Codec selection | 20% call reduction | Quality tradeoff |
Storage Requirements
| Data Type | Per Item | Typical Total | Notes |
|---|---|---|---|
| Message (text) | 0.5-2 KB | 50-200 MB | 100K messages |
| Message (with media ref) | 1-3 KB | 100-300 MB | Media stored separately |
| Media (photo) | 50-500 KB | 1-10 GB | Original quality |
| Media (thumbnail) | 5-20 KB | 50-200 MB | Preview cache |
| Session state | 1-50 KB | 1-10 MB | Per contact |
| DHT cache | N/A | 10-50 MB | Routing tables |
| Crypto keys | N/A | 1-5 MB | All key material |
Total Storage (Typical User):
| Usage Pattern | Storage | Notes |
|---|---|---|
| Light (text only) | 50-100 MB | Few contacts |
| Moderate | 200-500 MB | Regular usage |
| Heavy | 1-5 GB | Many groups, media |
| Power user | 5-20 GB | Full history, HD media |
Network Scalability
Nodes vs Throughput
| Network Size | Total Throughput | Per-Node Load | DHT Latency |
|---|---|---|---|
| 100 nodes | 50K msg/s | 500/s | 200ms |
| 1,000 nodes | 500K msg/s | 500/s | 300ms |
| 10,000 nodes | 5M msg/s | 500/s | 400ms |
| 100,000 nodes | 50M msg/s | 500/s | 500ms |
| 1,000,000 nodes | 500M msg/s | 500/s | 600ms |
Key property: Throughput scales linearly with network size.
Geographic Distribution Impact
| Configuration | Latency Impact | Availability |
|---|---|---|
| Single region | Baseline | 99% |
| Multi-region (3) | +20-50ms | 99.9% |
| Global (6+ regions) | +50-150ms | 99.99% |
Regional Relay Selection:
| User Location | Preferred Guard | Latency Benefit |
|---|---|---|
| EU | EU guard | -100ms vs US |
| US-East | US-East guard | -50ms vs US-West |
| Asia | Asia guard | -200ms vs EU |
Bottleneck Analysis
| Component | Bottleneck Type | Limit | Mitigation |
|---|---|---|---|
| DHT lookup | Latency | O(log n) | Caching, shortcuts |
| Relay bandwidth | Throughput | 10 Gbps typical | More relays |
| Storage nodes | IOPS | 10K writes/s | SSD, caching |
| Bootstrap | Connection | 10K concurrent | Multiple bootstraps |
| Key server | Queries | 100K/s | DHT distribution |
Horizontal Scaling Strategy
Scaling approach for each component:
DHT Layer:
- Add nodes → automatic load distribution
- No coordination needed
- Linear scaling
Relay Layer:
- Add relays → more circuit capacity
- Geographic distribution for latency
- Linear scaling
Storage Layer:
- Add storage nodes → more capacity
- Replication handles hot data
- Linear scaling
Result: All layers scale horizontally without bottlenecks.

Optimization Strategies
Caching Strategies
| Cache Type | Hit Rate | Latency Savings | Memory |
|---|---|---|---|
| DHT routing table | 60% | 200-500ms | 256 KB |
| Key bundle cache | 90% | 200-500ms | 10 MB |
| Circuit cache | 95% | 500-2000ms | 1 MB |
| Message dedup cache | 99.9% | N/A | 10 MB |
| Session cache | 99% | 1-5ms | 5 MB |
Cache Invalidation:
| Cache | TTL | Invalidation Trigger |
|---|---|---|
| Key bundle | 1 hour | Safety number change |
| DHT entry | 24 hours | Republish |
| Circuit | 10 minutes | Error or rotation |
| Message | 7 days | LRU eviction |
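The invalidation table combines two mechanisms per cache: a TTL and an explicit trigger (e.g. a key bundle evicted on a safety number change). A minimal sketch of that policy, with illustrative names:

```python
import time

class TTLCache:
    """Cache whose entries expire after `ttl` seconds or when
    explicitly invalidated by a trigger event."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]  # lazy expiry on read
            return None
        return value

    def invalidate(self, key):
        # Trigger-based eviction, e.g. on a safety number change.
        self._store.pop(key, None)

bundles = TTLCache(ttl=3600)  # key bundles: 1 hour, per the table
bundles.put("contact-a", "key-bundle-bytes")
assert bundles.get("contact-a") == "key-bundle-bytes"
bundles.invalidate("contact-a")
assert bundles.get("contact-a") is None
```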
Batch Processing
| Operation | Individual | Batched | Improvement |
|---|---|---|---|
| DHT store (10 items) | 1.5s | 0.3s | 5x |
| Message fetch (100) | 10s | 1s | 10x |
| Signature verify (100) | 7ms | 4ms | 1.75x |
| Key bundle fetch (10) | 5s | 0.8s | 6x |
Batch Sizes:
| Operation | Optimal Batch | Max Batch |
|---|---|---|
| Message send | 10 | 50 |
| DHT query | 5 | 20 |
| Signature verify | 100 | 1000 |
| Scanning | 10,000 | 100,000 |
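Most of the batching wins above come from amortizing network round trips rather than from faster crypto: fetching 100 messages in batches of 10 costs 10 round trips instead of 100. A sketch, with an illustrative per-request latency:

```python
def batches(items, batch_size: int):
    """Group work so one network round trip covers many items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

ROUND_TRIP_MS = 100  # illustrative per-request latency
messages = list(range(100))

individual_ms = len(messages) * ROUND_TRIP_MS                  # 100 round trips
batched_ms = len(list(batches(messages, 10))) * ROUND_TRIP_MS  # 10 round trips
print(individual_ms, batched_ms)  # 10000 1000 -> the 10x in the table
```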
Lazy Loading
| Resource | Load Trigger | Memory Savings |
|---|---|---|
| Message history | Scroll to date | 80% |
| Media thumbnails | Viewport entry | 60% |
| Contact avatars | First view | 40% |
| Session state | First message | 70% |
| Group metadata | Group open | 50% |
Connection Pooling
| Pool Type | Size | Reuse Rate | Latency Savings |
|---|---|---|---|
| WebSocket | 3 | 99% | 100-300ms |
| Circuit | 8 | 95% | 500-2000ms |
| DHT peer | 50 | 80% | 50-200ms |
| Storage node | 10 | 90% | 100-300ms |
Pool Management:
Connection Pool Strategy:
- Minimum connections: 3 (always ready)
- Maximum connections: 20 (prevent resource exhaustion)
- Idle timeout: 60s (balance freshness vs overhead)
- Health check: 10s (detect failures)
- Warm-up: On app start, build minimum pool

Benchmarking Methodology
Test Environment
Reference Hardware:
| Component | Specification |
|---|---|
| CPU (ARM) | Apple M2, 8 cores |
| CPU (x86) | Intel i7-12700, 12 cores |
| Memory | 16 GB |
| Storage | NVMe SSD |
| Network | 1 Gbps symmetric |
Mobile Reference:
| Device | Specification |
|---|---|
| iOS | iPhone 14, A15 Bionic |
| Android | Pixel 7, Tensor G2 |
Measurement Tools
| Tool | Purpose | Metrics |
|---|---|---|
| perf / Instruments | CPU profiling | Cycles, cache misses |
| Heaptrack | Memory profiling | Allocations, leaks |
| Wireshark | Network analysis | Packets, latency |
| Custom harness | End-to-end timing | User-perceived latency |
| Prometheus | Production metrics | Real-world performance |
Benchmark Categories
| Category | What It Measures | How Often |
|---|---|---|
| Microbenchmarks | Individual operations | Every commit |
| Integration benchmarks | Component interactions | Daily |
| System benchmarks | Full message flow | Weekly |
| Load tests | Scalability limits | Pre-release |
| Chaos tests | Failure resilience | Monthly |
Reproducibility Requirements
| Requirement | Implementation |
|---|---|
| Isolated environment | Docker containers |
| Controlled network | tc (traffic control) |
| Fixed random seeds | Deterministic tests |
| Warm-up period | Discard first 1000 ops |
| Statistical significance | 10,000+ iterations |
| Multiple runs | 5 runs, report median |
Benchmark Caveats
| Caveat | Impact | Mitigation |
|---|---|---|
| Network variability | +/- 50% latency | Controlled test network |
| CPU frequency scaling | +/- 20% throughput | Fixed frequency |
| GC pauses | +/- 30% p99 | Report percentiles |
| JIT warm-up | First run slower | Warm-up iterations |
| Real-world load | Different patterns | Production monitoring |
Performance Monitoring
Key Metrics to Track
| Metric | Target | Alert Threshold |
|---|---|---|
| Message latency (p50) | <500ms | >1s |
| Message latency (p99) | <3s | >5s |
| Circuit build time | <2s | >5s |
| DHT lookup time | <1s | >3s |
| Encryption throughput | >1000/s | <500/s |
| Memory usage (mobile) | <100MB | >200MB |
| Battery drain (idle) | <5mAh/h | >20mAh/h |
Performance Regression Detection
Regression Detection Pipeline:
1. Run benchmark suite on every PR
2. Compare against baseline (main branch)
3. Flag if:
- p50 regresses by >10%
- p99 regresses by >20%
- Memory increases by >15%
- Any metric exceeds absolute threshold
4. Require explicit approval for performance regressions

Production Monitoring
| Data Point | Collection Method | Retention |
|---|---|---|
| Client-side latency | In-app telemetry | 30 days |
| Server-side metrics | Prometheus | 90 days |
| Error rates | Sentry | 90 days |
| Network topology | DHT crawl | 7 days |
| Bandwidth usage | Flow logs | 30 days |
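The thresholds in the regression-detection pipeline above can be expressed as a short comparison step; metric names here are illustrative, and the absolute-threshold checks are omitted for brevity:

```python
# Relative regression limits from the pipeline: >10% on p50, >20% on p99,
# >15% on memory.
THRESHOLDS = {"latency_p50_ms": 0.10, "latency_p99_ms": 0.20, "memory_mb": 0.15}

def regressions(baseline: dict, current: dict) -> list[str]:
    """Return the metrics whose relative increase exceeds their limit."""
    flagged = []
    for metric, allowed in THRESHOLDS.items():
        change = (current[metric] - baseline[metric]) / baseline[metric]
        if change > allowed:
            flagged.append(f"{metric}: +{change:.0%} exceeds +{allowed:.0%}")
    return flagged

baseline = {"latency_p50_ms": 500, "latency_p99_ms": 2000, "memory_mb": 100}
current  = {"latency_p50_ms": 600, "latency_p99_ms": 2100, "memory_mb": 105}
print(regressions(baseline, current))  # only p50 (+20%) is flagged
```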
Related Documentation
- Protocol Specification - Cryptographic protocol details
- Cryptography Fundamentals - Algorithm specifications
- DHT and Kademlia - Distributed lookup system
- Message Delivery - Delivery pipeline
- Scanning Protocol - Stealth address scanning
- Group Chat Protocol - Sender Keys efficiency