Voice & Video Calls
Technical specification of real-time encrypted communication.
Overview
Zentalk uses WebRTC for peer-to-peer voice and video calls, with DTLS-SRTP for media encryption and E2EE for signaling.
Protocol Stack
| Layer | Protocol | Purpose |
|---|---|---|
| Signaling | E2EE (Double Ratchet) | SDP exchange, ICE candidates |
| Key Exchange | DTLS 1.2 | Derive SRTP keys |
| Media Transport | SRTP | Encrypted audio/video |
| NAT Traversal | ICE/STUN/TURN | Connection establishment |
Signaling (E2EE)
Call setup messages are encrypted using existing Double Ratchet sessions:
| Message | Content | Encryption |
|---|---|---|
| Call Offer | SDP, DTLS fingerprint | Double Ratchet |
| Call Answer | SDP, DTLS fingerprint | Double Ratchet |
| ICE Candidates | Address:port pairs | Double Ratchet |
| Call End | Termination signal | Double Ratchet |
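The four message kinds above can be modeled as a small envelope that is serialized and then handed to the existing Double Ratchet session. This is a minimal sketch under assumptions: the JSON layout, the type strings, and `ratchet_encrypt` (a stand-in for the session's encrypt call) are all hypothetical and do not come from the Zentalk source.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

# Hypothetical wire names for the four signaling messages in the table.
MESSAGE_TYPES = {"call_offer", "call_answer", "ice_candidate", "call_end"}


@dataclass
class CallSignal:
    type: str
    call_id: str
    payload: dict  # e.g. {"sdp": ...} or {"candidate": ...}

    def to_bytes(self) -> bytes:
        if self.type not in MESSAGE_TYPES:
            raise ValueError(f"unknown signal type: {self.type}")
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(data: bytes) -> "CallSignal":
        obj = json.loads(data.decode("utf-8"))
        if obj.get("type") not in MESSAGE_TYPES:
            raise ValueError(f"unknown signal type: {obj.get('type')}")
        return CallSignal(obj["type"], obj["call_id"], obj["payload"])


def send_signal(signal: CallSignal, ratchet_encrypt: Callable[[bytes], bytes]) -> bytes:
    """Serialize the signal, then encrypt it with the peer's Double Ratchet session.

    `ratchet_encrypt` is a hypothetical stand-in for the session's encrypt call;
    the relay only ever handles the returned ciphertext.
    """
    return ratchet_encrypt(signal.to_bytes())
```

On the receiving side, the plaintext produced by the Double Ratchet decrypt step would be passed to `CallSignal.from_bytes`.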
SDP Security
Encrypted SDP contains:
- Media codecs (Opus, VP8/VP9, H.264)
- DTLS fingerprint (SHA-256)
- ICE credentials
- Candidate addresses
The server never sees SDP content, only encrypted blobs.
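Once the receiver decrypts the signaling message, it needs the peer's DTLS fingerprint out of the SDP. SDP carries it as an `a=fingerprint:` attribute (RFC 8122). The sketch below uses a plain regular expression as an assumption for illustration; a production client would normally rely on its WebRTC stack's SDP parser.

```python
import re
from typing import Optional

# RFC 8122 attribute: a=fingerprint:<hash-func> <colon-separated hex pairs>
_FINGERPRINT_RE = re.compile(
    r"^a=fingerprint:(\S+)\s+([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2})+)$"
)


def extract_fingerprint(sdp: str, hash_func: str = "sha-256") -> Optional[bytes]:
    """Return the first fingerprint using `hash_func` as raw bytes, or None."""
    for line in sdp.splitlines():  # also handles the CRLF line endings SDP uses
        match = _FINGERPRINT_RE.match(line.strip())
        if match and match.group(1).lower() == hash_func:
            return bytes.fromhex(match.group(2).replace(":", ""))
    return None
```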
DTLS-SRTP
Datagram Transport Layer Security establishes media encryption keys:
Handshake
| Step | Message | Purpose |
|---|---|---|
| 1 | ClientHello | Cipher suite proposal |
| 2 | ServerHello | Cipher selection |
| 3 | Certificate | Self-signed (each peer), fingerprint verified |
| 4 | ServerKeyExchange / ClientKeyExchange | ECDHE key exchange |
| 5 | Finished | Handshake complete |
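At the Certificate step, each side hashes the certificate the peer actually presents and compares it with the fingerprint received over encrypted signaling. A minimal sketch, assuming the DTLS stack exposes the peer certificate as DER bytes (in browsers the WebRTC stack performs this check internally). The function names are hypothetical.

```python
import hashlib
import hmac


def cert_fingerprint(der_cert: bytes) -> bytes:
    """SHA-256 over the DER-encoded certificate, as carried in a=fingerprint."""
    return hashlib.sha256(der_cert).digest()


def format_fingerprint(digest: bytes) -> str:
    """Render as uppercase, colon-separated hex pairs (SDP style)."""
    return ":".join(f"{b:02X}" for b in digest)


def verify_peer_certificate(der_cert: bytes, expected: bytes) -> None:
    """Raise if the DTLS peer certificate does not match the signaled fingerprint."""
    actual = cert_fingerprint(der_cert)
    # Constant-time comparison; a mismatch indicates a possible MITM.
    if not hmac.compare_digest(actual, expected):
        raise ConnectionAbortedError("DTLS fingerprint mismatch: terminating call")
```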
Fingerprint Verification
1. Alice includes her DTLS certificate fingerprint in the SDP (sent over E2EE signaling)
2. Bob receives the fingerprint via the Double Ratchet session
3. During the DTLS handshake, Bob verifies that Alice's certificate matches the fingerprint
4. On mismatch → terminate the call (MITM detected)
Key Derivation
```
DTLS Master Secret
│
├─► SRTP Encryption Key (AES-128)
├─► SRTP Authentication Key (HMAC-SHA1, not used with GCM)
└─► SRTP Salt (112 bits; 96 bits with AES-GCM)
```
SRTP Parameters
| Parameter | Value |
|---|---|
| Cipher | AES-128-GCM or AES-128-CTR |
| Authentication | HMAC-SHA1-80 (if not GCM) |
| Key size | 128 bits |
| Salt | 112 bits (AES-CTR) / 96 bits (AES-GCM) |
| Replay protection | Sliding window of at least 64 packets |
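Key derivation in DTLS-SRTP happens in two stages. First, the DTLS keying-material exporter (label `EXTRACTOR-dtls_srtp`, RFC 5764) produces a master key and master salt for each direction. Then the SRTP KDF (RFC 3711) expands those into the per-session encryption key, authentication key, and salt. The sketch covers only the first stage. It assumes a DTLS 1.2 cipher suite whose PRF is SHA-256 and assumes access to the master secret and handshake randoms, which real stacks usually keep internal. Key and salt lengths are 16/14 bytes for AES_CM_128_HMAC_SHA1_80 and 16/12 bytes for AEAD_AES_128_GCM.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 P_SHA256 expansion (RFC 5246, section 5)."""
    out, a = b"", seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def export_srtp_keys(master_secret: bytes, client_random: bytes, server_random: bytes,
                     key_len: int = 16, salt_len: int = 14) -> dict:
    """Split DTLS-SRTP keying material into per-direction master keys and salts.

    Layout per RFC 5764, section 4.2: client key | server key | client salt | server salt.
    """
    label = b"EXTRACTOR-dtls_srtp"
    material = p_sha256(master_secret, label + client_random + server_random,
                        2 * (key_len + salt_len))
    k, s = key_len, salt_len
    return {
        "client_key": material[0:k],
        "server_key": material[k:2 * k],
        "client_salt": material[2 * k:2 * k + s],
        "server_salt": material[2 * k + s:2 * k + 2 * s],
    }
```

Each side encrypts with its own direction's key and decrypts with the other's, so the two media streams never share a key.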
Packet Format
```
┌────────────────────────────────────────────────┐
│ RTP Header (12 bytes minimum, not encrypted)   │
├────────────────────────────────────────────────┤
│ Encrypted Payload (variable)                   │
├────────────────────────────────────────────────┤
│ Authentication Tag (10 bytes; 16 with AES-GCM) │
└────────────────────────────────────────────────┘
```
ICE (Interactive Connectivity Establishment)
Candidate Types
| Type | Description | Privacy |
|---|---|---|
| Host | Local IP address | Reveals LAN IP |
| Server Reflexive | Public IP via STUN | Reveals public IP |
| Relay | TURN server relay | Hides each peer's IP from the other peer (the TURN server still sees its client's IP) |
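ICE ranks candidates with the recommended priority formula from RFC 8445, priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID), where the recommended type preferences put host above server-reflexive above relay. The sketch computes that value and shows a hypothetical relay-only filter for privacy mode. In browsers, the equivalent switch is `iceTransportPolicy: "relay"` in `RTCConfiguration`.

```python
# Recommended type preferences from RFC 8445 (host > prflx > srflx > relay).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """RFC 8445 recommended candidate priority formula."""
    if not 0 <= local_pref <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be in 1..256")
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def filter_candidates(candidates: list, relay_only: bool) -> list:
    """Hypothetical privacy filter: drop host/srflx candidates in relay-only mode
    so the remote peer never learns a local or public IP address."""
    return [c for c in candidates if not relay_only or c["type"] == "relay"]
```

With default preferences, a host candidate scores 2130706431 and a relay candidate 16777215, which is why relay paths are only chosen when nothing else connects unless the filter removes the alternatives.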
TURN Servers
When maximum privacy is required, relay candidates are preferred:
| Property | Value |
|---|---|
| Protocol | TURNS (TLS encrypted) |
| Authentication | Ephemeral credentials |
| Credential lifetime | 24 hours |
| Server selection | Geographic proximity |
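The specification does not say how ephemeral TURN credentials are minted. A common approach, assumed here for illustration, is the shared-secret scheme supported by coturn's `use-auth-secret` option: the username embeds an expiry timestamp, and the password is an HMAC of that username under a secret shared between the credential issuer and the TURN server. Function names are hypothetical.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple

CREDENTIAL_LIFETIME_S = 24 * 60 * 60  # 24 hours, per the table above


def _password_for(shared_secret: bytes, username: str) -> str:
    digest = hmac.new(shared_secret, username.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def issue_turn_credentials(shared_secret: bytes, user_id: str,
                           now: Optional[float] = None) -> Tuple[str, str]:
    """Return (username, password); the username is '<expiry>:<user_id>'."""
    expiry = int(time.time() if now is None else now) + CREDENTIAL_LIFETIME_S
    username = f"{expiry}:{user_id}"
    return username, _password_for(shared_secret, username)


def credentials_valid(shared_secret: bytes, username: str, password: str,
                      now: Optional[float] = None) -> bool:
    """Server-side check: not expired and the HMAC matches."""
    expiry_str, sep, _ = username.partition(":")
    if not sep or not expiry_str.isdigit():
        return False
    current = time.time() if now is None else now
    if int(expiry_str) <= current:
        return False
    return hmac.compare_digest(_password_for(shared_secret, username), password)
```

Because the TURN server only needs the shared secret, no per-user credential database is required, and leaked credentials stop working after the lifetime elapses.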
Connection Priority
1. Direct peer-to-peer (if possible)
2. STUN-assisted (NAT traversal)
3. TURN relay (fallback)
Audio Encryption
Opus Codec
| Parameter | Value |
|---|---|
| Sample rate | 48 kHz |
| Bit rate | 6-510 kbps (VBR) |
| Frame size | 20 ms |
| Channels | 1 (mono) or 2 (stereo) |
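With these parameters, each 20 ms frame at 48 kHz spans 48,000 × 0.020 = 960 samples, so a sender emits 50 packets per second and advances the RTP timestamp by 960 per packet (RFC 7587 fixes the Opus RTP clock at 48 kHz). The sketch builds the fixed 12-byte RTP header (RFC 3550) that precedes each SRTP-protected payload. Payload type 111 is an assumption; the real value is dynamic and negotiated in the SDP.

```python
import struct

SAMPLE_RATE_HZ = 48_000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 960 timestamp ticks per packet


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 111, marker: bool = False) -> bytes:
    """Fixed 12-byte RTP header: version 2, no padding, no extension, no CSRCs."""
    first = 2 << 6                                       # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + payload type
    return struct.pack("!BBHII", first, second, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def opus_packet_headers(count: int, ssrc: int, start_seq: int = 0, start_ts: int = 0) -> list:
    """Headers for consecutive 20 ms Opus frames: sequence +1, timestamp +960."""
    return [rtp_header(start_seq + i, start_ts + i * SAMPLES_PER_FRAME, ssrc)
            for i in range(count)]
```

The header stays in cleartext so the receiver can look up the SRTP context and check for replays, but the authentication tag covers it, so tampering is detected.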
Per-Packet Encryption
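For each audio frame:
1. Encode audio → Opus frame
2. Wrap the frame in an RTP packet (sequence number, timestamp)
3. Encrypt the payload with the SRTP session key
4. Append the authentication tag
5. Transmit over UDP (or via the TURN relay)
Video Encryption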
Codecs
| Codec | Usage |
|---|---|
| VP8 | Default, good compatibility |
| VP9 | Higher efficiency |
| H.264 | Hardware acceleration |
Selective Forwarding (SFU)
For group calls, video is routed through a Selective Forwarding Unit (SFU):
| Property | Value |
|---|---|
| Encryption | E2EE (insertable streams) |
| SFU visibility | Encrypted bitstream only |
| Key distribution | Via E2EE signaling |
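Key distribution over E2EE signaling implies that some participant owns a shared media key and re-sends it whenever the group changes. A minimal sketch of that bookkeeping, assuming a 128-bit key and a hypothetical `send_via_ratchet(participant, message)` callback standing in for the pairwise Double Ratchet sessions; the class name and message fields are also hypothetical.

```python
import secrets
from typing import Callable, Optional, Set


class GroupMediaKey:
    """Hypothetical holder for a group call's shared Media Key (MK)."""

    def __init__(self, send_via_ratchet: Callable[[str, dict], None]) -> None:
        self._send = send_via_ratchet
        self.members: Set[str] = set()
        self.epoch = 0
        self.key: Optional[bytes] = None

    def _rotate_and_distribute(self) -> None:
        # A fresh MK on every membership change: a departed member cannot decrypt
        # later media, and a new member cannot decrypt earlier media.
        self.epoch += 1
        self.key = secrets.token_bytes(16)  # 128-bit key
        for member in sorted(self.members):
            self._send(member, {"epoch": self.epoch, "media_key": self.key.hex()})

    def join(self, participant: str) -> None:
        self.members.add(participant)
        self._rotate_and_distribute()

    def leave(self, participant: str) -> None:
        self.members.discard(participant)
        self._rotate_and_distribute()
```

Tagging each key with an epoch lets receivers tell which key a frame was encrypted under while a rotation is still propagating.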
Group Calls
Architecture
| Participants | Method |
|---|---|
| 2 | Peer-to-peer |
| 3-6 | SFU (Selective Forwarding Unit) |
Group Call Keys
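1. The initiator generates a Media Key (MK)
2. MK is distributed via Double Ratchet to each participant
3. All participants encrypt media frames with MK (insertable streams); SRTP still protects each hop to and from the SFU
4. MK is rotated whenever a participant joins or leaves
Insertable Streams API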
Encryption pipeline:
```
Camera → Encode → Encrypt(MK) → SFU → Decrypt(MK) → Decode → Display
```
The SFU sees only the encrypted bitstream and cannot decode it.
Security Properties
| Property | Mechanism |
|---|---|
| Confidentiality | SRTP (AES-128-GCM) |
| Integrity | GCM tag / HMAC-SHA1 |
| Authenticity | DTLS fingerprint verification |
| Forward Secrecy | ECDHE in DTLS |
| Replay Protection | SRTP sequence numbers |
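SRTP replay protection tracks each packet's index (rollover counter × 65536 + RTP sequence number) in a sliding window, as described in RFC 3711, which recommends a window of at least 64 packets. A minimal bitmap sketch; the class name is hypothetical, and the check-and-update should only run after the packet's authentication tag has verified, so forged packets cannot advance the window.

```python
class ReplayWindow:
    """Sliding-window replay check over SRTP packet indices."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self._mask = (1 << size) - 1
        self.highest = -1  # highest index accepted so far
        self.bitmap = 0    # bit i set => index (highest - i) already seen

    def check_and_update(self, index: int) -> bool:
        """Return True if `index` is fresh (and record it); False for replays."""
        if index > self.highest:
            shift = index - self.highest
            if shift >= self.size:
                self.bitmap = 1  # window jumped entirely; only the new index is known
            else:
                self.bitmap = ((self.bitmap << shift) | 1) & self._mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False  # older than the window: cannot prove freshness, reject
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset
        return True
```

Late but unseen packets inside the window are still accepted, which matters for audio and video arriving slightly out of order over UDP.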
Metadata Protection
| Data | Protection |
|---|---|
| Call initiation | 3-hop relay |
| Call duration | Local only, not logged |
| Participant IPs | TURN relay option |
| SDP content | E2EE encrypted |
Call States
| State | Description |
|---|---|
| IDLE | No active call |
| OFFERING | SDP offer sent |
| ANSWERING | SDP answer sent |
| CONNECTING | ICE/DTLS in progress |
| CONNECTED | Media flowing |
| DISCONNECTED | Call ended |
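The states above can be enforced as a small state machine. The transition table in this sketch is an assumption inferred from the state descriptions, including that a lost connection may drop from CONNECTED back to CONNECTING for a reconnect attempt; it is not taken from the Zentalk implementation.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    OFFERING = auto()
    ANSWERING = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    DISCONNECTED = auto()


# Assumed legal transitions for each state.
_TRANSITIONS = {
    CallState.IDLE: {CallState.OFFERING, CallState.ANSWERING},
    CallState.OFFERING: {CallState.CONNECTING, CallState.DISCONNECTED},   # answer received
    CallState.ANSWERING: {CallState.CONNECTING, CallState.DISCONNECTED},  # answer sent
    CallState.CONNECTING: {CallState.CONNECTED, CallState.DISCONNECTED},  # ICE/DTLS done
    CallState.CONNECTED: {CallState.CONNECTING, CallState.DISCONNECTED},  # reconnect or hang up
    CallState.DISCONNECTED: {CallState.IDLE},
}


class Call:
    def __init__(self) -> None:
        self.state = CallState.IDLE

    def transition(self, new_state: CallState) -> None:
        if new_state not in _TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
```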
Error Handling
| Error | Action |
|---|---|
| DTLS fingerprint mismatch | Terminate, warn user |
| ICE timeout | Fallback to TURN |
| SRTP auth failure | Drop packet |
| Connection lost | Attempt to reconnect (30 s) |
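The reconnect action can be read as a bounded retry loop. This sketch assumes the 30 s figure is the total reconnect window and uses a hypothetical 2-second retry interval; `try_connect` stands in for an ICE restart or re-signaling attempt, and the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

RECONNECT_WINDOW_S = 30.0


def reconnect(try_connect: Callable[[], bool],
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep,
              retry_interval_s: float = 2.0) -> bool:
    """Retry `try_connect` until it succeeds or the 30 s window elapses.

    Returns True on success; on False the caller moves the call to DISCONNECTED.
    """
    deadline = clock() + RECONNECT_WINDOW_S
    while clock() < deadline:
        if try_connect():
            return True
        # Never sleep past the deadline.
        sleep(min(retry_interval_s, max(0.0, deadline - clock())))
    return False
```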
Related Documentation
- Protocol Specification - E2EE for signaling
- Architecture - System components
- Features - Feature overview