Link Preview Privacy
Technical specification for privacy-preserving link previews in Zentalk.
Overview
Link previews enhance user experience by displaying website metadata (title, description, thumbnail) inline with messages. However, traditional implementations create significant privacy risks. Zentalk implements a relay-based architecture in which the target server sees only an exit relay and never learns the IP address or client identity of the user requesting the preview.
The Privacy Problem
Traditional Link Preview Risks
When a messaging application fetches a link preview directly from the target URL, it exposes sensitive information:
| Risk | Description | Privacy Impact |
|---|---|---|
| IP Address Exposure | User’s real IP sent to target server | Location tracking, identity correlation |
| Referer Header Leakage | Messenger identified in HTTP headers | Usage pattern disclosure |
| Timing Correlation | Request timing reveals message activity | Behavioral analysis |
| DNS Leakage | DNS queries reveal browsing intent | ISP surveillance, network monitoring |
| TLS Fingerprinting | Client characteristics exposed | Device identification |
Attack Vectors in Traditional Implementations
| Attack | Mechanism | Consequence |
|---|---|---|
| Tracking Pixel Injection | Unique URL per recipient | Identify who viewed preview |
| IP Harvesting | Log requests to shared links | Map user IP addresses |
| Timing Analysis | Correlate preview fetch with message send | Deanonymize senders |
| Referer Mining | Extract messenger identity from headers | Profile user’s app usage |
| Request Fingerprinting | Analyze TLS/HTTP characteristics | Identify device types |
Correlation Attack Example
Traditional Preview Flow (INSECURE):
1. Alice sends link to Bob in Zentalk
2. Alice's client fetches preview from target.com
→ Target sees: IP=Alice, Referer=zentalk-client
3. Bob's client fetches preview from target.com
→ Target sees: IP=Bob, Referer=zentalk-client
4. Target can correlate:
→ Two Zentalk users accessed same URL
→ Timing suggests communication between them
→ IPs reveal approximate locations
Zentalk’s Solution
Relay-Based Architecture
Zentalk routes all preview requests through a distributed relay network, ensuring no direct connection between user clients and target URLs.
| Component | Role | Knowledge |
|---|---|---|
| User Client | Requests preview via relay | Knows target URL |
| Entry Relay | First hop in relay chain | Knows client IP, not target URL |
| Exit Relay | Fetches from target URL | Knows target URL, not client IP |
| Target Server | Serves preview content | Sees only relay IP |
Privacy Guarantees
| Property | Mechanism | Guarantee |
|---|---|---|
| IP Anonymity | Multi-hop relay routing | Target never sees client IP |
| Referer Protection | Relay strips/replaces headers | No messenger identification |
| Timing Obfuscation | Batched requests, random delays | Correlation resistance |
| DNS Privacy | Relay performs DNS resolution | Client DNS queries hidden |
| Request Unlinkability | Per-request circuit rotation | No persistent fingerprint |
Comparison with Traditional Approaches
| Approach | IP Hidden | Referer Hidden | Timing Protected | Decentralized |
|---|---|---|---|---|
| Direct Fetch | No | No | No | N/A |
| Single Proxy | Yes | Yes | No | No |
| VPN-Based | Yes | Partial | No | No |
| Tor-Style (Zentalk) | Yes | Yes | Partial | Yes |
Preview Generation Flow
End-to-End Process
Privacy-Preserving Preview Flow:
1. USER PASTES LINK
→ Client detects URL pattern in message input
→ Preview request initiated (if enabled)
2. BUILD RELAY CIRCUIT
→ Select 2-hop circuit from relay pool
→ Entry relay: Knows client, not destination
→ Exit relay: Knows destination, not client
3. ENCRYPT REQUEST
→ Construct preview request
→ Encrypt for exit relay (inner layer)
→ Encrypt for entry relay (outer layer)
4. ROUTE THROUGH RELAYS
→ Client → Entry Relay (onion layer 1)
→ Entry Relay → Exit Relay (onion layer 2)
→ Exit Relay → Target URL (ordinary HTTPS, no onion layer)
5. FETCH AND SANITIZE
→ Exit relay fetches target URL
→ Content sanitized (scripts removed)
→ Metadata extracted (title, description, image)
6. RETURN ENCRYPTED RESPONSE
→ Exit relay encrypts response for client
→ Routed back through entry relay
→ Client decrypts preview data
7. CACHE AND DISPLAY
→ Preview cached locally (encrypted)
→ Rendered in message compose area
→ Attached to message when sent
Circuit Selection
| Parameter | Value | Rationale |
|---|---|---|
| Hop Count | 2 | Balance: privacy vs. latency |
| Entry Selection | From guard set | Limits how many relays ever see the client IP |
| Exit Selection | Random from pool | Geographic diversity |
| Circuit Lifetime | Single request | Maximum unlinkability |
| Parallel Circuits | 3 pre-built | Low-latency preview generation |
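
The selection parameters above can be sketched as follows. This is a minimal illustration, not Zentalk's implementation: the `Circuit` type, the function names, and the rule that the exit must differ from the entry are assumptions introduced here, and relays are represented as plain strings.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Circuit:
    entry: str  # relay that sees the client IP but not the target URL
    exit: str   # relay that sees the target URL but not the client IP


def build_circuit(guard_set: list[str], relay_pool: list[str]) -> Circuit:
    """Pick a 2-hop circuit: entry from the guard set, exit at random from the pool."""
    entry = secrets.choice(guard_set)
    # Assumption: exclude the entry so no single relay sees both client IP and URL.
    exit_candidates = [relay for relay in relay_pool if relay != entry]
    if not exit_candidates:
        raise ValueError("relay pool has no exit distinct from the entry relay")
    return Circuit(entry=entry, exit=secrets.choice(exit_candidates))


def prebuild_circuits(guard_set: list[str], relay_pool: list[str], count: int = 3) -> list[Circuit]:
    """Pre-build `count` single-use circuits so a preview request can start immediately."""
    return [build_circuit(guard_set, relay_pool) for _ in range(count)]
```

Each circuit is meant to be discarded after one request, which matches the single-request lifetime in the table.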
Request Timing
| Phase | Typical Duration | Maximum |
|---|---|---|
| Circuit selection | 10ms | 50ms |
| Onion encryption | 5ms | 20ms |
| Relay routing | 100-300ms | 2s |
| Target fetch | 200-500ms | 5s |
| Content parsing | 50ms | 200ms |
| Total preview time | 400-900ms | 8s |
Preview Data Extraction
Metadata Sources
Preview data is extracted from target pages in priority order:
| Priority | Source | Fields Extracted |
|---|---|---|
| 1 | Open Graph tags | og:title, og:description, og:image |
| 2 | Twitter Card tags | twitter:title, twitter:description, twitter:image |
| 3 | HTML meta tags | title, description |
| 4 | Structured data | JSON-LD, Schema.org |
| 5 | Page content | First heading, first paragraph |
Extracted Fields
| Field | Source Priority | Max Length | Fallback |
|---|---|---|---|
| Title | og:title → twitter:title → title tag → h1 | 200 chars | Domain name |
| Description | og:description → meta description → first p | 500 chars | None |
| Image URL | og:image → twitter:image → first img | N/A | None |
| Site Name | og:site_name → domain | 100 chars | Domain |
| Type | og:type → inferred | 50 chars | "website" |
| Favicon | link rel="icon" → /favicon.ico | N/A | None |
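
The priority order in the tables above can be sketched with Python's standard `html.parser`. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, only Open Graph, Twitter Card, and plain HTML sources are handled (JSON-LD and page-content fallbacks are omitted), and sanitization is reduced to whitespace trimming and truncation.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects Open Graph, Twitter Card, and plain HTML metadata from a page."""

    def __init__(self):
        super().__init__()
        self.og, self.twitter, self.html = {}, {}, {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        content = (a.get("content") or "").strip()
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and content:
            prop, name = a.get("property") or "", a.get("name") or ""
            if prop.startswith("og:"):
                self.og.setdefault(prop[3:], content)
            elif name.startswith("twitter:"):
                self.twitter.setdefault(name[8:], content)
            elif name == "description":
                self.html.setdefault("description", content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.html["title"] = self.html.get("title", "") + data.strip()


def extract_metadata(html_text, domain):
    parser = MetaCollector()
    parser.feed(html_text)

    def first(key):
        # Walk the sources in priority order and return the first non-empty value.
        for source in (parser.og, parser.twitter, parser.html):
            if source.get(key):
                return source[key]
        return None

    return {
        "title": (first("title") or domain)[:200],          # fallback: domain name
        "description": (first("description") or "")[:500] or None,
        "image": first("image"),
        "site_name": (parser.og.get("site_name") or domain)[:100],
    }
```

A page with no usable tags still yields a title, because the domain name is used as the fallback.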
Open Graph Extraction
Metadata Extraction Process:
1. PARSE HTML
    document = parse_html(response_body)
2. EXTRACT OPEN GRAPH
    og_tags = document.query_all('meta[property^="og:"]')
    FOR EACH tag IN og_tags:
        key = tag.property.replace("og:", "")
        value = tag.content
        metadata[key] = sanitize(value)
3. EXTRACT TWITTER CARDS
    twitter_tags = document.query_all('meta[name^="twitter:"]')
    FOR EACH tag IN twitter_tags:
        key = tag.name.replace("twitter:", "")
        IF key NOT IN metadata:
            metadata[key] = sanitize(tag.content)
4. FALLBACK TO HTML
    IF "title" NOT IN metadata:
        title_tag = document.query('title')
        IF title_tag IS NOT NULL:
            metadata["title"] = sanitize(title_tag.text)
    IF "description" NOT IN metadata:
        meta_desc = document.query('meta[name="description"]')
        IF meta_desc IS NOT NULL:
            metadata["description"] = sanitize(meta_desc.content)
5. TRUNCATE AND APPLY FALLBACKS
    IF "title" NOT IN metadata:
        metadata["title"] = domain_name    // fallback from the Extracted Fields table
    metadata["title"] = truncate(metadata["title"], 200)
    IF "description" IN metadata:
        metadata["description"] = truncate(metadata["description"], 500)
Preview Content Limits
Size Constraints
| Content | Limit | Rationale |
|---|---|---|
| HTML fetch | 512 KB | Sufficient for metadata extraction |
| Image fetch | 2 MB | Reasonable thumbnail source |
| Generated thumbnail | 100 KB | Bandwidth efficiency |
| Total preview payload | 150 KB | Message size limits |
| Fetch timeout | 5 seconds | User experience |
Thumbnail Processing
| Parameter | Value |
|---|---|
| Max source dimensions | 4096 x 4096 px |
| Output dimensions | 400 x 400 px (max) |
| Output format | WebP (JPEG fallback) |
| Quality | 75% |
| Aspect ratio | Preserved |
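
The dimension rules in this table reduce to a small calculation, sketched below. The function name is hypothetical, and the choice never to upscale small images is an assumption; the table only states a 400 x 400 px maximum.

```python
def thumbnail_size(width, height, max_side=400, max_source=4096):
    """Return (w, h) for the thumbnail, or None if the source should be skipped."""
    if width <= 0 or height <= 0 or width > max_source or height > max_source:
        return None
    # One scale factor for both axes preserves the aspect ratio; 1.0 caps it (no upscaling).
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1600 x 800 source the scale factor is 0.25, giving a 400 x 200 thumbnail.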
Thumbnail Generation:
1. FETCH IMAGE
    image_data = fetch_with_limit(image_url, max=2MB)
2. VALIDATE IMAGE
    IF NOT valid_image_format(image_data):
        SKIP thumbnail generation
    IF image_width > 4096 OR image_height > 4096:
        SKIP thumbnail generation
3. RESIZE
    thumbnail = resize_image(
        image_data,
        max_width=400,
        max_height=400,
        preserve_aspect=true
    )
4. ENCODE
    output = encode_webp(thumbnail, quality=75)
    IF output.size > 100KB:
        output = encode_jpeg(thumbnail, quality=60)
5. RETURN
    IF output.size ≤ 100KB:
        RETURN output
    ELSE:
        RETURN null // Skip oversized thumbnails
Content Type Restrictions
| Content Type | Allowed | Notes |
|---|---|---|
| text/html | Yes | Primary target |
| application/xhtml+xml | Yes | XML-based HTML |
| image/* | Yes | For thumbnail only |
| application/json | Partial | API responses with metadata |
| text/plain | No | No useful preview data |
| application/pdf | No | Cannot extract safely |
| video/* | No | Video is never fetched; a thumbnail comes only from a poster image |
Caching Strategy
Relay-Side Caching
Exit relays maintain a shared cache to reduce repeated fetches and improve performance:
| Parameter | Value | Rationale |
|---|---|---|
| Cache duration | 1 hour | Balance freshness vs. efficiency |
| Cache key | SHA-256(normalized_url) | No URL stored in plaintext |
| Max cache size | 1 GB per relay | Resource constraints |
| Eviction policy | LRU | Prioritize popular content |
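
A cache with these parameters might look like the sketch below. It is an illustration only: the class name, the injectable `clock`, and the `allow_stale` flag (used to model serving a stale entry when the target is down) are assumptions, and payloads are treated as opaque bytes keyed by the URL hash.

```python
import time
from collections import OrderedDict


class PreviewCache:
    def __init__(self, max_bytes=1 << 30, ttl_seconds=3600, clock=time.monotonic):
        self._entries = OrderedDict()  # key -> (stored_at, payload), oldest first
        self._max_bytes, self._ttl, self._clock = max_bytes, ttl_seconds, clock
        self._size = 0

    def get(self, key, allow_stale=False):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, payload = entry
        expired = self._clock() - stored_at >= self._ttl
        if expired and not allow_stale:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return payload

    def put(self, key, payload):
        if key in self._entries:
            self._remove(key)
        self._entries[key] = (self._clock(), payload)
        self._size += len(payload)
        # Evict least recently used entries until the size budget is met.
        while self._size > self._max_bytes and self._entries:
            self._remove(next(iter(self._entries)))

    def _remove(self, key):
        _, payload = self._entries.pop(key)
        self._size -= len(payload)
```

Nothing in the structure records who asked for an entry, which is the property the shared cache relies on.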
Cache Privacy Properties
| Property | Implementation | Guarantee |
|---|---|---|
| No user correlation | Cache key is URL hash only | Cannot link users to URLs |
| No request logging | Requests not persisted | No audit trail |
| Shared cache | All users benefit equally | No per-user tracking |
| Cache-only serving | Stale cache served if target down | Reduces timing attacks |
Cache Key Generation
Cache Key Derivation:
1. NORMALIZE URL
    normalized = lowercase_scheme_and_host(url)    // paths and query values stay case-sensitive
    normalized = remove_tracking_params(normalized)
    normalized = sort_query_params(normalized)
2. GENERATE KEY
    cache_key = SHA-256(normalized)
3. LOOKUP
    cached_preview = cache.get(cache_key)
    IF cached_preview AND NOT expired(cached_preview):
        RETURN cached_preview
Tracking Parameters Removed:
- utm_source, utm_medium, utm_campaign
- fbclid, gclid, msclkid
- ref, source, via
- Any parameter matching tracking patterns
Client-Side Caching
| Parameter | Value |
|---|---|
| Cache location | Encrypted local storage |
| Cache duration | 24 hours |
| Cache key | SHA-256(url + conversation_id) |
| Encryption | AES-256-GCM with local key |
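
Both cache keys can be derived with the standard library, as in the sketch below. The function names are hypothetical, and the tracking-parameter set covers only the names listed explicitly in this section. Two choices here are assumptions: only the scheme and host are lowercased (paths and query values can be case-sensitive, so lowercasing the whole URL could merge distinct pages), and the fragment is dropped.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "fbclid", "gclid", "msclkid", "ref", "source", "via"}


def normalize_url(url):
    parts = urlsplit(url)
    # Drop tracking parameters, then sort the rest so parameter order does not matter.
    query = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                   if k.lower() not in TRACKING_PARAMS)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, urlencode(query), ""))


def relay_cache_key(url):
    """Relay-side key: SHA-256 of the normalized URL."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


def client_cache_key(url, conversation_id):
    """Client-side key: SHA-256(url + conversation_id), scoped to one conversation."""
    return hashlib.sha256((url + conversation_id).encode("utf-8")).hexdigest()
```

Two links that differ only in tracking parameters, parameter order, or host casing map to the same relay-side key, while the client-side key keeps the same URL separate across conversations.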
Security Measures
Content Sanitization
All preview content is sanitized before delivery to clients:
| Threat | Sanitization |
|---|---|
| JavaScript injection | All scripts removed |
| CSS attacks | Stylesheets stripped |
| Event handlers | on* attributes removed |
| External resources | Blocked except thumbnail |
| Meta refresh | Removed |
| Base tag manipulation | Removed |
| Form injection | All forms removed |
Sanitization Rules
Content Sanitization Process:
1. REMOVE DANGEROUS ELEMENTS
    dangerous_tags = [
        'script', 'style', 'iframe', 'frame',
        'object', 'embed', 'applet', 'form',
        'input', 'button', 'select', 'textarea'
    ]
    FOR EACH tag IN dangerous_tags:
        document.remove_all(tag)
2. REMOVE EVENT HANDLERS
    FOR EACH element IN document.all_elements():
        FOR EACH attr IN element.attributes:
            IF attr.name.starts_with('on'):
                element.remove_attribute(attr.name)
3. SANITIZE URLS
    FOR EACH attr IN ['href', 'src', 'action']:
        FOR EACH element IN document.query_all('[' + attr + ']'):
            url = element.get(attr)
            IF NOT is_safe_url(url):
                element.remove_attribute(attr)
4. EXTRACT TEXT ONLY
    // Final preview contains only:
    // - Plain text title
    // - Plain text description
    // - Validated image URL
    // No HTML markup in final preview
Image Security
| Check | Action | Purpose |
|---|---|---|
| MIME validation | Verify magic bytes match the declared type (Content-Type or file extension) | Prevent type confusion |
| Dimension limits | Reject images > 4096px | Prevent DoS |
| File size limits | Reject images > 2MB | Bandwidth protection |
| Format whitelist | Only JPEG, PNG, GIF, WebP | Reduce attack surface |
| Decompression limits | Max 50MB decompressed | Prevent decompression bombs |
| Metadata stripping | Remove EXIF, XMP | Privacy protection |
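
The MIME validation, size limit, and format allowlist checks can be sketched as a signature test on the leading bytes. The names below are hypothetical, and comparing the sniffed type against the declared Content-Type is an assumption about how the match is performed. Dimension checks, decompression limits, and metadata stripping need an image library and are not shown.

```python
# Leading-byte signatures for the allowlisted formats.
MAGIC = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(data):
    """Return the MIME type implied by the leading bytes, or None if not allowlisted."""
    for mime, signatures in MAGIC.items():
        if data.startswith(signatures):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def accept_image(data, declared_type, max_bytes=2 * 1024 * 1024):
    """Accept only allowlisted images within the size limit whose bytes match the declared type."""
    if len(data) > max_bytes:
        return False
    sniffed = sniff_image_type(data)
    declared = declared_type.split(";")[0].strip().lower()
    return sniffed is not None and sniffed == declared
```

An SVG or HTML payload served with an image Content-Type fails the check because its leading bytes match no allowlisted signature.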
Malicious URL Detection
| Check | Method | Action |
|---|---|---|
| Known malware domains | Blocklist lookup | Reject with warning |
| Phishing detection | URL pattern analysis | Reject with warning |
| Homograph attacks | IDN normalization check | Display punycode |
| IP-based URLs | Detect raw IP targets | Warn user |
| Local network | Block RFC1918, localhost | Prevent SSRF |
| Unusual ports | Block non-80/443 | Reduce attack surface |
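
Two of these checks, the punycode display for IDN hosts and the port restriction, can be sketched with the standard library. The function name and return shape are hypothetical. Python's built-in `idna` codec implements IDNA 2003, so a production check would likely use a stricter library. Blocklist and phishing checks depend on external data and are not shown.

```python
from urllib.parse import urlsplit

ALLOWED_PORTS = {None, 80, 443}  # None means the URL uses the scheme's default port


def inspect_url(url):
    """Return the punycode display host, whether the host is an IDN, and whether the port is allowed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    # A host is treated as an IDN if encoding changed it or it already carries an ACE prefix.
    is_idn = ascii_host != host or any(
        label.startswith("xn--") for label in ascii_host.split("."))
    return {
        "display_host": ascii_host,
        "is_idn": is_idn,
        "port_allowed": parts.port in ALLOWED_PORTS,
    }
```

Showing `display_host` instead of the Unicode form makes look-alike characters visible to the user.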
SSRF Prevention
Server-Side Request Forgery prevention on relay nodes:
| Control | Implementation |
|---|---|
| DNS rebinding protection | Resolve DNS, validate IP before fetch |
| Private IP blocking | Reject 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 |
| Localhost blocking | Reject 127.0.0.0/8, ::1 |
| Cloud metadata blocking | Reject 169.254.169.254, metadata.* |
| Protocol restriction | HTTPS fetches only; an HTTP URL is followed only if it redirects to HTTPS |
| Redirect limits | Maximum 3 redirects |
| Redirect validation | Each redirect target re-validated |
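
The IP-level controls in this table can be sketched with Python's `ipaddress` module. The function names are hypothetical. The sketch also rejects link-local, unspecified, multicast, and reserved ranges and unwraps IPv4-mapped IPv6 addresses; these go beyond the ranges listed in the table and are assumptions about what a relay would want to block.

```python
import ipaddress


def is_forbidden_ip(ip_text):
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the IPv4 checks.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # is_link_local covers 169.254.0.0/16, which includes the cloud metadata address.
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_unspecified or ip.is_multicast or ip.is_reserved)


def validate_resolved(addresses):
    """Reject the hostname if any resolved address is forbidden; return the IP to pin."""
    if not addresses:
        raise ValueError("no addresses resolved")
    for addr in addresses:
        if is_forbidden_ip(addr):
            raise ValueError(f"blocked address: {addr}")
    return addresses[0]
```

Connecting to the returned address, rather than resolving the hostname a second time, is what closes the DNS rebinding window.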
SSRF Prevention Flow:
1. PARSE URL
    parsed = parse_url(target_url)
    IF parsed.scheme NOT IN ['http', 'https']:
        REJECT("Invalid protocol")
2. RESOLVE DNS
    ip_addresses = dns_resolve(parsed.host)
3. VALIDATE IPS
    FOR EACH ip IN ip_addresses:
        IF is_private_ip(ip):
            REJECT("Private IP not allowed")
        IF is_loopback(ip):
            REJECT("Loopback not allowed")
        IF is_cloud_metadata(ip):
            REJECT("Metadata endpoint not allowed")
4. FETCH WITH VALIDATED IP
    connection = connect_to_ip(ip_addresses[0], parsed.port)
    // Use original hostname for TLS SNI and Host header
    response = fetch(connection, parsed)
5. FOLLOW REDIRECTS (limited)
    redirect_count = 0
    WHILE response.is_redirect:
        IF redirect_count >= 3:
            REJECT("Too many redirects")
        new_url = response.headers['Location']
        response = VALIDATE AND FETCH new_url (repeat steps 1-4)
        redirect_count += 1
User Controls
Preview Settings
| Setting | Options | Default |
|---|---|---|
| Enable previews | On / Off | On |
| Auto-generate | Always / Ask / Never | Always |
| Preview in compose | Show / Hide | Show |
| Download images | Auto / Ask / Never | Auto |
| Send previews | Include / Exclude | Include |
Per-Conversation Settings
| Setting | Scope | Options |
|---|---|---|
| Disable previews | Single conversation | On / Off |
| Preview image quality | Single conversation | High / Low / None |
| Auto-expand previews | Single conversation | Yes / No |
Preview Before Sending
Preview Confirmation Flow:
1. USER PASTES URL
→ Preview generated in background
2. PREVIEW DISPLAYED IN COMPOSE
┌────────────────────────────────┐
│ [Preview Image] │
│ Title of the Page │
│ Description excerpt... │
│ example.com │
│ │
│ [Include Preview] [Remove] │
└────────────────────────────────┘
3. USER CHOOSES
→ "Include Preview": Attach preview to message
→ "Remove": Send message without preview
4. RECIPIENT OPTIONS
→ Preview shown inline
→ Click to open URL (with warning)
Security Warnings
| Condition | Warning Shown |
|---|---|
| HTTP (not HTTPS) URL | "This link uses an insecure connection" |
| Recently registered domain | "This domain was recently created" |
| IDN/Punycode domain | "This link contains special characters" |
| Mismatch: preview vs URL | "The preview may not match the destination" |
| Known tracker redirect | "This link goes through a tracking service" |
Limitations
Technical Limitations
| Limitation | Cause | Impact |
|---|---|---|
| Relay IP blocking | Some sites block datacenter IPs | Preview unavailable |
| JavaScript-rendered content | Content generated client-side | Incomplete preview |
| Authentication-required pages | No login credentials sent | Generic preview only |
| Rate-limited APIs | Target server throttling | Preview may fail |
| Geo-restricted content | Relay location mismatch | Different or no preview |
| Dynamic content | Content changes after fetch | Preview may be stale |
Content That Cannot Be Previewed
| Content Type | Reason | User Experience |
|---|---|---|
| Login-required pages | No authentication | "Preview unavailable" |
| Paywalled articles | Content hidden | Title/domain only |
| Single-page apps | JavaScript required | May show loading state |
| PDF documents | Cannot extract safely | File type indicator only |
| Private/internal URLs | SSRF protection | Blocked |
| Tor .onion sites | Not supported | Link shown without preview |
Staleness Considerations
| Scenario | Preview Behavior |
|---|---|
| Content updated after preview | Shows cached version |
| URL redirects changed | Original preview persists |
| Page removed (404) | Cached preview may still show |
| A/B tested pages | Preview may differ from actual |
Staleness Mitigation:
1. CACHE HEADERS
Respect Cache-Control from origin
max-age used when present
2. FRESHNESS INDICATORS
Show "Preview from [time]" if > 1 hour old
3. REFRESH OPTION
User can manually refresh preview
Bypasses cache, fetches fresh content
4. RECIPIENT FETCH
Recipients can optionally re-fetch
Useful for time-sensitive content
Error Handling
Error Types and Responses
| Error | Cause | User Message |
|---|---|---|
| TIMEOUT | Target server slow | "Preview timed out" |
| BLOCKED | Relay IP blocked | "Preview unavailable for this site" |
| NOT_FOUND | URL returns 404 | "Page not found" |
| SSL_ERROR | Certificate issues | "Secure connection failed" |
| CONTENT_TOO_LARGE | Exceeds limits | "Page too large to preview" |
| INVALID_CONTENT | No extractable metadata | "No preview available" |
| SSRF_BLOCKED | Security restriction | "URL not allowed" |
Graceful Degradation
Fallback Hierarchy:
1. FULL PREVIEW
Title + Description + Image
↓ (if image fetch fails)
2. TEXT PREVIEW
Title + Description only
↓ (if metadata extraction fails)
3. MINIMAL PREVIEW
Domain name + favicon
↓ (if everything fails)
4. LINK ONLY
Plain URL displayed
No preview attachment
Wire Format
Preview Request Message
| Field | Size | Description |
|---|---|---|
| Version | 1 byte | Protocol version (0x01) |
| Request ID | 16 bytes | Random identifier |
| URL Length | 2 bytes | Length of URL string |
| URL | Variable | Target URL (UTF-8) |
| Options | 1 byte | Bit flags for request options |
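
The request layout can be packed and parsed with Python's `struct` module, as sketched below. The table does not state a byte order, so network (big-endian) order is an assumption here, as are the function names. The fields are written in the order the table lists them.

```python
import secrets
import struct

VERSION = 0x01
# version (1 byte), request ID (16 bytes), URL length (2 bytes); "!" = big-endian, no padding
_HEADER = struct.Struct("!B16sH")


def encode_request(url, options=0, request_id=None):
    url_bytes = url.encode("utf-8")
    if len(url_bytes) > 0xFFFF:
        raise ValueError("URL exceeds the 2-byte length field")
    request_id = request_id or secrets.token_bytes(16)
    return _HEADER.pack(VERSION, request_id, len(url_bytes)) + url_bytes + bytes([options])


def decode_request(message):
    version, request_id, url_len = _HEADER.unpack_from(message)
    expected = _HEADER.size + url_len + 1  # header + URL + options byte
    if version != VERSION or len(message) != expected:
        raise ValueError("malformed preview request")
    url = message[_HEADER.size:_HEADER.size + url_len].decode("utf-8")
    return request_id, url, message[-1]
```

The 2-byte length field caps the URL at 65,535 bytes of UTF-8, so the encoder rejects anything longer instead of truncating it.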
Preview Response Message
| Field | Size | Description |
|---|---|---|
| Version | 1 byte | Protocol version (0x01) |
| Request ID | 16 bytes | Matches request |
| Status | 1 byte | Success/error code |
| Title Length | 2 bytes | Length of title |
| Title | Variable | Page title (UTF-8) |
| Description Length | 2 bytes | Length of description |
| Description | Variable | Page description (UTF-8) |
| Site Name Length | 1 byte | Length of site name |
| Site Name | Variable | Site name (UTF-8) |
| Image Present | 1 byte | 0x00 or 0x01 |
| Image Data Length | 4 bytes | If present, image size |
| Image Data | Variable | WebP/JPEG thumbnail |
| Favicon Present | 1 byte | 0x00 or 0x01 |
| Favicon Data | Variable | ICO/PNG favicon |
Message Attachment Format
When a preview is included with a message:
Preview Attachment Structure:
┌─────────────────────────────────────────┐
│ Attachment Type (1 byte): LINK_PREVIEW │
├─────────────────────────────────────────┤
│ Original URL (variable) │
├─────────────────────────────────────────┤
│ Title (variable) │
├─────────────────────────────────────────┤
│ Description (variable) │
├─────────────────────────────────────────┤
│ Site Name (variable) │
├─────────────────────────────────────────┤
│ Thumbnail Key (32 bytes) │
├─────────────────────────────────────────┤
│ Thumbnail Ref (32 bytes, hash) │
├─────────────────────────────────────────┤
│ Fetch Timestamp (8 bytes) │
└─────────────────────────────────────────┘
// Thumbnail stored separately in mesh
// Same encryption as media thumbnails
Performance Metrics
Typical Performance
| Metric | P50 | P95 | P99 |
|---|---|---|---|
| Preview generation | 450ms | 1.2s | 3s |
| Relay latency | 150ms | 400ms | 800ms |
| Cache hit rate | 35% | - | - |
| Success rate | 92% | - | - |
Optimization Techniques
| Technique | Benefit |
|---|---|
| Pre-built circuits | Reduces initial latency by ~200ms |
| Parallel metadata + image fetch | Reduces total time |
| Aggressive caching | Cache hits return in ~50ms |
| Predictive pre-fetch | Start fetch on URL detection |
| Circuit reuse for same domain | Reduces overhead |
Related Documentation
- Onion Routing - Relay network architecture
- Media Encryption - Thumbnail encryption details
- Privacy Features - Overall privacy design
- Threat Model - Security analysis
- Architecture - System components