Link Preview Privacy

Technical specification for privacy-preserving link previews in Zentalk.

Overview

Link previews enhance user experience by displaying website metadata (title, description, thumbnail) inline with messages. However, traditional implementations create significant privacy risks, because every preview fetch is a direct request from the user's device to a third-party server. Zentalk implements a relay-based architecture that prevents the target server from learning the requesting user's IP address, client identity, or DNS activity.


The Privacy Problem

When a messaging application fetches a link preview directly from the target URL, it exposes sensitive information:

| Risk | Description | Privacy Impact |
| --- | --- | --- |
| IP Address Exposure | User’s real IP sent to target server | Location tracking, identity correlation |
| Referer Header Leakage | Messenger identified in HTTP headers | Usage pattern disclosure |
| Timing Correlation | Request timing reveals message activity | Behavioral analysis |
| DNS Leakage | DNS queries reveal browsing intent | ISP surveillance, network monitoring |
| TLS Fingerprinting | Client characteristics exposed | Device identification |

Attack Vectors in Traditional Implementations

| Attack | Mechanism | Consequence |
| --- | --- | --- |
| Tracking Pixel Injection | Unique URL per recipient | Identify who viewed preview |
| IP Harvesting | Log requests to shared links | Map user IP addresses |
| Timing Analysis | Correlate preview fetch with message send | Deanonymize senders |
| Referer Mining | Extract messenger identity from headers | Profile user’s app usage |
| Request Fingerprinting | Analyze TLS/HTTP characteristics | Identify device types |

Correlation Attack Example

Traditional Preview Flow (INSECURE):

1. Alice sends link to Bob in Zentalk
2. Alice's client fetches preview from target.com
   → Target sees: IP=Alice, Referer=zentalk-client
3. Bob's client fetches preview from target.com
   → Target sees: IP=Bob, Referer=zentalk-client
4. Target can correlate:
   → Two Zentalk users accessed same URL
   → Timing suggests communication between them
   → IPs reveal approximate locations

Zentalk’s Solution

Relay-Based Architecture

Zentalk routes all preview requests through a distributed relay network, ensuring no direct connection between user clients and target URLs.

| Component | Role | Knowledge |
| --- | --- | --- |
| User Client | Requests preview via relay | Knows target URL |
| Entry Relay | First hop in relay chain | Knows client IP, not target URL |
| Exit Relay | Fetches from target URL | Knows target URL, not client IP |
| Target Server | Serves preview content | Sees only relay IP |

Privacy Guarantees

| Property | Mechanism | Guarantee |
| --- | --- | --- |
| IP Anonymity | Multi-hop relay routing | Target never sees client IP |
| Referer Protection | Relay strips/replaces headers | No messenger identification |
| Timing Obfuscation | Batched requests, random delays | Correlation resistance |
| DNS Privacy | Relay performs DNS resolution | Client DNS queries hidden |
| Request Unlinkability | Per-request circuit rotation | No persistent fingerprint |

Comparison with Traditional Approaches

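| Approach | IP Hidden | Referer Hidden | Timing Protected | Decentralized |
| --- | --- | --- | --- | --- |
| Direct Fetch | No | No | No | N/A |
| Single Proxy | Yes | Yes | No | No |
| VPN-Based | Yes | Partial | No | No |
| Tor-Style (Zentalk) | Yes | Yes | Partial | Yes |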

Preview Generation Flow

End-to-End Process

Privacy-Preserving Preview Flow:

1. USER PASTES LINK
   → Client detects URL pattern in message input
   → Preview request initiated (if enabled)
2. BUILD RELAY CIRCUIT
   → Select 2-hop circuit from relay pool
   → Entry relay: Knows client, not destination
   → Exit relay: Knows destination, not client
3. ENCRYPT REQUEST
   → Construct preview request
   → Encrypt for exit relay (inner layer)
   → Encrypt for entry relay (outer layer)
4. ROUTE THROUGH RELAYS
   → Client → Entry Relay (onion layer 1)
   → Entry Relay → Exit Relay (onion layer 2)
   → Exit Relay → Target URL (ordinary HTTPS, no onion layer)
5. FETCH AND SANITIZE
   → Exit relay fetches target URL
   → Content sanitized (scripts removed)
   → Metadata extracted (title, description, image)
6. RETURN ENCRYPTED RESPONSE
   → Exit relay encrypts response for client
   → Routed back through entry relay
   → Client decrypts preview data
7. CACHE AND DISPLAY
   → Preview cached locally (encrypted)
   → Rendered in message compose area
   → Attached to message when sent
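
The two encryption layers in steps 3 and 4 can be illustrated with a toy sketch. This is not Zentalk's actual cipher suite or message layout: `seal` and `unseal` use a SHAKE-256 keystream purely as a stand-in for a real authenticated cipher, and the `next_hop` and `payload` field names, the relay ID, and the pre-shared keys are all hypothetical.

```python
import hashlib
import json
import os

def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher: XOR with a SHAKE-256 keystream. It has no
    # authentication and only stands in for a real AEAD in this sketch.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    return nonce + _xor_keystream(key, nonce, plaintext)

def unseal(key: bytes, sealed: bytes) -> bytes:
    nonce, body = sealed[:16], sealed[16:]
    return _xor_keystream(key, nonce, body)

def build_onion(url: str, entry_key: bytes, exit_key: bytes, exit_id: str) -> bytes:
    # Inner layer: only the exit relay can read the target URL.
    inner = seal(exit_key, json.dumps({"url": url}).encode())
    # Outer layer: the entry relay learns only which exit to forward to.
    outer = json.dumps({"next_hop": exit_id, "payload": inner.hex()}).encode()
    return seal(entry_key, outer)

# Hypothetical keys; a real client would negotiate these with each relay.
entry_key, exit_key = os.urandom(32), os.urandom(32)
onion = build_onion("https://example.com/article", entry_key, exit_key, "exit-7")

# Entry relay peels its layer: it sees the next hop and an opaque blob.
entry_view = json.loads(unseal(entry_key, onion))
# Exit relay peels the inner layer: it sees the URL, but not the sender.
exit_view = json.loads(unseal(exit_key, bytes.fromhex(entry_view["payload"])))
```

The point of the sketch is the split of knowledge: `entry_view` contains no trace of the URL, and `exit_view` contains no trace of the client.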

Circuit Selection

| Parameter | Value | Rationale |
| --- | --- | --- |
| Hop Count | 2 | Balance: privacy vs. latency |
| Entry Selection | From guard set | Reduce entry diversity exposure |
| Exit Selection | Random from pool | Geographic diversity |
| Circuit Lifetime | Single request | Maximum unlinkability |
| Parallel Circuits | 3 pre-built | Low-latency preview generation |
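
A minimal sketch of these selection rules follows. The function names and relay identifiers are hypothetical, and the rule that the exit must differ from the entry is an assumption of this sketch rather than something the table states.

```python
import secrets

def select_circuit(guard_set: list, relay_pool: list) -> tuple:
    """Pick a 2-hop circuit: entry from the guard set, exit at random from the pool."""
    entry = secrets.choice(guard_set)
    # Assumption: exclude the entry so no single relay sees both client and target.
    exit_candidates = [relay for relay in relay_pool if relay != entry]
    if not exit_candidates:
        raise ValueError("relay pool has no exit distinct from the entry relay")
    return entry, secrets.choice(exit_candidates)

def prebuild_circuits(guard_set: list, relay_pool: list, count: int = 3) -> list:
    # Pre-build a few circuits so a preview request does not wait on setup.
    return [select_circuit(guard_set, relay_pool) for _ in range(count)]

circuits = prebuild_circuits(["guard-a", "guard-b"], ["guard-a", "relay-1", "relay-2"])
```

`secrets.choice` is used instead of `random.choice` so that relay selection draws from a cryptographically secure source.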

Request Timing

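| Phase | Typical Duration | Maximum |
| --- | --- | --- |
| Circuit selection | 10 ms | 50 ms |
| Onion encryption | 5 ms | 20 ms |
| Relay routing | 100-300 ms | 2 s |
| Target fetch | 200-500 ms | 5 s |
| Content parsing | 50 ms | 200 ms |
| Total preview time | 400-900 ms | 8 s |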

Preview Data Extraction

Metadata Sources

Preview data is extracted from target pages in priority order:

| Priority | Source | Fields Extracted |
| --- | --- | --- |
| 1 | Open Graph tags | og:title, og:description, og:image |
| 2 | Twitter Card tags | twitter:title, twitter:description, twitter:image |
| 3 | HTML meta tags | title, description |
| 4 | Structured data | JSON-LD, Schema.org |
| 5 | Page content | First heading, first paragraph |

Extracted Fields

| Field | Source Priority | Max Length | Fallback |
| --- | --- | --- | --- |
| Title | og:title → twitter:title → title tag → h1 | 200 chars | Domain name |
| Description | og:description → meta description → first p | 500 chars | None |
| Image URL | og:image → twitter:image → first img | N/A | None |
| Site Name | og:site_name → domain | 100 chars | Domain |
| Type | og:type → inferred | 50 chars | "website" |
| Favicon | link rel="icon" → /favicon.ico | N/A | None |

Open Graph Extraction

Metadata Extraction Process:

1. PARSE HTML
    document = parse_html(response_body)

2. EXTRACT OPEN GRAPH
    og_tags = document.query_all('meta[property^="og:"]')
    FOR EACH tag IN og_tags:
        key = tag.property.replace("og:", "")
        value = tag.content
        metadata[key] = sanitize(value)

3. EXTRACT TWITTER CARDS
    twitter_tags = document.query_all('meta[name^="twitter:"]')
    FOR EACH tag IN twitter_tags:
        key = tag.name.replace("twitter:", "")
        IF key NOT IN metadata:
            metadata[key] = sanitize(tag.content)

4. FALLBACK TO HTML
    IF "title" NOT IN metadata:
        title_tag = document.query('title')
        IF title_tag EXISTS:
            metadata["title"] = sanitize(title_tag.text)
    IF "description" NOT IN metadata:
        meta_desc = document.query('meta[name="description"]')
        IF meta_desc EXISTS:
            metadata["description"] = sanitize(meta_desc.content)

5. TRUNCATE
    IF "title" IN metadata:
        metadata["title"] = truncate(metadata["title"], 200)
    IF "description" IN metadata:
        metadata["description"] = truncate(metadata["description"], 500)
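
The extraction order above can be sketched with Python's standard-library `html.parser`. The class and function names are illustrative, `_clean` is a simplified stand-in for the `sanitize` step (it only collapses whitespace), and the sketch covers only priorities 1 to 3 from the Metadata Sources table.

```python
from html.parser import HTMLParser

def _clean(value: str) -> str:
    # Stand-in for sanitize(): collapse whitespace only.
    return " ".join(value.split())

class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.og, self.twitter, self.meta = {}, {}, {}
        self.title_text = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = _clean(attr.get("content") or "")
            prop, name = attr.get("property") or "", attr.get("name") or ""
            if prop.startswith("og:"):
                self.og.setdefault(prop[len("og:"):], content)
            elif name.startswith("twitter:"):
                self.twitter.setdefault(name[len("twitter:"):], content)
            elif name == "description":
                self.meta.setdefault("description", content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data

def extract_metadata(html: str) -> dict:
    parser = MetaExtractor()
    parser.feed(html)
    metadata = dict(parser.og)                 # priority 1: Open Graph
    for key, value in parser.twitter.items():  # priority 2: Twitter Cards
        metadata.setdefault(key, value)
    title = _clean(parser.title_text)          # priority 3: plain HTML
    if "title" not in metadata and title:
        metadata["title"] = title
    if "description" not in metadata and "description" in parser.meta:
        metadata["description"] = parser.meta["description"]
    # Enforce the length limits from the Extracted Fields table.
    for key, limit in (("title", 200), ("description", 500)):
        if key in metadata:
            metadata[key] = metadata[key][:limit]
    return metadata

sample = (
    '<html><head><title>Fallback Title</title>'
    '<meta property="og:title" content="OG Title">'
    '<meta name="twitter:title" content="Twitter Title">'
    '<meta name="twitter:image" content="https://example.com/i.png">'
    '<meta name="description" content="Plain description">'
    '</head></html>'
)
result = extract_metadata(sample)
```

Here `result["title"]` comes from Open Graph, `result["image"]` from the Twitter Card, and `result["description"]` from the plain meta tag, matching the priority order.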

Preview Content Limits

Size Constraints

| Content | Limit | Rationale |
| --- | --- | --- |
| HTML fetch | 512 KB | Sufficient for metadata extraction |
| Image fetch | 2 MB | Reasonable thumbnail source |
| Generated thumbnail | 100 KB | Bandwidth efficiency |
| Total preview payload | 150 KB | Message size limits |
| Fetch timeout | 5 seconds | User experience |

Thumbnail Processing

| Parameter | Value |
| --- | --- |
| Max source dimensions | 4096 x 4096 px |
| Output dimensions | 400 x 400 px (max) |
| Output format | WebP (JPEG fallback) |
| Quality | 75% |
| Aspect ratio | Preserved |

Thumbnail Generation:

1. FETCH IMAGE
    image_data = fetch_with_limit(image_url, max=2MB)

2. VALIDATE IMAGE
    IF NOT valid_image_format(image_data):
        SKIP thumbnail generation
    IF image_dimensions > 4096x4096:
        SKIP thumbnail generation

3. RESIZE
    thumbnail = resize_image(
        image_data,
        max_width=400,
        max_height=400,
        preserve_aspect=true
    )

4. ENCODE
    output = encode_webp(thumbnail, quality=75)
    IF output.size > 100KB:
        output = encode_jpeg(thumbnail, quality=60)

5. RETURN
    IF output.size ≤ 100KB:
        RETURN output
    ELSE:
        RETURN null  // Skip oversized thumbnails
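
The dimension arithmetic behind the validate and resize steps can be sketched without an image library. The function name is hypothetical, and the choice never to upscale small images is an assumption; the table only specifies a 400 x 400 px maximum.

```python
from typing import Optional, Tuple

MAX_SOURCE = 4096   # px, per side
MAX_OUTPUT = 400    # px, per side

def thumbnail_size(width: int, height: int) -> Optional[Tuple[int, int]]:
    """Return output dimensions, or None if the source should be skipped."""
    if width <= 0 or height <= 0:
        return None
    if width > MAX_SOURCE or height > MAX_SOURCE:
        return None  # oversized source: skip thumbnail generation
    # One scale factor for both axes preserves the aspect ratio.
    # Assumption: small images are left as-is rather than upscaled.
    scale = min(MAX_OUTPUT / width, MAX_OUTPUT / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1600 x 800 source scales by 0.25 to 400 x 200, while a 200 x 100 source is returned unchanged.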

Content Type Restrictions

| Content Type | Allowed | Notes |
| --- | --- | --- |
| text/html | Yes | Primary target |
| application/xhtml+xml | Yes | XML-based HTML |
| image/* | Yes | For thumbnail only |
| application/json | Partial | API responses with metadata |
| text/plain | No | No useful preview data |
| application/pdf | No | Cannot extract safely |
| video/* | No | Thumbnail only via poster |

Caching Strategy

Relay-Side Caching

Exit relays maintain a shared cache to reduce repeated fetches and improve performance:

| Parameter | Value | Rationale |
| --- | --- | --- |
| Cache duration | 1 hour | Balance freshness vs. efficiency |
| Cache key | SHA-256(normalized_url) | No URL stored in plaintext |
| Max cache size | 1 GB per relay | Resource constraints |
| Eviction policy | LRU | Prioritize popular content |

Cache Privacy Properties

| Property | Implementation | Guarantee |
| --- | --- | --- |
| No user correlation | Cache key is URL hash only | Cannot link users to URLs |
| No request logging | Requests not persisted | No audit trail |
| Shared cache | All users benefit equally | No per-user tracking |
| Cache-only serving | Stale cache served if target down | Reduces timing attacks |

Cache Key Generation

Cache Key Derivation:

1. NORMALIZE URL
    normalized = lowercase_scheme_and_host(url)  // path and query stay case-sensitive
    normalized = remove_tracking_params(normalized)
    normalized = sort_query_params(normalized)

2. GENERATE KEY
    cache_key = SHA-256(normalized)

3. LOOKUP
    cached_preview = cache.get(cache_key)
    IF cached_preview AND NOT expired(cached_preview):
        RETURN cached_preview

Tracking Parameters Removed:
- utm_source, utm_medium, utm_campaign
- fbclid, gclid, msclkid
- ref, source, via
- Any parameter matching tracking patterns
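
A minimal sketch of the derivation using `urllib.parse` and `hashlib` is shown below. It lowercases only the scheme and host, because URL paths and query values can be case-sensitive and lowercasing them could make distinct pages share a key. Dropping the fragment and treating every `utm_`-prefixed parameter as a tracking pattern are assumptions of this sketch.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign",
    "fbclid", "gclid", "msclkid",
    "ref", "source", "via",
}

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith("utm_")  # assumed "tracking pattern"
    ]
    query.sort()  # parameter order must not affect the cache key
    # Assumption: the fragment is dropped, since it is never sent to the server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(query), "")
    )

def cache_key(url: str) -> str:
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()
```

With this normalization, `https://example.com/a?fbclid=zz` and `https://EXAMPLE.com/a` map to the same 64-character hex key.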

Client-Side Caching

| Parameter | Value |
| --- | --- |
| Cache location | Encrypted local storage |
| Cache duration | 24 hours |
| Cache key | SHA-256(url + conversation_id) |
| Encryption | AES-256-GCM with local key |

Security Measures

Content Sanitization

All preview content is sanitized before delivery to clients:

| Threat | Sanitization |
| --- | --- |
| JavaScript injection | All scripts removed |
| CSS attacks | Stylesheets stripped |
| Event handlers | on* attributes removed |
| External resources | Blocked except thumbnail |
| Meta refresh | Removed |
| Base tag manipulation | Removed |
| Form injection | All forms removed |

Sanitization Rules

Content Sanitization Process:

1. REMOVE DANGEROUS ELEMENTS
    dangerous_tags = [
        'script', 'style', 'iframe', 'frame',
        'object', 'embed', 'applet', 'form',
        'input', 'button', 'select', 'textarea'
    ]
    FOR EACH tag IN dangerous_tags:
        document.remove_all(tag)

2. REMOVE EVENT HANDLERS
    FOR EACH element IN document.all_elements():
        FOR EACH attr IN element.attributes:
            IF attr.name.starts_with('on'):
                element.remove_attribute(attr.name)

3. SANITIZE URLS
    FOR EACH attr IN ['href', 'src', 'action']:
        FOR EACH element IN document.query_all('[' + attr + ']'):
            url = element.get(attr)
            IF NOT is_safe_url(url):
                element.remove_attribute(attr)

4. EXTRACT TEXT ONLY
    // Final preview contains only:
    // - Plain text title
    // - Plain text description
    // - Validated image URL
    // No HTML markup in final preview
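
The `is_safe_url` check in step 3 is not defined in this specification. One possible reading, sketched here as an assumption, is to accept only absolute `http` and `https` URLs and reject everything else, including `javascript:`, `data:`, and protocol-relative URLs.

```python
from urllib.parse import urlsplit

SAFE_SCHEMES = {"http", "https"}

def is_safe_url(url: str) -> bool:
    """Hypothetical policy: allow only absolute http(s) URLs with a host."""
    # Drop whitespace and control characters first, since browsers ignore
    # some of them inside a scheme (for example "java\nscript:").
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    # urlsplit lowercases the scheme, so "JaVaScRiPt:" is caught too.
    return parts.scheme in SAFE_SCHEMES and bool(parts.netloc)
```

An allowlist of schemes is used rather than a blocklist, so unknown or future schemes are rejected by default.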

Image Security

| Check | Action | Purpose |
| --- | --- | --- |
| MIME validation | Verify magic bytes match extension | Prevent type confusion |
| Dimension limits | Reject images > 4096 px | Prevent DoS |
| File size limits | Reject images > 2 MB | Bandwidth protection |
| Format whitelist | Only JPEG, PNG, GIF, WebP | Reduce attack surface |
| Decompression limits | Max 50 MB decompressed | Prevent zip bombs |
| Metadata stripping | Remove EXIF, XMP | Privacy protection |
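
The MIME validation, file size, and format whitelist checks can be sketched as follows. The function names are illustrative; the signatures are the standard file headers for the four whitelisted formats.

```python
from typing import Optional

MAX_IMAGE_BYTES = 2 * 1024 * 1024  # 2 MB fetch limit

def sniff_image_format(data: bytes) -> Optional[str]:
    """Identify an image by its leading magic bytes, ignoring any claimed type."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None  # not on the whitelist

def validate_image(data: bytes, declared_format: str) -> bool:
    if len(data) > MAX_IMAGE_BYTES:
        return False
    # Reject when the content does not match what the server declared.
    return sniff_image_format(data) == declared_format.lower()
```

A file served as PNG whose bytes begin with a JPEG header fails `validate_image`, which is the type-confusion case the table refers to.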

Malicious URL Detection

| Check | Method | Action |
| --- | --- | --- |
| Known malware domains | Blocklist lookup | Reject with warning |
| Phishing detection | URL pattern analysis | Reject with warning |
| Homograph attacks | IDN normalization check | Display punycode |
| IP-based URLs | Detect raw IP targets | Warn user |
| Local network | Block RFC1918, localhost | Prevent SSRF |
| Unusual ports | Block non-80/443 | Reduce attack surface |

SSRF Prevention

Relay nodes apply the following Server-Side Request Forgery (SSRF) controls:

| Control | Implementation |
| --- | --- |
| DNS rebinding protection | Resolve DNS, validate IP before fetch |
| Private IP blocking | Reject 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 |
| Localhost blocking | Reject 127.0.0.0/8, ::1 |
| Cloud metadata blocking | Reject 169.254.169.254, metadata.* |
| Protocol restriction | HTTPS only (HTTP redirects to HTTPS only) |
| Redirect limits | Maximum 3 redirects |
| Redirect validation | Each redirect target re-validated |

SSRF Prevention Flow:

1. PARSE URL
    parsed = parse_url(target_url)
    IF parsed.scheme NOT IN ['http', 'https']:
        REJECT("Invalid protocol")

2. RESOLVE DNS
    ip_addresses = dns_resolve(parsed.host)

3. VALIDATE IPS
    FOR EACH ip IN ip_addresses:
        IF is_private_ip(ip):
            REJECT("Private IP not allowed")
        IF is_loopback(ip):
            REJECT("Loopback not allowed")
        IF is_cloud_metadata(ip):
            REJECT("Metadata endpoint not allowed")

4. FETCH WITH VALIDATED IP
    connection = connect_to_ip(ip_addresses[0], parsed.port)
    // Use original hostname for TLS SNI and Host header
    response = send_request(connection, parsed)

5. FOLLOW REDIRECTS (limited)
    redirect_count = 0
    WHILE response.is_redirect AND redirect_count < 3:
        new_url = response.headers['Location']
        response = VALIDATE AND FETCH new_url (repeat steps 1-4)
        redirect_count += 1
    IF response.is_redirect:
        REJECT("Too many redirects")
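
The IP validation in step 3 can be sketched with Python's standard `ipaddress` module. The function names are hypothetical, the network list mirrors the controls table, and unwrapping IPv4-mapped IPv6 addresses is an extra precaution assumed here rather than something the table specifies.

```python
import ipaddress
from typing import List, Optional

BLOCKED_NETWORKS = [
    ("private", ipaddress.ip_network("10.0.0.0/8")),
    ("private", ipaddress.ip_network("172.16.0.0/12")),
    ("private", ipaddress.ip_network("192.168.0.0/16")),
    ("loopback", ipaddress.ip_network("127.0.0.0/8")),
    ("loopback", ipaddress.ip_network("::1/128")),
    ("cloud_metadata", ipaddress.ip_network("169.254.169.254/32")),
]

def blocked_reason(ip: str) -> Optional[str]:
    """Return why an address is blocked, or None if it may be fetched."""
    addr = ipaddress.ip_address(ip)
    # Assumption: treat ::ffff:a.b.c.d as the IPv4 address it wraps,
    # so a mapped address cannot bypass the IPv4 rules.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    for reason, network in BLOCKED_NETWORKS:
        if addr in network:  # always False across IP versions
            return reason
    return None

def first_allowed_ip(resolved_ips: List[str]) -> str:
    """Validate every resolved address, then return the one to connect to."""
    if not resolved_ips:
        raise ValueError("hostname did not resolve")
    for ip in resolved_ips:
        reason = blocked_reason(ip)
        if reason is not None:
            raise ValueError(f"{reason} address not allowed: {ip}")
    return resolved_ips[0]
```

Connecting to the address returned by `first_allowed_ip`, instead of resolving the hostname a second time, is what closes the DNS rebinding window described in the controls table.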

User Controls

Preview Settings

| Setting | Options | Default |
| --- | --- | --- |
| Enable previews | On / Off | On |
| Auto-generate | Always / Ask / Never | Always |
| Preview in compose | Show / Hide | Show |
| Download images | Auto / Ask / Never | Auto |
| Send previews | Include / Exclude | Include |

Per-Conversation Settings

| Setting | Scope | Options |
| --- | --- | --- |
| Disable previews | Single conversation | On / Off |
| Preview image quality | Single conversation | High / Low / None |
| Auto-expand previews | Single conversation | Yes / No |

Preview Before Sending

Preview Confirmation Flow:

1. USER PASTES URL
   → Preview generated in background

2. PREVIEW DISPLAYED IN COMPOSE
   ┌────────────────────────────────┐
   │ [Preview Image]                │
   │ Title of the Page              │
   │ Description excerpt...         │
   │ example.com                    │
   │                                │
   │ [Include Preview] [Remove]     │
   └────────────────────────────────┘

3. USER CHOOSES
   → "Include Preview": Attach preview to message
   → "Remove": Send message without preview

4. RECIPIENT OPTIONS
   → Preview shown inline
   → Click to open URL (with warning)

Security Warnings

| Condition | Warning Shown |
| --- | --- |
| HTTP (not HTTPS) URL | "This link uses an insecure connection" |
| Recently registered domain | "This domain was recently created" |
| IDN/Punycode domain | "This link contains special characters" |
| Mismatch: preview vs URL | "The preview may not match the destination" |
| Known tracker redirect | "This link goes through a tracking service" |

Limitations

Technical Limitations

| Limitation | Cause | Impact |
| --- | --- | --- |
| Relay IP blocking | Some sites block datacenter IPs | Preview unavailable |
| JavaScript-rendered content | Content generated client-side | Incomplete preview |
| Authentication-required pages | No login credentials sent | Generic preview only |
| Rate-limited APIs | Target server throttling | Preview may fail |
| Geo-restricted content | Relay location mismatch | Different or no preview |
| Dynamic content | Content changes after fetch | Preview may be stale |

Content That Cannot Be Previewed

| Content Type | Reason | User Experience |
| --- | --- | --- |
| Login-required pages | No authentication | "Preview unavailable" |
| Paywalled articles | Content hidden | Title/domain only |
| Single-page apps | JavaScript required | May show loading state |
| PDF documents | Cannot extract safely | File type indicator only |
| Private/internal URLs | SSRF protection | Blocked |
| Tor .onion sites | Not supported | Link shown without preview |

Staleness Considerations

| Scenario | Preview Behavior |
| --- | --- |
| Content updated after preview | Shows cached version |
| URL redirects changed | Original preview persists |
| Page removed (404) | Cached preview may still show |
| A/B tested pages | Preview may differ from actual |

Staleness Mitigation:

1. CACHE HEADERS
   Respect Cache-Control from origin
   max-age used when present

2. FRESHNESS INDICATORS
   Show "Preview from [time]" if > 1 hour old

3. REFRESH OPTION
   User can manually refresh preview
   Bypasses cache, fetches fresh content

4. RECIPIENT FETCH
   Recipients can optionally re-fetch
   Useful for time-sensitive content

Error Handling

Error Types and Responses

| Error | Cause | User Message |
| --- | --- | --- |
| TIMEOUT | Target server slow | "Preview timed out" |
| BLOCKED | Relay IP blocked | "Preview unavailable for this site" |
| NOT_FOUND | URL returns 404 | "Page not found" |
| SSL_ERROR | Certificate issues | "Secure connection failed" |
| CONTENT_TOO_LARGE | Exceeds limits | "Page too large to preview" |
| INVALID_CONTENT | No extractable metadata | "No preview available" |
| SSRF_BLOCKED | Security restriction | "URL not allowed" |

Graceful Degradation

Fallback Hierarchy:

1. FULL PREVIEW
   Title + Description + Image
   ↓ (if image fetch fails)

2. TEXT PREVIEW
   Title + Description only
   ↓ (if metadata extraction fails)

3. MINIMAL PREVIEW
   Domain name + favicon
   ↓ (if everything fails)

4. LINK ONLY
   Plain URL displayed
   No preview attachment
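
The hierarchy can be read as a simple decision function. The `FetchResult` type and level names below are hypothetical, and the sketch assumes a title is the minimum requirement for the text-based levels, since the description has no fallback value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FetchResult:
    title: Optional[str] = None
    description: Optional[str] = None
    thumbnail: Optional[bytes] = None
    domain: Optional[str] = None

def preview_level(result: FetchResult) -> str:
    """Pick the richest preview the available data supports."""
    if result.title and result.thumbnail:
        return "FULL"        # title + description + image
    if result.title:
        return "TEXT"        # image fetch failed
    if result.domain:
        return "MINIMAL"     # metadata extraction failed
    return "LINK_ONLY"       # everything failed: plain URL, no attachment
```

Each level is checked from richest to poorest, so a failure at one stage degrades the preview instead of discarding it.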

Wire Format

Preview Request Message

| Field | Size | Description |
| --- | --- | --- |
| Version | 1 byte | Protocol version (0x01) |
| Request ID | 16 bytes | Random identifier |
| URL Length | 2 bytes | Length of URL string |
| URL | Variable | Target URL (UTF-8) |
| Options | 1 byte | Bit flags for request options |
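
The request layout can be sketched with Python's `struct` module. The byte order is not stated in this specification; big-endian (network order) is assumed here, as is the interpretation of URL Length as a byte count of the UTF-8 encoding. The function names are illustrative.

```python
import os
import struct
from typing import Optional, Tuple

PROTOCOL_VERSION = 0x01
# Version (1 byte), Request ID (16 bytes), URL Length (2 bytes).
# ">" selects big-endian with no padding; the byte order is an assumption.
_HEADER = struct.Struct(">B16sH")

def encode_request(url: str, options: int = 0, request_id: Optional[bytes] = None) -> bytes:
    url_bytes = url.encode("utf-8")
    if len(url_bytes) > 0xFFFF:
        raise ValueError("URL does not fit in the 2-byte length field")
    if request_id is None:
        request_id = os.urandom(16)
    header = _HEADER.pack(PROTOCOL_VERSION, request_id, len(url_bytes))
    return header + url_bytes + bytes([options & 0xFF])

def decode_request(buf: bytes) -> Tuple[int, bytes, str, int]:
    version, request_id, url_len = _HEADER.unpack_from(buf, 0)
    url_end = _HEADER.size + url_len
    if len(buf) != url_end + 1:
        raise ValueError("message length does not match URL Length field")
    url = buf[_HEADER.size:url_end].decode("utf-8")
    return version, request_id, url, buf[url_end]

message = encode_request("https://example.com/", options=0x03, request_id=bytes(range(16)))
decoded = decode_request(message)
```

For the 20-byte URL above, the encoded message is 40 bytes: 19 bytes of fixed header, the URL, and one options byte.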

Preview Response Message

| Field | Size | Description |
| --- | --- | --- |
| Version | 1 byte | Protocol version (0x01) |
| Request ID | 16 bytes | Matches request |
| Status | 1 byte | Success/error code |
| Title Length | 2 bytes | Length of title |
| Title | Variable | Page title (UTF-8) |
| Description Length | 2 bytes | Length of description |
| Description | Variable | Page description (UTF-8) |
| Site Name Length | 1 byte | Length of site name |
| Site Name | Variable | Site name (UTF-8) |
| Image Present | 1 byte | 0x00 or 0x01 |
| Image Data Length | 4 bytes | If present, image size |
| Image Data | Variable | WebP/JPEG thumbnail |
| Favicon Present | 1 byte | 0x00 or 0x01 |
| Favicon Data | Variable | ICO/PNG favicon |

Message Attachment Format

When a preview is included with a message:

Preview Attachment Structure:

┌─────────────────────────────────────────┐
│ Attachment Type (1 byte): LINK_PREVIEW  │
├─────────────────────────────────────────┤
│ Original URL (variable)                 │
├─────────────────────────────────────────┤
│ Title (variable)                        │
├─────────────────────────────────────────┤
│ Description (variable)                  │
├─────────────────────────────────────────┤
│ Site Name (variable)                    │
├─────────────────────────────────────────┤
│ Thumbnail Key (32 bytes)                │
├─────────────────────────────────────────┤
│ Thumbnail Ref (32 bytes, hash)          │
├─────────────────────────────────────────┤
│ Fetch Timestamp (8 bytes)               │
└─────────────────────────────────────────┘

// Thumbnail stored separately in mesh
// Same encryption as media thumbnails

Performance Metrics

Typical Performance

| Metric | P50 | P95 | P99 |
| --- | --- | --- | --- |
| Preview generation | 450 ms | 1.2 s | 3 s |
| Relay latency | 150 ms | 400 ms | 800 ms |
| Cache hit rate | 35% | - | - |
| Success rate | 92% | - | - |

Optimization Techniques

| Technique | Benefit |
| --- | --- |
| Pre-built circuits | Reduces initial latency by ~200 ms |
| Parallel metadata + image fetch | Reduces total time |
| Aggressive caching | Cache hits return in ~50 ms |
| Predictive pre-fetch | Start fetch on URL detection |
| Circuit reuse for same domain | Reduces overhead |
