
Distributed Hash Table (DHT) and Kademlia

Deep technical documentation of the DHT implementation powering Zentalk’s decentralized infrastructure.


Why DHT? The Problem with Centralized Lookup

In a centralized system, finding resources is simple: ask the server. But this creates critical problems:

| Problem | Centralized System | Decentralized DHT |
|---|---|---|
| Single point of failure | Server down = system down | No single point of failure |
| Censorship | Easy to block/censor | Extremely difficult to censor |
| Privacy | Server sees all queries | Queries distributed across network |
| Scalability | Server bottleneck | Scales with network size |
| Trust | Must trust server operator | Trustless operation |

The core question DHT answers: How can millions of nodes collectively store and retrieve data without any central coordinator?

Key-Value Storage Across Nodes

A DHT distributes a key-value store across all participating nodes:

Traditional database:
  Client → Server → Database[key] → value

DHT:
  Client → Node_1 → Node_7 → Node_42 → value
  (each hop gets closer to the key's location)

Properties:

  • Deterministic: Given a key, any node can find who stores it
  • Load-balanced: Data distributed evenly across nodes
  • Fault-tolerant: Node failures don’t lose data (replication)
  • Scalable: Lookup is O(log n) where n = number of nodes

Kademlia Protocol

Kademlia is the DHT protocol used by Zentalk (it is also used by BitTorrent, IPFS, and Ethereum). It was designed by Petar Maymounkov and David Mazières in 2002.

Core Concepts

| Concept | Description |
|---|---|
| Node ID | 256-bit identifier for each node |
| Key | 256-bit identifier for stored data |
| XOR Distance | Distance metric between IDs |
| k-bucket | List of known nodes at specific distance ranges |
| Routing Table | Collection of k-buckets |

XOR Distance Metric

Why XOR?

Kademlia’s key innovation is using XOR (exclusive or) as a distance metric. This choice is not arbitrary: XOR has mathematical properties that make DHT operations efficient.

XOR Distance Formula:

distance(A, B) = A XOR B

Example:

Node A: 1010 1100
Node B: 1010 0011
─────────────────
XOR:    0000 1111 = distance 15

Node A: 1010 1100
Node C: 1010 1101
─────────────────
XOR:    0000 0001 = distance 1

Node C is much closer to A than B.
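The metric itself is a single integer XOR; a minimal Python sketch of the examples above (toy 8-bit IDs instead of Zentalk's 256-bit ones):

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two node IDs: bitwise XOR."""
    return a ^ b

# Toy 8-bit IDs from the examples above.
node_a = 0b10101100
node_b = 0b10100011
node_c = 0b10101101

print(xor_distance(node_a, node_b))  # 15
print(xor_distance(node_a, node_c))  # 1
```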

Why XOR Works

XOR satisfies the mathematical requirements for a distance metric:

| Property | Formula | Why It Matters |
|---|---|---|
| Identity | d(x, x) = 0 | A node is zero distance from itself |
| Symmetry | d(x, y) = d(y, x) | If A is close to B, B is close to A |
| Triangle inequality | d(x, z) ≤ d(x, y) + d(y, z) | No shortcuts: routing always converges |
| Unidirectionality | For any x and d, exactly one y with d(x, y) = d | Each distance maps to exactly one ID |

The unidirectionality property is crucial: for any node and any target distance, there is exactly ONE ID at that distance. Lookups for the same key therefore converge along the same path regardless of where they start, and routing is never ambiguous.

XOR Distance Interpretation

The XOR distance has a useful interpretation: its magnitude is determined by the position of the highest differing bit, which in turn tells you how long a prefix the two IDs share.

If A XOR B has its highest set bit at position i (0-indexed from the most significant bit):

  • A and B share their first i bits
  • They diverge starting at bit i

Example:

A: 1010 1100 0000 0000 ...
B: 1010 1000 0000 0000 ...
         ^
         First differing bit at position 5 (0-indexed from the left)

XOR = 0000 0100 0000 0000 ...
(in an 8-bit view the XOR value is 4 = 2^2; its set bit sits at position 5 from the left)

This bit-level interpretation enables the binary tree structure.
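The prefix-length reading of the distance can be computed directly from the XOR; a small Python sketch (the function name and the `id_bits` parameter are ours, used here to run the 8-bit example):

```python
def shared_prefix_len(a: int, b: int, id_bits: int = 256) -> int:
    """Number of leading bits two IDs share, derived from their XOR."""
    d = a ^ b
    if d == 0:
        return id_bits  # identical IDs share every bit
    # bit_length() is the position of the highest set bit, 1-indexed from the LSB.
    return id_bits - d.bit_length()

# 8-bit example from above: A and B share their first 5 bits.
print(shared_prefix_len(0b10101100, 0b10101000, id_bits=8))  # 5
```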


Binary Tree Structure

Conceptual Model

Kademlia conceptualizes the ID space as a binary tree where:

  • Each node is a leaf
  • Node ID determines the path from root to leaf
  • Each bit in the ID represents a left (0) or right (1) branch

            Root
           /    \
          0      1
         / \    / \
       00  01  10  11
       /\  /\  /\  /\
      ... ... ... ...

Node 1010... follows the path: right → left → right → left → ...

Subtree Coverage

For any node, the ID space divides into subtrees:

From Node A's perspective (ID: 000...):

  Subtree 0: all nodes starting with 1...    (half the network)
             XOR distance: 1xxx...           (very far)
  Subtree 1: all nodes starting with 01...   (a quarter of the network)
             XOR distance: 01xx...           (far)
  Subtree 2: all nodes starting with 001...  (an eighth of the network)
             XOR distance: 001x...           (medium)
  ...and so on

Key insight: Each subtree is exponentially smaller, but A needs to know at least one node from each subtree to route to any destination.


k-Buckets: The Routing Table

What Are k-Buckets?

A k-bucket is a list of up to k contacts (nodes) for a specific distance range. Zentalk uses k=20.

Each node maintains 256 k-buckets (one for each bit position):

| Bucket | Distance Range | Contains Nodes Where |
|---|---|---|
| 0 | 2^0 to 2^1 − 1 | First 255 bits match, last bit differs |
| 1 | 2^1 to 2^2 − 1 | First 254 bits match, bit 254 differs |
| 2 | 2^2 to 2^3 − 1 | First 253 bits match, bit 253 differs |
| ... | ... | ... |
| 255 | 2^255 to 2^256 − 1 | First bit differs |
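The bucket index for a contact follows directly from the table above: it is the position of the highest set bit of the XOR distance. A Python sketch (function name is ours):

```python
def bucket_index(own_id: int, contact_id: int) -> int:
    """k-bucket index for a contact: floor(log2(xor distance)).

    Bucket i holds contacts at XOR distance in [2^i, 2^(i+1) - 1].
    """
    d = own_id ^ contact_id
    if d == 0:
        raise ValueError("a node does not bucket itself")
    return d.bit_length() - 1

print(bucket_index(0b0000, 0b0001))  # 0  (last bit differs)
print(bucket_index(0b0000, 0b0011))  # 1  (distance 3 falls in [2, 3])
```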

Why k=20?

The parameter k balances multiple concerns:

| Factor | Small k | Large k |
|---|---|---|
| Memory usage | Lower | Higher |
| Lookup reliability | Lower | Higher |
| Network overhead | Lower | Higher |
| Resistance to churn | Lower | Higher |

Why 20 specifically:

  • Probability of all k nodes failing simultaneously: extremely low
  • Even with 50% node churn, lookup succeeds with high probability
  • Memory overhead is acceptable (256 buckets * 20 nodes * ~50 bytes = ~256KB)

k-Bucket Maintenance

When a node learns about a new contact:

1. Calculate the XOR distance to determine the target bucket.
2. If the bucket has fewer than k entries:
   → add the new contact.
3. If the bucket is full:
   → ping the least-recently-seen contact;
   → if it does not respond: replace it with the new contact;
   → if it responds: move it to the end of the list and discard the new contact.

Why favor old contacts?

  • Long-lived nodes are more likely to stay alive (an empirical observation from deployed peer-to-peer networks)
  • Attackers can’t easily inject new malicious nodes
  • “Least-recently-seen” eviction provides Sybil resistance
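The maintenance rules above can be condensed into a single update function; a Python sketch where `ping` is a stand-in for the real liveness RPC (the names are ours, not Zentalk's API):

```python
from collections import deque

K = 20  # bucket capacity

def update_bucket(bucket: deque, contact: str, ping) -> None:
    """Least-recently-seen eviction, as described above.

    `bucket` is ordered oldest -> newest; `ping` is a callable
    standing in for the liveness RPC, returning True if the node answers.
    """
    if contact in bucket:
        bucket.remove(contact)      # already known: refresh its position
        bucket.append(contact)
    elif len(bucket) < K:
        bucket.append(contact)      # room available: just add it
    else:
        oldest = bucket[0]
        if ping(oldest):
            bucket.remove(oldest)   # oldest still alive: keep it,
            bucket.append(oldest)   # move it to the end, discard the newcomer
        else:
            bucket.popleft()        # oldest is dead: evict it
            bucket.append(contact)  # and admit the newcomer
```

Note how a full bucket never drops a responsive old contact: this is exactly the preference for long-lived nodes that gives Sybil resistance.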

Routing Table Visualization

Node ID: 00101... (256 bits)

Bucket 255: [nodes starting with 1...]             ← far (half the network)
Bucket 254: [nodes starting with 01...]            ← far-ish
Bucket 253: [nodes starting with 000...]           ← medium
...
Bucket 1:   [nodes sharing the first 254 bits]     ← very close
Bucket 0:   [nodes differing only in the last bit] ← closest

Each bucket holds up to k = 20 nodes.

Lookup Algorithm

FIND_NODE Operation

The fundamental operation: find the k closest nodes to a target ID.

Algorithm:

FIND_NODE(target):

1. Initialize:
   - shortlist = k closest nodes from the local routing table
   - queried = empty set
   - closest_seen = infinity
2. Loop:
   - Select alpha (3) nodes from the shortlist not yet queried
   - Send parallel FIND_NODE RPCs to the selected nodes
   - Mark those nodes as queried
3. Process responses:
   - Each response contains the k closest nodes the responder knows
   - Add new nodes to the shortlist
   - Update closest_seen if a closer node was found
4. Terminate:
   - If no node closer than closest_seen was found in a round:
     - Query all remaining unqueried nodes among the k closest in the shortlist
     - Return the k closest nodes seen

Lookup Convergence: Why It Works

Key property: Each hop at least halves the remaining distance.

Start:  Node A (ID: 0000...)   Target: 1111...
Initial distance: 1111... XOR 0000... = 1111... (very large)

Hop 1: A asks bucket 255 → gets node B (ID: 1000...)
       New distance: 1111... XOR 1000... = 0111... (halved!)
Hop 2: B asks its bucket → gets node C (ID: 1100...)
       New distance: 1111... XOR 1100... = 0011... (halved again!)
Hop 3: C asks its bucket → gets node D (ID: 1110...)
       Distance: 0001...
Hop 4: D asks its bucket → gets node E (ID: 1111...)
       Distance: 0000... (found!)

Maximum hops: O(log n) where n = network size

For a million-node network: ~20 hops maximum.
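The halving argument can be checked numerically; a toy Python model that assumes an ideally populated network where each hop reaches a node matching one more prefix bit (so hop count is bounded by the ID length, and by ~log n in a real network):

```python
def hops_to_target(start: int, target: int) -> int:
    """Count hops when each hop fixes the highest differing bit -
    the idealised best case described above."""
    hops = 0
    current = start
    while current != target:
        top = (current ^ target).bit_length() - 1
        current ^= 1 << top      # next hop matches one more prefix bit
        hops += 1                # distance at least halves each iteration
    return hops

# Worst case in an 8-bit ID space: every bit differs -> 8 hops.
print(hops_to_target(0b00000000, 0b11111111))  # 8
```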

FIND_VALUE Operation

Similar to FIND_NODE, but for retrieving stored values:

FIND_VALUE(key):

1. Compute target = hash(key)
2. Perform an iterative lookup (as in FIND_NODE)
3. At each hop:
   - If the queried node has a value for the key → it returns the value and the lookup stops
   - Otherwise → it returns its k closest nodes and the lookup continues
4. If the value is found:
   - Cache it at the closest observed node that did not have it
5. If the lookup completes without finding the value:
   - Return "not found"

Iterative vs Recursive Lookup

| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Iterative | The querying node controls the entire lookup | Predictable latency, easier debugging | More round trips |
| Recursive | Each node forwards to the next | Fewer round trips | Unpredictable, harder to secure |

Zentalk uses iterative lookups because:

  • Better control over timeout/retry
  • Easier to reason about security
  • Can implement lookup caching at originator

Alpha Parameter: Parallel Queries

The alpha parameter (typically 3) controls lookup parallelism:

Alpha = 1: sequential lookups
  A → B → C → D
  (slow, but minimal bandwidth)

Alpha = 3: parallel lookups
  A → B ─┐
  A → C ─┼→ best response → continue
  A → D ─┘
  (faster, more bandwidth)

Why alpha=3:

  • Tolerates 2 slow/failed nodes per round
  • 3x speedup over sequential in practice
  • Reasonable bandwidth overhead

| Parameter | Zentalk Value | Purpose |
|---|---|---|
| k (bucket size) | 20 | Redundancy, reliability |
| alpha (parallelism) | 3 | Lookup speed |
| ID size | 256 bits | Security margin, collision avoidance |

Node Operations

JOIN: How a New Node Joins

When a node first starts, it knows nothing about the network except bootstrap nodes:

JOIN(bootstrap_nodes):

1. Generate a node ID:
   - node_id = SHA256(public_key || creation_timestamp)
   - This binds the ID to a cryptographic identity (see Node ID Generation below)
2. Contact a bootstrap node:
   - Send PING to verify connectivity
   - Receive initial contacts
3. Self-lookup:
   - Perform FIND_NODE(own_node_id)
   - This populates the routing table with nearby nodes
4. Bucket refresh:
   - For each bucket, FIND_NODE(random_id_in_bucket)
   - Discovers nodes at all distance ranges
5. Announce presence:
   - Stored data will be replicated to us automatically
   - Other nodes' lookups will discover us

Why self-lookup works:

  • Other nodes learn about us when we query them
  • We learn about nodes close to us (most important)
  • Routing table fills organically

STORE: How Data Is Stored

STORE(key, value):

1. Compute the storage location:
   - target = hash(key)   // 256-bit ID
2. Find storage nodes:
   - nodes = FIND_NODE(target)
   - Select the k closest nodes to the target
3. Store the value:
   - Send a STORE RPC to each selected node
   - Each stores (key, value, timestamp, TTL)
4. Replication:
   - The value is stored on k nodes for redundancy
   - Nodes periodically replicate to neighbors

Storage responsibility:

  • Nodes whose IDs are closest to the key are responsible
  • On node join/leave, data migrates to maintain this property
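Selecting the k closest storage nodes reduces to sorting candidate IDs by XOR distance to the key's hash; a minimal Python sketch (the function name is ours):

```python
import hashlib

K = 20  # Zentalk's replication factor

def k_closest(node_ids: list[int], key: bytes, k: int = K) -> list[int]:
    """Return the k candidate node IDs closest (by XOR) to the key's
    256-bit storage target, as in step 2 of STORE."""
    target = int.from_bytes(hashlib.sha256(key).digest(), "big")
    return sorted(node_ids, key=lambda node: node ^ target)[:k]

# Toy usage: pick 3 storage nodes for a key out of 100 candidates.
candidates = list(range(100))
print(k_closest(candidates, b"some-key", k=3))
```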

Refresh: Keeping Routing Tables Fresh

Routing tables become stale as nodes join/leave:

REFRESH(): every hour, for each bucket:

1. If no lookup touched this bucket recently:
   - random_id = generate an ID in the bucket's range
   - FIND_NODE(random_id)
2. Ping least-recently-seen nodes:
   - No response → remove from the bucket
   - Response → move to the end of the bucket

Why refresh matters:

  • Detects failed nodes
  • Discovers new nodes
  • Keeps routing table accurate

Republishing Data

Stored data must be republished to survive node churn:

REPUBLISH(): every hour:

1. For each locally stored (key, value):
   - If originally_stored_by_us OR near_expiration:
     - STORE(key, value) to refresh
2. Replicate to new close neighbors:
   - If a new node joined closer to the key:
     - Send them the data

| Operation | Frequency | Purpose |
|---|---|---|
| Bucket refresh | 1 hour | Maintain routing accuracy |
| Data republish | 1 hour | Prevent data loss |
| Ping stale contacts | On new contact | Verify liveness |

Security Considerations

Eclipse Attacks

The Attack: An attacker surrounds a victim node with malicious nodes, controlling all its network view.

Normal:
  Victim ──► Honest Node A
  Victim ──► Honest Node B
  Victim ──► Honest Node C

Eclipsed:
  Victim ──► Attacker Node 1 ┐
  Victim ──► Attacker Node 2 ┼─ attacker controls all
  Victim ──► Attacker Node 3 ┘  information the victim receives

Impact:

  • Attacker controls what data victim sees
  • Can partition victim from honest network
  • Can censor specific keys

Mitigations in Zentalk:

| Mitigation | How It Works |
|---|---|
| Diverse bootstrap | Multiple independent bootstrap nodes |
| Connection limits | Max connections per IP/subnet |
| Path diversity | Multiple lookup paths for critical data |
| Bucket constraints | Limit nodes from the same IP range per bucket |
| Long-term contacts | Prefer established contacts (harder to eclipse) |

Sybil Attacks

The Attack: Attacker creates many fake identities to gain disproportionate network influence.

Honest network:  1,000 nodes
Attacker adds:  10,000 Sybil nodes

Result: the attacker controls ~90% of routing table entries
        and is likely responsible for most keys

Why Sybil is hard in DHT:

  • Can’t choose arbitrary node IDs (see below)
  • k-bucket design limits exposure
  • Honest nodes seen first are preferred

Mitigations:

| Mitigation | How It Works |
|---|---|
| Proof-of-work | Node ID generation requires computational cost |
| Cryptographic binding | node_id = hash(public_key); an attacker can’t choose its ID |
| Rate limiting | Limit new contacts per time period |
| Reputation | Track node behavior over time |
| Resource testing | Verify nodes have real resources (bandwidth, storage) |

Node ID Generation

Critical security requirement: Nodes must not be able to choose their ID.

Why this matters:

If attackers could choose IDs, they could:

  • Position Sybil nodes near a target key
  • Become responsible for any key they want
  • Intercept/modify stored data

Zentalk’s approach:

node_id = SHA256(ed25519_public_key || creation_timestamp)

| Property | Guarantee |
|---|---|
| Binding to key | ID tied to cryptographic identity |
| Unpredictable | SHA256 output is uniformly distributed |
| Verifiable | Anyone can verify the ID was generated correctly |
| Non-reusable | Timestamp prevents pre-computation |

Verification on first contact:

1. The node presents: (node_id, public_key, timestamp, signature)
2. The verifier checks:
   - node_id == SHA256(public_key || timestamp)
   - timestamp is within the acceptable window
   - signature is valid

Additional Security Measures

| Threat | Mitigation |
|---|---|
| Data poisoning | Store signatures with data, verify on retrieval |
| Routing table poisoning | Verify node IDs before adding to the routing table |
| DDoS via lookups | Rate-limit lookup requests |
| Replay attacks | Include timestamps, maintain a seen-message cache |

Zentalk-Specific Usage

Relay Node Discovery

Zentalk uses DHT to discover relay nodes for 3-hop relay routing:

1. A client needs relay nodes for a circuit
2. It queries the DHT for key: hash("relay:" || region || timestamp_bucket)
3. The DHT returns a list of available relays
4. The client selects relays for circuit building

Why DHT for relay discovery:

  • No central relay directory
  • Load balancing across regions
  • Censorship resistant
  • Relays can join/leave dynamically

| DHT Key Pattern | Returns |
|---|---|
| relay:guard:eu | Guard relays in Europe |
| relay:middle:* | All middle relays |
| relay:exit:us | Exit relays in the US |

Key Bundle Distribution

User public key bundles are stored in DHT:

Store a key bundle:
  key = hash("keybundle:" || user_wallet_address)
  value = encrypted_and_signed_key_bundle
  STORE(key, value)

Fetch a key bundle:
  key = hash("keybundle:" || recipient_wallet_address)
  bundle = FIND_VALUE(key)
  verify_signature(bundle)

Benefits:

  • No central key server
  • Key bundles available even if API offline
  • Redundant storage across k nodes

Mesh Storage Addressing

Decentralized storage uses DHT for content addressing:

Store an encrypted media chunk:
  content_hash = SHA256(encrypted_chunk)
  storage_key = hash("media:" || content_hash)
  STORE(storage_key, encrypted_chunk)

Retrieve:
  storage_key = hash("media:" || known_content_hash)
  chunk = FIND_VALUE(storage_key)
  verify: SHA256(chunk) == content_hash
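The self-certifying property of content addressing is easy to express in code; a Python sketch of the key derivation and the retrieval check (function names are ours):

```python
import hashlib

def storage_key(content_hash: bytes) -> bytes:
    """DHT key for a media chunk: hash("media:" || content_hash)."""
    return hashlib.sha256(b"media:" + content_hash).digest()

def verify_chunk(chunk: bytes, expected_content_hash: bytes) -> bool:
    """Self-certifying check: the retrieved bytes must hash to the address."""
    return hashlib.sha256(chunk).digest() == expected_content_hash

# A tampered chunk fails verification, so a malicious storage node
# cannot substitute data without detection.
chunk = b"encrypted-chunk-bytes"
h = hashlib.sha256(chunk).digest()
print(verify_chunk(chunk, h))         # True
print(verify_chunk(chunk + b"x", h))  # False
```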

Mesh storage modules using DHT:

| Module | DHT Key Format | Purpose |
|---|---|---|
| messages | msg:{recipient}:{timestamp_bucket} | Store-and-forward messages |
| media | media:{content_hash} | Chunked encrypted media |
| keys | keybundle:{wallet_address} | Public key distribution |
| profile | profile:{wallet_address} | Encrypted profile sync |

DHT Parameters in Zentalk

| Parameter | Value | Rationale |
|---|---|---|
| k (bucket size) | 20 | Balance reliability vs overhead |
| alpha (parallel queries) | 3 | Fast lookups, fault tolerance |
| ID size | 256 bits | Match SHA256, security margin |
| Refresh interval | 1 hour | Keep routing tables fresh |
| Republish interval | 1 hour | Maintain data availability |
| Data TTL | 24-72 hours | Configurable per data type |
| Bootstrap nodes | 5+ | Geographic distribution |

Performance Characteristics

Lookup Complexity

| Metric | Value | Notes |
|---|---|---|
| Lookup hops | O(log n) | ~20 hops for 1M nodes |
| Messages per lookup | O(k · log n) | With alpha parallelism |
| Routing table size | O(k · log n) | ~400 active entries for a 1M-node network (5,120 slots max) |
| Storage overhead | O(k) per key | Replication factor |

Latency Expectations

Lookup latency = hops × avg_rtt + processing_time

For a 1M-node network:
  - Hops: ~20
  - Avg RTT: 50 ms (varies by geography)
  - Total: ~1 second typical

With alpha = 3 parallelism:
  - Effective latency: ~300-500 ms typical
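The arithmetic above, as a tiny Python model (the division by alpha is our crude approximation of overlap from parallel queries, not a protocol guarantee):

```python
def lookup_latency_ms(hops: int, avg_rtt_ms: float, alpha: int = 1) -> float:
    """hops * avg_rtt, divided by alpha to approximate how parallel
    queries overlap slow hops (a rough model, ignoring processing time)."""
    return hops * avg_rtt_ms / alpha

print(lookup_latency_ms(20, 50))           # 1000.0 ms (~1 second serial)
print(lookup_latency_ms(20, 50, alpha=3))  # ~333 ms
```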
