Distributed Hash Table (DHT) and Kademlia
Deep technical documentation of the DHT implementation powering Zentalk’s decentralized infrastructure.
Why DHT? The Problem with Centralized Lookup
In a centralized system, finding resources is simple: ask the server. But this creates critical problems:
| Problem | Centralized System | Decentralized DHT |
|---|---|---|
| Single point of failure | Server down = system down | No single point of failure |
| Censorship | Easy to block/censor | Extremely difficult to censor |
| Privacy | Server sees all queries | Queries distributed across network |
| Scalability | Server bottleneck | Scales with network size |
| Trust | Must trust server operator | Trustless operation |
The core question DHT answers: How can millions of nodes collectively store and retrieve data without any central coordinator?
Key-Value Storage Across Nodes
A DHT distributes a key-value store across all participating nodes:
Traditional Database:
Client → Server → Database[key] → value
DHT:
Client → Node_1 → Node_7 → Node_42 → value
(each hop gets closer to the key's location)
Properties:
- Deterministic: Given a key, any node can find who stores it
- Load-balanced: Data distributed evenly across nodes
- Fault-tolerant: Node failures don’t lose data (replication)
- Scalable: Lookup is O(log n) where n = number of nodes
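These properties can be seen in a toy placement function: every participant computes the same storage nodes for a key, with no coordinator. The sketch below is illustrative (small integer IDs, a made-up responsible_nodes helper) and uses the XOR closeness metric covered in the Kademlia sections that follow:

```python
import hashlib

def responsible_nodes(key: str, node_ids, k=3):
    # Every participant runs the same computation and gets the same
    # answer: the k node IDs closest (by XOR, covered below) to the
    # key's hash. No coordinator needed.
    target = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    return sorted(node_ids, key=lambda n: n ^ target)[:k]

nodes = [1, 42, 7, 99, 1000, 77777, 123456]
a = responsible_nodes("user:alice", nodes)
assert a == responsible_nodes("user:alice", nodes)  # deterministic
assert len(a) == 3                                  # replicated across k nodes
```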
Kademlia Protocol
Kademlia is the DHT protocol used by Zentalk (also used by BitTorrent, IPFS, and Ethereum). It was designed by Petar Maymounkov and David Mazières in 2002.
Core Concepts
| Concept | Description |
|---|---|
| Node ID | 256-bit identifier for each node |
| Key | 256-bit identifier for stored data |
| XOR Distance | Distance metric between IDs |
| k-bucket | List of known nodes at specific distance ranges |
| Routing Table | Collection of k-buckets |
XOR Distance Metric
Why XOR?
Kademlia’s key innovation is using XOR (exclusive or) as a distance metric. This isn’t arbitrary - XOR has mathematical properties that make DHT operations efficient.
XOR Distance Formula:
distance(A, B) = A XOR B
Example:
Node A: 1010 1100
Node B: 1010 0011
─────────
XOR: 0000 1111 = distance 15
Node A: 1010 1100
Node C: 1010 1101
─────────
XOR: 0000 0001 = distance 1
Node C is much closer to A than B
Why XOR Works
XOR satisfies the mathematical requirements for a distance metric:
| Property | Formula | Why It Matters |
|---|---|---|
| Identity | d(x, x) = 0 | A node is zero distance from itself |
| Symmetry | d(x, y) = d(y, x) | If A is close to B, B is close to A |
| Triangle Inequality | d(x, z) ≤ d(x, y) + d(y, z) | No shortcuts - routing always converges |
| Unidirectional | For any x and d, exactly one y where d(x,y) = d | Each distance maps to exactly one node |
The unidirectional property is crucial: for any ID and any target distance, there is exactly ONE other ID at that distance (whether or not a node currently occupies it). This prevents routing ambiguity: lookups for the same key converge along the same path.
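A quick way to check these properties is to compute XOR distances on small 8-bit IDs; the values below match the earlier example (real Zentalk IDs are 256-bit):

```python
def xor_distance(a: int, b: int) -> int:
    return a ^ b

A = 0b10101100   # Node A from the example above
B = 0b10100011   # Node B
C = 0b10101101   # Node C

assert xor_distance(A, B) == 15 and xor_distance(A, C) == 1

# Identity: a node is zero distance from itself
assert xor_distance(A, A) == 0
# Symmetry: d(A, B) == d(B, A)
assert xor_distance(A, B) == xor_distance(B, A)
# Triangle inequality: d(A, C) <= d(A, B) + d(B, C)
assert xor_distance(A, C) <= xor_distance(A, B) + xor_distance(B, C)
# Unidirectional: for fixed A and distance d, A ^ d is the unique ID at d
d = 15
assert xor_distance(A, A ^ d) == d
```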
XOR Distance Interpretation
The XOR distance has a useful interpretation: the position of the highest differing bit.
If A XOR B has its highest set bit at position i (counting from the most significant bit, starting at 0):
- A and B share their first i bits
- They diverge starting at bit i
Example (8-bit view):
A: 1010 1100 0000 0000 ...
B: 1010 1000 0000 0000 ...
         ^
First differing bit at position 5 (from the left)
XOR = 0000 0100 ... = 4 (2^2; bit 5 from the left in the 8-bit view)
This bit-level interpretation enables the binary tree structure.
Binary Tree Structure
Conceptual Model
Kademlia conceptualizes the ID space as a binary tree where:
- Each node is a leaf
- Node ID determines the path from root to leaf
- Each bit in the ID represents a left (0) or right (1) branch
Root
/ \
0 1
/ \ / \
00 01 10 11
/\ /\ /\ /\
... ... ... ...
Node 1010... follows path: right → left → right → left → ...
Subtree Coverage
For any node, the ID space divides into subtrees:
From Node A's perspective (ID: 0...):
Subtree 0: All nodes starting with 1... (half the network)
XOR distance: 1xxx... (very far)
Subtree 1: All nodes starting with 01... (quarter of network)
XOR distance: 01xx... (far)
Subtree 2: All nodes starting with 001... (eighth of network)
XOR distance: 001x... (medium)
...and so on
Key insight: Each subtree is exponentially smaller, but A needs to know at least one node from each subtree to route to any destination.
k-Buckets: The Routing Table
What Are k-Buckets?
A k-bucket is a list of up to k contacts (nodes) for a specific distance range. Zentalk uses k=20.
Each node maintains 256 k-buckets (one for each bit position):
| Bucket | Distance Range | Contains Nodes Where |
|---|---|---|
| 0 | 2^0 to 2^1-1 | First 255 bits match, last bit differs |
| 1 | 2^1 to 2^2-1 | First 254 bits match, second-to-last bit differs |
| 2 | 2^2 to 2^3-1 | First 253 bits match, third-to-last bit differs |
| … | … | … |
| 255 | 2^255 to 2^256-1 | First bit differs |
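The bucket index for a contact is simply the position of the highest set bit of the XOR distance. A minimal sketch on 8-bit IDs (bucket_index is an illustrative helper, not Zentalk's actual API):

```python
def bucket_index(own_id: int, other_id: int) -> int:
    # Bucket i holds contacts at XOR distance in [2^i, 2^(i+1)), so the
    # index is the position of the distance's highest set bit.
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not bucket itself")
    return distance.bit_length() - 1

own = 0b10101100
assert bucket_index(own, 0b10101101) == 0  # last bit differs  → bucket 0
assert bucket_index(own, 0b10100011) == 3  # distance 15       → bucket 3
assert bucket_index(own, 0b00101100) == 7  # first bit differs → bucket 7
```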
Why k=20?
The parameter k balances multiple concerns:
| Factor | Small k | Large k |
|---|---|---|
| Memory usage | Lower | Higher |
| Lookup reliability | Lower | Higher |
| Network overhead | Lower | Higher |
| Resistance to churn | Lower | Higher |
Why 20 specifically:
- Probability of all k nodes failing simultaneously: extremely low
- Even with 50% node churn, lookup succeeds with high probability
- Memory overhead is acceptable (256 buckets * 20 nodes * ~50 bytes = ~256KB)
k-Bucket Maintenance
When a node learns about a new contact:
1. Calculate XOR distance to determine target bucket
2. If bucket has < k entries:
→ Add new contact
3. If bucket is full:
→ Ping least-recently-seen contact
→ If no response: replace with new contact
→ If responds: move to end of list, discard new contact
Why favor old contacts?
- Long-lived nodes are more likely to stay alive (empirically observed in deployed peer-to-peer networks)
- Attackers can’t easily inject new malicious nodes
- “Least-recently-seen” eviction provides Sybil resistance
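The maintenance rules above amount to a least-recently-seen list with ping-before-evict. A sketch, assuming an illustrative KBucket class and a ping callback standing in for the real PING RPC:

```python
from collections import deque

K = 20  # bucket capacity

class KBucket:
    """Sketch of the least-recently-seen eviction policy described above."""

    def __init__(self, ping):
        self.contacts = deque()  # front = least-recently-seen
        self.ping = ping         # stand-in for a network PING RPC

    def update(self, contact):
        if contact in self.contacts:
            self.contacts.remove(contact)
            self.contacts.append(contact)      # move to tail (most recent)
        elif len(self.contacts) < K:
            self.contacts.append(contact)      # room available: add
        else:
            oldest = self.contacts[0]
            if self.ping(oldest):              # old contact still alive:
                self.contacts.remove(oldest)   # keep it, refresh its
                self.contacts.append(oldest)   # position, drop newcomer
            else:
                self.contacts.popleft()        # dead: replace with newcomer
                self.contacts.append(contact)
```

Favoring the least-recently-seen-but-alive contact means stable, long-lived nodes dominate the routing table, which is what gives the table its churn and Sybil resistance.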
Routing Table Visualization
Node ID: 00101... (256 bits)
Bucket 255: [Nodes starting with 1...] ← Far (half the network)
Bucket 254: [Nodes starting with 01...] ← Far-ish
Bucket 253: [Nodes starting with 000...] ← Medium
...
Bucket 2: [Nodes sharing the first 253 bits] ← Close
Bucket 1: [Nodes sharing the first 254 bits] ← Very close
Bucket 0: [Nodes sharing the first 255 bits] ← Closest
Each bucket holds up to k=20 nodes
Lookup Algorithm
FIND_NODE Operation
The fundamental operation: find the k closest nodes to a target ID.
Algorithm:
FIND_NODE(target):
1. Initialize:
- shortlist = k closest nodes from local routing table
- queried = empty set
- closest_seen = infinity
2. Loop:
- Select alpha (3) nodes from shortlist not yet queried
- Send parallel FIND_NODE RPCs to selected nodes
- Mark nodes as queried
3. Process responses:
- Each response contains k closest nodes the responder knows
- Add new nodes to shortlist
- Update closest_seen if closer node found
4. Termination:
- If no node closer than closest_seen found in a round
- Query all remaining nodes in shortlist within k closest
- Return k closest nodes seen
Lookup Convergence: Why It Works
Key property: Each hop at least halves the remaining distance.
Start: Node A (ID: 0000...)
Target: 1111...
Initial distance: 1111... XOR 0000... = 1111... (very large)
Hop 1: A asks bucket 255 → gets node B (ID: 1000...)
New distance: 1111... XOR 1000... = 0111... (halved!)
Hop 2: B asks its bucket → gets node C (ID: 1100...)
New distance: 1111... XOR 1100... = 0011... (halved again!)
Hop 3: C asks its bucket → gets node D (ID: 1110...)
Distance: 0001...
Hop 4: D asks its bucket → gets node E (ID: 1111...)
Distance: 0000... (found!)
Maximum hops: O(log n) where n = network size
For a million-node network: ~20 hops maximum.
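The lookup loop can be sketched against a simulated network whose routing tables follow the bucket rule above. This is a simplified model (all names are illustrative; it queries until the shortlist is exhausted rather than implementing the early-termination round check, and runs the alpha queries sequentially):

```python
import random

ALPHA, K = 3, 20

def build_routing(ids, bucket_k=K):
    # Give each simulated node a Kademlia-style routing table: for every
    # k-bucket, keep up to bucket_k members closest to the node itself.
    tables = {}
    for n in ids:
        buckets = {}
        for m in ids:
            if m == n:
                continue
            i = (n ^ m).bit_length() - 1      # bucket = highest differing bit
            buckets.setdefault(i, []).append(m)
        tables[n] = [m for members in buckets.values()
                     for m in sorted(members, key=lambda c: c ^ n)[:bucket_k]]
    return tables

def find_node(target, seeds, network):
    dist = lambda n: n ^ target
    shortlist = sorted(seeds, key=dist)[:K]   # k closest known so far
    queried = set()
    while True:
        to_query = [n for n in shortlist if n not in queried][:ALPHA]
        if not to_query:
            return shortlist                  # the k closest nodes seen
        for node in to_query:                 # parallel RPCs in a real client
            queried.add(node)
            response = sorted(network[node], key=dist)[:K]  # node's k closest
            shortlist = sorted(set(shortlist) | set(response), key=dist)[:K]

random.seed(7)
ids = random.sample(range(2 ** 16), 200)      # 200 nodes, 16-bit toy ID space
network = build_routing(ids)
target, start = ids[0], ids[-1]
closest = find_node(target, seeds=network[start], network=network)
assert closest[0] == target                   # lookup converged on the target
```

Each response contains at least one node strictly closer to the target (it shares one more prefix bit), which is exactly the halving argument above.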
FIND_VALUE Operation
Similar to FIND_NODE, but for retrieving stored values:
FIND_VALUE(key):
1. Compute target = hash(key)
2. Perform iterative lookup (like FIND_NODE)
3. At each hop:
- If node has value for key → return value immediately
- Otherwise → return k closest nodes (continue lookup)
4. If the value is found:
- Cache it at the closest node seen that did NOT have the value
5. If the lookup completes without finding the value:
- Return "not found"
Iterative vs Recursive Lookup
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Iterative | Querying node controls entire lookup | Predictable latency, easier debugging | More round trips |
| Recursive | Each node forwards to next | Fewer round trips | Unpredictable, harder to secure |
Zentalk uses iterative lookups because:
- Better control over timeout/retry
- Easier to reason about security
- Can implement lookup caching at originator
Alpha Parameter: Parallel Queries
The alpha parameter (typically 3) controls lookup parallelism:
Alpha = 1: Sequential lookups
A → B → C → D (slow, but minimal bandwidth)
Alpha = 3: Parallel lookups
A → B ─┐
A → C ─┼→ best response → continue
A → D ─┘
(faster, more bandwidth)
Why alpha=3:
- Tolerates 2 slow/failed nodes per round
- 3x speedup over sequential in practice
- Reasonable bandwidth overhead
| Parameter | Zentalk Value | Purpose |
|---|---|---|
| k (bucket size) | 20 | Redundancy, reliability |
| alpha (parallelism) | 3 | Lookup speed |
| ID size | 256 bits | Security margin, collision avoidance |
Node Operations
JOIN: How a New Node Joins
When a node first starts, it knows nothing about the network except bootstrap nodes:
JOIN(bootstrap_nodes):
1. Generate node ID:
- node_id = SHA256(public_key)
- This binds ID to cryptographic identity
2. Contact bootstrap node:
- Send PING to verify connectivity
- Receive initial contacts
3. Self-lookup:
- Perform FIND_NODE(own_node_id)
- This populates routing table with nearby nodes
4. Bucket refresh:
- For each bucket, FIND_NODE(random_id_in_bucket)
- Discovers nodes at all distance ranges
5. Announce presence:
- Stored data will be replicated to us automatically
- Other lookups will discover us
Why self-lookup works:
- Other nodes learn about us when we query them
- We learn about nodes close to us (most important)
- Routing table fills organically
STORE: How Data Is Stored
STORE(key, value):
1. Compute storage location:
- target = hash(key) // 256-bit ID
2. Find storage nodes:
- nodes = FIND_NODE(target)
- Select k closest nodes to target
3. Store value:
- Send STORE RPC to selected nodes
- Each stores (key, value, timestamp, TTL)
4. Replication:
- Value stored on k nodes for redundancy
- Nodes periodically replicate to neighbors
Storage responsibility:
- Nodes whose IDs are closest to the key are responsible
- On node join/leave, data migrates to maintain this property
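The STORE flow above can be sketched as follows; find_node and send_store are stand-ins for the lookup and RPC layers, with illustrative signatures:

```python
import hashlib

K = 20  # replication factor

def store(key: str, value: bytes, find_node, send_store):
    # find_node(target) -> node IDs sorted by closeness to target;
    # send_store(node, key, value) performs the STORE RPC.
    target = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    closest = find_node(target)[:K]       # the k nodes closest to the key
    for node in closest:
        send_store(node, key, value)      # each stores (key, value, ts, TTL)
    return closest

# toy wiring: 25 candidate nodes, record which ones received the STORE
stored = []
store("keybundle:alice", b"bundle",
      find_node=lambda t: list(range(25)),
      send_store=lambda n, k, v: stored.append(n))
assert len(stored) == K                   # value replicated to k nodes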
Refresh: Keeping Routing Tables Fresh
Routing tables become stale as nodes join/leave:
REFRESH():
Every hour, for each bucket:
1. If no lookup touched this bucket recently:
- random_id = generate ID in bucket's range
- FIND_NODE(random_id)
2. Ping least-recently-seen nodes:
- If no response → remove from bucket
- If response → move to end of bucket
Why refresh matters:
- Detects failed nodes
- Discovers new nodes
- Keeps routing table accurate
Republishing Data
Stored data must be republished to survive node churn:
REPUBLISH():
Every hour:
1. For each locally stored (key, value):
- If originally_stored_by_us OR near_expiration:
- STORE(key, value) to refresh
2. Replicate to new close neighbors:
- If new node joined closer to key:
- Send them the data

| Operation | Frequency | Purpose |
|---|---|---|
| Bucket refresh | 1 hour | Maintain routing accuracy |
| Data republish | 1 hour | Prevent data loss |
| Ping stale contacts | On new contact | Verify liveness |
Security Considerations
Eclipse Attacks
The Attack: An attacker surrounds a victim node with malicious nodes, controlling all its network view.
Normal:
Victim ──► Honest Node A
──► Honest Node B
──► Honest Node C
Eclipsed:
Victim ──► Attacker Node 1 ──► Attacker controls
──► Attacker Node 2 ──► all information
──► Attacker Node 3 ──► victim receives
Impact:
- Attacker controls what data victim sees
- Can partition victim from honest network
- Can censor specific keys
Mitigations in Zentalk:
| Mitigation | How It Works |
|---|---|
| Diverse bootstrap | Multiple independent bootstrap nodes |
| Connection limits | Max connections per IP/subnet |
| Path diversity | Multiple lookup paths for critical data |
| Bucket constraints | Limit nodes from same IP range per bucket |
| Long-term contacts | Prefer established contacts (harder to eclipse) |
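The bucket IP-diversity constraint can be sketched as a per-subnet admission check; the /24 granularity and the limit of 2 are illustrative values, not Zentalk's actual configuration:

```python
from collections import Counter
from ipaddress import ip_network

MAX_PER_SUBNET = 2  # illustrative limit

def allowed_in_bucket(bucket_ips, candidate_ip):
    # Reject a new contact if its /24 subnet already has MAX_PER_SUBNET
    # entries in this bucket, forcing an eclipse attacker to control
    # addresses across many subnets.
    subnet = lambda ip: ip_network(f"{ip}/24", strict=False)
    counts = Counter(subnet(ip) for ip in bucket_ips)
    return counts[subnet(candidate_ip)] < MAX_PER_SUBNET

assert allowed_in_bucket(["1.2.3.4"], "1.2.3.9")                 # under limit
assert not allowed_in_bucket(["1.2.3.4", "1.2.3.5"], "1.2.3.9")  # /24 full
assert allowed_in_bucket(["1.2.3.4", "1.2.3.5"], "9.9.9.9")      # other subnet
```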
Sybil Attacks
The Attack: Attacker creates many fake identities to gain disproportionate network influence.
Honest network: 1000 nodes
Attacker adds: 10000 Sybil nodes
Result: Attacker controls 90% of routing table entries
Attacker likely responsible for most keys
Why Sybil is hard in DHT:
- Can’t choose arbitrary node IDs (see below)
- k-bucket design limits exposure
- Honest nodes seen first are preferred
Mitigations:
| Mitigation | How It Works |
|---|---|
| Proof-of-work | Node ID generation requires computational cost |
| Cryptographic binding | node_id = hash(public_key), can’t choose ID |
| Rate limiting | Limit new contacts per time period |
| Reputation | Track node behavior over time |
| Resource testing | Verify nodes have real resources (bandwidth, storage) |
Node ID Generation
Critical security requirement: Nodes must not be able to choose their ID.
Why this matters:
If attackers could choose IDs:
- Position Sybil nodes near target key
- Become responsible for any key they want
- Intercept/modify stored data
Zentalk’s approach:
node_id = SHA256(ed25519_public_key || creation_timestamp)

| Property | Guarantee |
|---|---|
| Binding to key | ID tied to cryptographic identity |
| Unpredictable | SHA256 output is uniformly random |
| Verifiable | Anyone can verify ID was generated correctly |
| Non-reusable | Timestamp prevents pre-computation |
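Deriving and checking an ID under this scheme can be sketched as follows; the 8-byte big-endian timestamp encoding and the one-hour skew window are assumptions of this sketch, and the Ed25519 signature check is elided:

```python
import hashlib
import time

MAX_SKEW = 3600  # acceptable timestamp window in seconds (assumed value)

def derive_node_id(public_key: bytes, timestamp: int) -> bytes:
    # node_id = SHA256(public_key || timestamp); timestamp encoded as
    # 8 big-endian bytes (an assumption of this sketch)
    return hashlib.sha256(public_key + timestamp.to_bytes(8, "big")).digest()

def verify_node_id(node_id, public_key, timestamp, now=None):
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW:
        return False                      # stale or future-dated: reject
    # a real verifier would also check the Ed25519 signature here
    return node_id == derive_node_id(public_key, timestamp)

pk = bytes(32)                            # placeholder public key
ts = int(time.time())
nid = derive_node_id(pk, ts)
assert verify_node_id(nid, pk, ts)
assert not verify_node_id(nid, pk, ts - 10**6)   # outside the window
```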
Verification on first contact:
1. Node presents: (node_id, public_key, timestamp, signature)
2. Verifier checks:
- node_id == SHA256(public_key || timestamp)
- timestamp within acceptable window
- signature valid
Additional Security Measures
| Threat | Mitigation |
|---|---|
| Data poisoning | Store signatures with data, verify on retrieval |
| Routing table poisoning | Verify node IDs before adding to routing table |
| DDoS via lookups | Rate limit lookup requests |
| Replay attacks | Include timestamps, maintain seen-message cache |
Zentalk-Specific Usage
Relay Node Discovery
Zentalk uses DHT to discover relay nodes for 3-hop relay routing:
1. Client needs relay nodes for circuit
2. Query DHT for key: hash("relay:" || region || timestamp_bucket)
3. DHT returns list of available relays
4. Client selects relays for circuit building
Why DHT for relay discovery:
- No central relay directory
- Load balancing across regions
- Censorship resistant
- Relays can join/leave dynamically
| DHT Key Pattern | Returns |
|---|---|
| relay:guard:eu | Guard relays in Europe |
| relay:middle:* | All middle relays |
| relay:exit:us | Exit relays in US |
Key Bundle Distribution
User public key bundles are stored in DHT:
Store key bundle:
key = hash("keybundle:" || user_wallet_address)
value = encrypted_and_signed_key_bundle
STORE(key, value)
Fetch key bundle:
key = hash("keybundle:" || recipient_wallet_address)
bundle = FIND_VALUE(key)
verify_signature(bundle)
Benefits:
- No central key server
- Key bundles available even if API offline
- Redundant storage across k nodes
Mesh Storage Addressing
Decentralized storage uses DHT for content addressing:
Store encrypted media chunk:
content_hash = SHA256(encrypted_chunk)
storage_key = hash("media:" || content_hash)
STORE(storage_key, encrypted_chunk)
Retrieve:
storage_key = hash("media:" || known_content_hash)
chunk = FIND_VALUE(storage_key)
verify: SHA256(chunk) == content_hash
Mesh storage modules using DHT:
| Module | DHT Key Format | Purpose |
|---|---|---|
| messages | msg:{recipient}:{timestamp_bucket} | Store-and-forward messages |
| media | media:{content_hash} | Chunked encrypted media |
| keys | keybundle:{wallet_address} | Public key distribution |
| profile | profile:{wallet_address} | Encrypted profile sync |
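The content-addressing pattern shared by these modules can be sketched in a few lines; dht_key and the namespace:identifier separator convention are illustrative:

```python
import hashlib

def dht_key(namespace: str, identifier: str) -> bytes:
    # storage_key = hash(namespace || identifier); the "namespace:id"
    # string form is an assumption of this sketch
    return hashlib.sha256(f"{namespace}:{identifier}".encode()).digest()

def verify_chunk(chunk: bytes, expected_content_hash: str) -> bool:
    # self-certifying retrieval: the key commits to the content,
    # so a tampered chunk is detected immediately
    return hashlib.sha256(chunk).hexdigest() == expected_content_hash

chunk = b"encrypted media bytes"
content_hash = hashlib.sha256(chunk).hexdigest()
key = dht_key("media", content_hash)    # media:{content_hash}
assert verify_chunk(chunk, content_hash)
assert not verify_chunk(b"tampered", content_hash)
```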
DHT Parameters in Zentalk
| Parameter | Value | Rationale |
|---|---|---|
| k (bucket size) | 20 | Balance reliability vs overhead |
| alpha (parallel queries) | 3 | Fast lookups, fault tolerance |
| ID size | 256 bits | Match SHA256, security margin |
| Refresh interval | 1 hour | Keep routing tables fresh |
| Republish interval | 1 hour | Maintain data availability |
| Data TTL | 24-72 hours | Configurable per data type |
| Bootstrap nodes | 5+ | Geographic distribution |
Performance Characteristics
Lookup Complexity
| Metric | Value | Notes |
|---|---|---|
| Lookup hops | O(log n) | ~20 hops for 1M nodes |
| Messages per lookup | O(k * log n) | With alpha parallelism |
| Routing table size | O(k * log n) | 256 buckets hold up to 5120 entries; ~400 typically populated for a 1M-node network |
| Storage overhead | O(k) per key | Replication factor |
Latency Expectations
Lookup latency = hops * avg_rtt + processing_time
For 1M node network:
- Hops: ~20
- Avg RTT: 50ms (varies by geography)
- Total: ~1 second typical
With alpha=3 parallelism:
- Effective latency: ~300-500ms typical
Related Documentation
- Architecture - System overview including mesh storage
- Onion Routing - How DHT enables relay discovery
- Cryptography - Hash functions and signatures used
- Run a Node - Operating a Zentalk DHT node