Node Discovery
How nodes find and connect to each other in the Zentalk peer-to-peer network.
Overview
Node discovery is the process by which new nodes join the network and existing nodes maintain connections with peers. Zentalk uses a multi-layered discovery approach combining:
| Mechanism | Purpose | When Used |
|---|---|---|
| Bootstrap nodes | Initial network entry | First connection |
| DHT (Kademlia) | Deterministic peer finding | Ongoing discovery |
| Gossip protocol | Peer information exchange | Continuous |
| DNS seeds | Fallback discovery | Bootstrap failure |
Key goals:
- Enable any node to join without central coordination
- Maintain network connectivity despite churn
- Resist attacks that manipulate peer discovery
- Achieve geographic and topological diversity
Bootstrap Process
Hardcoded Bootstrap Nodes
Every Zentalk node ships with a list of hardcoded bootstrap nodes. These are well-known, stable nodes that serve as entry points to the network.
// Bootstrap node configuration
var BootstrapNodes = []string{
"boot1.zentalk.io:9000",
"boot2.zentalk.io:9000",
"boot3.zentalk.io:9000",
"boot-eu.zentalk.io:9000",
"boot-asia.zentalk.io:9000",
}
Bootstrap node requirements:
| Requirement | Value | Rationale |
|---|---|---|
| Uptime | 99.9%+ | Must be reliably available |
| Bandwidth | 100 Mbps+ | Handle many simultaneous connections |
| Geographic distribution | 3+ continents | Reduce latency, resist regional censorship |
| Operator diversity | 3+ operators | No single point of control |
| Static IP/DNS | Required | Nodes must be able to find them |
Initial Connection Sequence
When a new node starts, it follows this sequence:
1. Load bootstrap node list from configuration
2. Shuffle list (randomize order)
3. For each bootstrap node (until success):
a. Establish TLS connection
b. Send HELLO message with own node ID
c. Receive WELCOME with initial peer list
d. Add peers to routing table
4. If all bootstraps fail:
a. Try DNS seed discovery
b. Try cached peers from previous session
5. Begin self-lookup to populate routing table
Connection handshake:
New Node                     Bootstrap Node
    |                             |
    |-------- TLS HELLO -------->|
    |                             |
    |<----- WELCOME + PEERS -----|
    |                             |
    |------- FIND_NODE(self) --->|
    |                             |
    |<---- K_CLOSEST_NODES ------|
    |                             |

Bootstrap Node Selection
Nodes select bootstrap nodes using weighted random selection:
type BootstrapNode struct {
Address string
Weight int // Higher = more likely to be selected
LastSuccess time.Time
FailCount int
}
func SelectBootstrap(nodes []BootstrapNode) *BootstrapNode {
	// Weighted random selection: the configured base weight is halved per
	// recent failure (floor of 1), which reduces the chance of re-trying
	// flaky bootstraps while still distributing load across operators.
	// (Geographic-proximity weighting is omitted for brevity.)
	if len(nodes) == 0 {
		return nil
	}
	total := 0
	for _, n := range nodes {
		total += max(1, n.Weight>>uint(n.FailCount))
	}
	r := rand.Intn(total)
	for i, n := range nodes {
		w := max(1, n.Weight>>uint(n.FailCount))
		if r < w {
			return &nodes[i]
		}
		r -= w
	}
	return nil
}
Peer Discovery Mechanisms
DHT-Based Discovery (Kademlia)
The primary peer discovery mechanism uses Kademlia DHT lookups.
How DHT discovery works:
1. New node generates ID: node_id = SHA256(public_key)
2. Perform self-lookup: FIND_NODE(node_id)
3. Each response returns k closest nodes
4. Recursively query closer nodes
5. Result: Routing table filled with nearby peers
Lookup parameters:
| Parameter | Value | Purpose |
|---|---|---|
| k (bucket size) | 20 | Nodes per distance bucket |
| alpha (parallelism) | 3 | Concurrent queries |
| Lookup timeout | 5 seconds | Per-query timeout |
| Max hops | 20 | Prevent infinite loops |
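The lookup parameters above drive an iterative search over the XOR metric: a peer's distance from our own ID determines which k-bucket it lands in. A minimal sketch of the distance and bucket-index calculation (names here are illustrative, not the actual Zentalk API):

```go
package main

import (
	"fmt"
	"math/bits"
)

// NodeID mirrors the 256-bit identifiers used in the routing table.
type NodeID [32]byte

// XORDistance computes the Kademlia distance between two IDs.
func XORDistance(a, b NodeID) NodeID {
	var d NodeID
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// BucketIndex returns which k-bucket peer b belongs to relative to a:
// 255 for IDs differing in the top bit, down to 0 for IDs differing
// only in the lowest bit, and -1 when a == b.
func BucketIndex(a, b NodeID) int {
	d := XORDistance(a, b)
	for i, byt := range d {
		if byt != 0 {
			// 255 minus the number of leading zero bits in the distance.
			return 255 - (i*8 + bits.LeadingZeros8(byt))
		}
	}
	return -1
}

func main() {
	var zero, peer NodeID
	peer[0] = 0x80
	fmt.Println(BucketIndex(zero, peer)) // prints 255
}
```

Far-away peers fill the high-index buckets, which is why most of the ID space maps to bucket 255 and only exponentially closer peers occupy the low buckets.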
Bucket refresh for discovery:
func RefreshBuckets() {
for i := 0; i < 256; i++ {
bucket := routingTable.GetBucket(i)
if bucket.LastLookup.Add(1 * time.Hour).Before(time.Now()) {
// Generate random ID in bucket's range
randomID := GenerateIDInBucket(i)
// Lookup discovers new nodes in this distance range
FindNode(randomID)
}
}
}
Gossip-Based Peer Exchange
Nodes continuously exchange peer information with connected neighbors.
Peer exchange protocol:
Node A                        Node B
   |                             |
   |---- PEER_EXCHANGE_REQ ----->|
   |      (my known peers)       |
   |                             |
   |<--- PEER_EXCHANGE_RESP -----|
   |    (peers you might want)   |
   |                             |

Exchange message structure:
type PeerExchangeMessage struct {
Peers []PeerInfo `json:"peers"`
TTL int `json:"ttl"` // Prevent infinite propagation
}
type PeerInfo struct {
NodeID [32]byte `json:"node_id"`
Address string `json:"address"`
PublicKey []byte `json:"public_key"`
Capabilities uint64 `json:"capabilities"`
LastSeen time.Time `json:"last_seen"`
Score float64 `json:"score"`
}
Gossip frequency:
| Condition | Exchange Interval |
|---|---|
| Normal operation | Every 30 seconds |
| Low peer count (<10) | Every 10 seconds |
| Network partition detected | Every 5 seconds |
| Stable, well-connected | Every 60 seconds |
DNS Seed Discovery
Fallback mechanism when bootstrap nodes are unreachable.
DNS seed records:
seeds.zentalk.io TXT "node1.zentalk.io:9000"
seeds.zentalk.io TXT "node2.zentalk.io:9000"
seeds.zentalk.io TXT "192.0.2.1:9000"
DNS resolution process:
func DNSDiscovery() ([]string, error) {
records, err := net.LookupTXT("seeds.zentalk.io")
if err != nil {
return nil, err
}
var peers []string
for _, record := range records {
// Validate format: host:port
if isValidPeerAddress(record) {
peers = append(peers, record)
}
}
return peers, nil
}
Advantages of DNS seeds:
- Works even if hardcoded IPs change
- Can be updated without client software changes
- Distributed via global DNS infrastructure
- Harder to block than specific IPs
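The `isValidPeerAddress` check referenced in `DNSDiscovery` is not defined in this document; a minimal sketch using the standard library (the exact validation rules are an assumption):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// isValidPeerAddress accepts host:port strings where the host is a
// non-empty hostname or IP literal and the port is in 1-65535.
func isValidPeerAddress(addr string) bool {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil || host == "" {
		return false
	}
	port, err := strconv.Atoi(portStr)
	return err == nil && port > 0 && port <= 65535
}

func main() {
	fmt.Println(isValidPeerAddress("node1.zentalk.io:9000")) // true
	fmt.Println(isValidPeerAddress("no-port"))               // false
}
```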
Peer Scoring and Reputation
Scoring Algorithm
Each node maintains scores for connected peers to prioritize reliable connections.
type PeerScore struct {
NodeID [32]byte
BaseScore float64 // Starting score: 50.0
UptimeScore float64 // 0-25 points
LatencyScore float64 // 0-20 points
BandwidthScore float64 // 0-15 points
BehaviorScore float64 // -50 to +40 points
LastUpdated time.Time
}
func (s *PeerScore) Total() float64 {
total := s.BaseScore + s.UptimeScore + s.LatencyScore +
s.BandwidthScore + s.BehaviorScore
return math.Max(0, math.Min(100, total))
}
Scoring Factors
| Factor | Weight | Measurement |
|---|---|---|
| Uptime | 0-25 | Continuous connection duration |
| Latency | 0-20 | Average response time |
| Bandwidth | 0-15 | Throughput capacity |
| Behavior | -50 to +40 | Protocol compliance, helpfulness |
Uptime scoring:
func CalculateUptimeScore(connectionDuration time.Duration) float64 {
hours := connectionDuration.Hours()
switch {
case hours < 1:
return 0
case hours < 24:
return 5
case hours < 168: // 1 week
return 10
case hours < 720: // 1 month
return 20
default:
return 25
}
}
Latency scoring:
func CalculateLatencyScore(avgLatency time.Duration) float64 {
ms := avgLatency.Milliseconds()
switch {
case ms < 50:
return 20
case ms < 100:
return 15
case ms < 200:
return 10
case ms < 500:
return 5
default:
return 0
}
}
Behavior scoring:
| Behavior | Score Impact |
|---|---|
| Valid responses | +0.1 per response |
| Forwarded messages successfully | +0.5 per message |
| Failed to respond | -1 per timeout |
| Invalid message format | -5 per violation |
| Suspected spam/flood | -10 per incident |
| Eclipse attack behavior | -50 (immediate) |
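Applying this table amounts to adding the per-event delta and clamping to the documented -50 to +40 range. A sketch (event names and the map-based lookup are illustrative):

```go
package main

import "fmt"

// Behavior score deltas from the table above.
var behaviorDelta = map[string]float64{
	"valid_response":   +0.1,
	"message_relayed":  +0.5,
	"timeout":          -1,
	"invalid_format":   -5,
	"spam":             -10,
	"eclipse_behavior": -50,
}

// ApplyBehaviorEvent adds the event's delta to the current behavior
// score, clamped to the documented [-50, +40] range.
func ApplyBehaviorEvent(score float64, event string) float64 {
	score += behaviorDelta[event]
	if score > 40 {
		score = 40
	}
	if score < -50 {
		score = -50
	}
	return score
}

func main() {
	fmt.Println(ApplyBehaviorEvent(0, "eclipse_behavior")) // -50
}
```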
Score Decay
Scores decay over time to ensure fresh evaluation:
func ApplyScoreDecay(score *PeerScore) {
timeSinceUpdate := time.Since(score.LastUpdated)
// Decay 1% per hour of inactivity
decayFactor := math.Pow(0.99, timeSinceUpdate.Hours())
// Apply decay to volatile components
score.BehaviorScore *= decayFactor
score.LatencyScore *= decayFactor
// Uptime score doesn't decay (measured directly)
score.LastUpdated = time.Now()
}
Banning and Blocking
Peers exhibiting malicious behavior are banned:
type BanPolicy struct {
ScoreThreshold float64 // Below this = banned
BanDuration time.Duration // How long the ban lasts
MaxViolations int // Violations before permanent ban
}
var DefaultBanPolicy = BanPolicy{
ScoreThreshold: 10.0,
BanDuration: 24 * time.Hour,
MaxViolations: 3,
}
func CheckBan(peer *PeerScore) BanDecision {
if peer.Total() < DefaultBanPolicy.ScoreThreshold {
return BanDecision{
Banned: true,
Duration: DefaultBanPolicy.BanDuration,
Reason: "Score below threshold",
}
}
return BanDecision{Banned: false}
}
Ban escalation:
| Violation Count | Ban Duration |
|---|---|
| 1 | 1 hour |
| 2 | 24 hours |
| 3+ | 7 days |
| Severe (attack) | Permanent |
Network Topology Formation
Target Peer Count
Nodes maintain a target number of connections for optimal operation:
| Connection Type | Min | Target | Max |
|---|---|---|---|
| Total peers | 8 | 25 | 50 |
| Outbound | 8 | 15 | 25 |
| Inbound | 0 | 10 | 25 |
| Bootstrap | 1 | 2 | 3 |
Connection management:
type ConnectionManager struct {
MinPeers int
TargetPeers int
MaxPeers int
outbound map[NodeID]*Connection
inbound map[NodeID]*Connection
}
func (cm *ConnectionManager) NeedsMorePeers() bool {
return len(cm.outbound) + len(cm.inbound) < cm.TargetPeers
}
func (cm *ConnectionManager) CanAcceptInbound() bool {
return len(cm.inbound) < cm.MaxPeers - cm.MinPeers
}
Peer Selection Strategy
When selecting new peers, nodes balance multiple factors:
type PeerSelector struct {
scoreWeight float64 // 0.4 - prefer high-scoring peers
diversityWeight float64 // 0.3 - prefer diverse network positions
latencyWeight float64 // 0.2 - prefer low-latency peers
randomWeight float64 // 0.1 - some randomness
}
func (ps *PeerSelector) SelectPeers(candidates []PeerInfo, count int) []PeerInfo {
// Score each candidate
scored := make([]ScoredCandidate, len(candidates))
for i, c := range candidates {
scored[i] = ScoredCandidate{
Peer: c,
Score: ps.calculateScore(c),
}
}
// Sort by score and select top candidates
sort.Slice(scored, func(i, j int) bool {
return scored[i].Score > scored[j].Score
})
	// Guard against asking for more peers than exist
	if count > len(scored) {
		count = len(scored)
	}
	return scored[:count]
}
Geographic Diversity
Nodes actively seek connections to geographically diverse peers:
type GeoDistribution struct {
Regions map[string]int // Region -> connection count
Target map[string]int // Region -> target count
}
func (gd *GeoDistribution) NeedsRegion(region string) bool {
current := gd.Regions[region]
target := gd.Target[region]
return current < target
}
// Target distribution example
var TargetGeoDistribution = map[string]int{
"europe": 5,
"north-america": 5,
"asia": 5,
"south-america": 3,
"oceania": 2,
"africa": 2,
}
Why geographic diversity matters:
- Reduces latency for global message delivery
- Resists regional network partitions
- Prevents geographic censorship
- Improves network resilience
Node Capabilities Advertisement
Capability Flags
Nodes advertise their capabilities using a bitfield:
type Capabilities uint64
const (
CapRelay Capabilities = 1 << 0 // Can relay messages
CapStorage Capabilities = 1 << 1 // Offers mesh storage
CapBootstrap Capabilities = 1 << 2 // Can serve as bootstrap
CapGuardRelay Capabilities = 1 << 3 // Guard relay for 3-hop relay routing
CapMiddleRelay Capabilities = 1 << 4 // Middle relay
CapExitRelay Capabilities = 1 << 5 // Exit relay
CapHighBandwidth Capabilities = 1 << 6 // High bandwidth available
CapIPv6 Capabilities = 1 << 7 // IPv6 support
CapWebSocket Capabilities = 1 << 8 // WebSocket support
CapValidator Capabilities = 1 << 9 // Network validator
)
Capability advertisement message:
type NodeAnnouncement struct {
NodeID [32]byte `json:"node_id"`
PublicKey []byte `json:"public_key"`
Address string `json:"address"`
Capabilities Capabilities `json:"capabilities"`
Version string `json:"version"`
Timestamp int64 `json:"timestamp"`
Signature []byte `json:"signature"`
}
Service Discovery
Nodes can query for peers with specific capabilities:
// Find nodes with specific capabilities
func FindCapableNodes(required Capabilities) []PeerInfo {
var results []PeerInfo
for _, peer := range routingTable.AllPeers() {
if peer.Capabilities & required == required {
results = append(results, peer)
}
}
return results
}
// Example: Find guard relays
guardRelays := FindCapableNodes(CapGuardRelay | CapRelay)
DHT-based service registry:
// Announce service availability
func AnnounceService(serviceType string, nodeInfo NodeAnnouncement) {
key := SHA256("service:" + serviceType)
Store(key, nodeInfo)
}
// Discover service providers
func DiscoverService(serviceType string) []NodeAnnouncement {
key := SHA256("service:" + serviceType)
return FindValue(key)
}
Joining the Network
Complete Bootstrap Sequence
Detailed step-by-step process for a new node joining:
PHASE 1: Key Generation (Local)
================================
1. Generate Ed25519 keypair
2. Compute node_id = SHA256(public_key)
3. Initialize empty routing table
PHASE 2: Bootstrap Connection
==============================
4. Load bootstrap node list
5. Select bootstrap node (weighted random)
6. Establish TLS connection to bootstrap
7. Send HELLO with node_id and public_key
8. Receive WELCOME with k initial peers
9. Add bootstrap to routing table
PHASE 3: Self-Lookup
=====================
10. Query bootstrap: FIND_NODE(own_node_id)
11. Receive k closest nodes to self
12. Query those nodes: FIND_NODE(own_node_id)
13. Continue until no closer nodes found
14. Result: Know all nodes closest to self
PHASE 4: Routing Table Population
==================================
15. For each bucket i (0 to 255):
a. Generate random ID in bucket's range
b. Perform FIND_NODE(random_id)
c. Add discovered nodes to routing table
16. Result: Know nodes at all distance ranges
PHASE 5: Peer Connection
=========================
17. Select outbound peers from routing table
18. Establish persistent connections
19. Begin peer exchange protocol
20. Start message relay operations
PHASE 6: Network Participation
===============================
21. Respond to incoming FIND_NODE queries
22. Participate in gossip protocol
23. Announce capabilities
24. Begin normal operation
Routing Table Population
func PopulateRoutingTable(bootstrap *Connection) error {
// Phase 1: Self-lookup
selfLookup := NewLookup(ownNodeID)
selfLookup.Query(bootstrap)
selfLookup.IterateUntilConverged()
// Phase 2: Bucket refresh
for i := 0; i < 256; i++ {
if routingTable.Bucket(i).IsEmpty() {
randomID := GenerateIDInBucket(i)
lookup := NewLookup(randomID)
lookup.IterateUntilConverged()
}
}
return nil
}
Connection Establishment
type ConnectionState int
const (
StateDisconnected ConnectionState = iota
StateConnecting
StateHandshaking
StateConnected
)
func EstablishConnection(peer PeerInfo) (*Connection, error) {
conn := &Connection{
PeerID: peer.NodeID,
State: StateConnecting,
}
// 1. TCP connection
tcpConn, err := net.DialTimeout("tcp", peer.Address, 10*time.Second)
if err != nil {
return nil, err
}
// 2. TLS handshake
conn.State = StateHandshaking
tlsConn := tls.Client(tcpConn, tlsConfig)
if err := tlsConn.Handshake(); err != nil {
return nil, err
}
// 3. Protocol handshake
if err := protocolHandshake(tlsConn, peer); err != nil {
return nil, err
}
conn.State = StateConnected
return conn, nil
}
Network Health Metrics
Connection Monitoring
Nodes continuously monitor connection health:
type ConnectionHealth struct {
PeerID [32]byte
Connected bool
Latency time.Duration
LastPing time.Time
LastPong time.Time
BytesSent uint64
BytesReceived uint64
MessagesRelayed uint64
ErrorCount int
}
func MonitorConnection(conn *Connection) {
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop()
for range ticker.C {
// Send ping
conn.Health.LastPing = time.Now()
conn.SendPing()
// Wait for pong (with timeout)
select {
case <-conn.PongReceived:
conn.Health.LastPong = time.Now()
conn.Health.Latency = conn.Health.LastPong.Sub(conn.Health.LastPing)
case <-time.After(5 * time.Second):
conn.Health.ErrorCount++
if conn.Health.ErrorCount > 3 {
conn.Close()
}
}
}
}
Health metrics tracked:
| Metric | Threshold | Action |
|---|---|---|
| Ping latency | > 2 seconds | Mark degraded |
| Failed pings | > 3 consecutive | Disconnect |
| Error rate | > 10% | Reduce score |
| Bandwidth | < 1 KB/s sustained | Mark slow |
Peer Churn Handling
The network handles nodes joining and leaving:
func HandlePeerDisconnect(peer *Connection) {
// 1. Remove from active connections
connectionManager.Remove(peer.NodeID)
// 2. Keep in routing table (may reconnect)
routingTable.MarkInactive(peer.NodeID)
// 3. Find replacement if below minimum
if connectionManager.NeedsMorePeers() {
candidates := routingTable.GetCandidates(10)
for _, candidate := range candidates {
if connectionManager.Connect(candidate) {
break
}
}
}
// 4. Trigger peer discovery if critically low
if connectionManager.PeerCount() < MinPeers {
triggerEmergencyDiscovery()
}
}
func triggerEmergencyDiscovery() {
// Accelerated discovery process
// - Query all k-buckets
// - Try cached peers
// - Reconnect to bootstrap if needed
}
Churn resilience mechanisms:
| Mechanism | Purpose |
|---|---|
| k-bucket redundancy | Multiple peers per distance range |
| Connection pooling | Reserve connections for stability |
| Cached peer list | Remember previously-seen peers |
| Periodic refresh | Keep routing table current |
| Lazy eviction | Don’t immediately remove disconnected peers |
Security Considerations
Eclipse Attack Prevention
An eclipse attack isolates a node by surrounding it with attacker-controlled peers.
Mitigations implemented:
| Mitigation | How It Works |
|---|---|
| IP diversity | Max 2 peers per /24 subnet |
| Outbound preference | Prioritize self-initiated connections |
| Anchor connections | Maintain long-term “anchor” peers |
| Fresh peer injection | Periodically add random new peers |
| Bootstrap rotation | Randomly reconnect to bootstraps |
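The subnet cap in the first row relies on an `extractSubnet` helper that is referenced in `ValidateNewPeer` below but not defined in this document. A sketch for IPv4 `host:port` addresses using the standard library (hostname addresses would need resolution first, which is omitted here):

```go
package main

import (
	"fmt"
	"net"
)

// extractSubnet returns the /bits network (e.g. "203.0.113.0/24") for an
// IPv4 host:port address, or "" if the address cannot be parsed.
func extractSubnet(addr string, bits int) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return ""
	}
	ip := net.ParseIP(host)
	if ip == nil || ip.To4() == nil {
		return "" // not an IPv4 literal
	}
	mask := net.CIDRMask(bits, 32)
	return (&net.IPNet{IP: ip.To4().Mask(mask), Mask: mask}).String()
}

func main() {
	fmt.Println(extractSubnet("203.0.113.7:9000", 24)) // 203.0.113.0/24
}
```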
type EclipseProtection struct {
MaxPeersPerSubnet int // 2
MinOutboundRatio float64 // 0.6 (60% outbound)
AnchorPeerCount int // 3
AnchorRotationPeriod time.Duration // 24 hours
}
func (ep *EclipseProtection) ValidateNewPeer(peer PeerInfo) bool {
subnet := extractSubnet(peer.Address, 24)
// Check subnet diversity
if subnetCount[subnet] >= ep.MaxPeersPerSubnet {
return false
}
// Verify peer isn't claiming suspicious node ID
expectedID := SHA256(peer.PublicKey)
if !bytes.Equal(expectedID, peer.NodeID[:]) {
return false
}
return true
}
Sybil Attack Mitigation
Sybil attacks create many fake identities to gain disproportionate influence.
Mitigations:
| Defense | Implementation |
|---|---|
| Cryptographic node IDs | node_id = SHA256(public_key) |
| Proof of work (optional) | Computational cost for ID generation |
| Rate limiting | Max new peers per time window |
| Reputation over time | Long-term nodes trusted more |
| Resource verification | Verify claimed bandwidth/storage |
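The `hasLeadingZeros` predicate used by the proof-of-work functions below is not defined in this document. A bit-level sketch, assuming difficulty is counted in bits of leading zeros:

```go
package main

import (
	"fmt"
	"math/bits"
)

// hasLeadingZeros reports whether hash has at least `difficulty`
// leading zero bits, i.e. whether the proof-of-work target is met.
func hasLeadingZeros(hash []byte, difficulty int) bool {
	zeros := 0
	for _, b := range hash {
		if b == 0 {
			zeros += 8
			continue
		}
		zeros += bits.LeadingZeros8(b)
		break
	}
	return zeros >= difficulty
}

func main() {
	fmt.Println(hasLeadingZeros([]byte{0x00, 0x0F}, 12)) // true: 12 leading zero bits
}
```

With the configured default of `pow_difficulty: 16`, an ID generator would need on average 2^16 hash attempts, which is cheap for a legitimate node but costly for an attacker minting thousands of identities.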
// Node ID generation with proof of work
func GenerateNodeID(publicKey []byte, difficulty int) (NodeID, uint64) {
for nonce := uint64(0); ; nonce++ {
data := append(publicKey, uint64ToBytes(nonce)...)
hash := SHA256(data)
// Check if hash meets difficulty (leading zeros)
if hasLeadingZeros(hash, difficulty) {
return hash, nonce
}
}
}
// Verification
func VerifyNodeID(publicKey []byte, nodeID NodeID, nonce uint64, difficulty int) bool {
data := append(publicKey, uint64ToBytes(nonce)...)
hash := SHA256(data)
	return bytes.Equal(hash, nodeID[:]) && hasLeadingZeros(hash, difficulty)
}
Additional Security Measures
| Threat | Countermeasure |
|---|---|
| Man-in-the-middle | TLS with certificate pinning |
| Routing table poisoning | Verify node IDs cryptographically |
| Bootstrap poisoning | Multiple independent bootstraps |
| DNS poisoning | DNSSEC, fallback to hardcoded |
| Traffic analysis | Peer connection padding |
| Node impersonation | Signed announcements |
// Signed node announcement
func CreateAnnouncement(privateKey ed25519.PrivateKey, info NodeInfo) *SignedAnnouncement {
data := serializeNodeInfo(info)
signature := ed25519.Sign(privateKey, data)
return &SignedAnnouncement{
Info: info,
Signature: signature,
Timestamp: time.Now().Unix(),
}
}
func VerifyAnnouncement(announcement *SignedAnnouncement) bool {
// Check signature
data := serializeNodeInfo(announcement.Info)
if !ed25519.Verify(announcement.Info.PublicKey, data, announcement.Signature) {
return false
}
// Check timestamp freshness (prevent replay)
age := time.Since(time.Unix(announcement.Timestamp, 0))
if age > 24*time.Hour {
return false
}
return true
}
Configuration Reference
Node Discovery Settings
# discovery.yaml
discovery:
# Bootstrap configuration
bootstrap:
nodes:
- "boot1.zentalk.io:9000"
- "boot2.zentalk.io:9000"
- "boot3.zentalk.io:9000"
retry_interval: "30s"
max_retries: 10
# DHT settings
dht:
bucket_size: 20
alpha: 3
refresh_interval: "1h"
lookup_timeout: "5s"
# Peer management
peers:
min_connections: 8
target_connections: 25
max_connections: 50
max_per_subnet: 2
# Gossip protocol
gossip:
interval: "30s"
max_peers_per_exchange: 20
ttl: 3
# Security
security:
require_proof_of_work: false
pow_difficulty: 16
max_new_peers_per_hour: 50
    ban_threshold: 10.0
Related Documentation
- DHT and Kademlia - Detailed DHT protocol specification
- Run a Node - Operating a Zentalk node
- Architecture - System overview
- Threat Model - Security analysis
- Onion Routing - Relay node discovery