Node Discovery
How nodes find and connect to each other in the Zentalk peer-to-peer network.
Overview
Node discovery is the process by which new nodes join the network and existing nodes maintain connections with peers. Zentalk uses a multi-layered discovery approach combining:
| Mechanism | Purpose | When Used |
|---|---|---|
| Bootstrap nodes | Initial network entry | First connection |
| DHT (Kademlia) | Deterministic peer finding | Ongoing discovery |
| Gossip protocol | Peer information exchange | Continuous |
| DNS seeds | Fallback discovery | Bootstrap failure |
Key goals:
- Enable any node to join without central coordination
- Maintain network connectivity despite churn
- Resist attacks that manipulate peer discovery
- Achieve geographic and topological diversity
Bootstrap Process
Hardcoded Bootstrap Nodes
Every Zentalk node ships with a list of hardcoded bootstrap nodes. These are well-known, stable nodes that serve as entry points to the network.
// Bootstrap node configuration
var BootstrapNodes = []string{
"boot1.zentalk.io:9000",
"boot2.zentalk.io:9000",
"boot3.zentalk.io:9000",
"boot-eu.zentalk.io:9000",
"boot-asia.zentalk.io:9000",
}
Bootstrap node requirements:
| Requirement | Value | Rationale |
|---|---|---|
| Uptime | 99.9%+ | Must be reliably available |
| Bandwidth | 100 Mbps+ | Handle many simultaneous connections |
| Geographic distribution | 3+ continents | Reduce latency, resist regional censorship |
| Operator diversity | 3+ operators | No single point of control |
| Static IP/DNS | Required | Nodes must be able to find them |
Initial Connection Sequence
When a new node starts, it follows this sequence:
1. Load bootstrap node list from configuration
2. Shuffle list (randomize order)
3. For each bootstrap node (until success):
a. Establish TLS connection
b. Send HELLO message with own node ID
c. Receive WELCOME with initial peer list
d. Add peers to routing table
4. If all bootstraps fail:
a. Try DNS seed discovery
b. Try cached peers from previous session
5. Begin self-lookup to populate routing table
Connection handshake:
New Node                     Bootstrap Node
    |                             |
    |-------- TLS HELLO -------->|
    |                             |
    |<----- WELCOME + PEERS -----|
    |                             |
    |------- FIND_NODE(self) --->|
    |                             |
    |<---- K_CLOSEST_NODES ------|
    |                             |

Bootstrap Node Selection
Nodes select bootstrap nodes using weighted random selection:
type BootstrapNode struct {
Address string
Weight int // Higher = more likely to be selected
LastSuccess time.Time
FailCount int
}
func SelectBootstrap(nodes []BootstrapNode) *BootstrapNode {
	// Weighted random selection: the configured base weight is halved per
	// recent failure (floor of 1), which reduces the chance of re-trying
	// flaky bootstraps while still distributing load across operators.
	// (Geographic-proximity weighting is omitted for brevity.)
	if len(nodes) == 0 {
		return nil
	}
	total := 0
	for _, n := range nodes {
		total += max(1, n.Weight>>uint(n.FailCount))
	}
	r := rand.Intn(total)
	for i, n := range nodes {
		w := max(1, n.Weight>>uint(n.FailCount))
		if r < w {
			return &nodes[i]
		}
		r -= w
	}
	return nil
}
Peer Discovery Mechanisms
DHT-Based Discovery (Kademlia)
The primary peer discovery mechanism uses Kademlia DHT lookups.
How DHT discovery works:
1. New node generates ID: node_id = SHA256(public_key)
2. Perform self-lookup: FIND_NODE(node_id)
3. Each response returns k closest nodes
4. Recursively query closer nodes
5. Result: Routing table filled with nearby peers
Lookup parameters:
| Parameter | Value | Purpose |
|---|---|---|
| k (bucket size) | 20 | Nodes per distance bucket |
| alpha (parallelism) | 3 | Concurrent queries |
| Lookup timeout | 5 seconds | Per-query timeout |
| Max hops | 20 | Prevent infinite loops |
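The lookup parameters above drive an iterative search over the XOR metric: a peer's distance from our own ID determines which k-bucket it lands in. A minimal sketch of the distance and bucket-index calculation (names here are illustrative, not the actual Zentalk API):

```go
package main

import (
	"fmt"
	"math/bits"
)

// NodeID mirrors the 256-bit identifiers used in the routing table.
type NodeID [32]byte

// XORDistance computes the Kademlia distance between two IDs.
func XORDistance(a, b NodeID) NodeID {
	var d NodeID
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// BucketIndex returns which k-bucket peer b belongs to relative to a:
// 255 for IDs differing in the top bit, down to 0 for IDs differing
// only in the lowest bit, and -1 when a == b.
func BucketIndex(a, b NodeID) int {
	d := XORDistance(a, b)
	for i, byt := range d {
		if byt != 0 {
			// 255 minus the number of leading zero bits in the distance.
			return 255 - (i*8 + bits.LeadingZeros8(byt))
		}
	}
	return -1
}

func main() {
	var zero, peer NodeID
	peer[0] = 0x80
	fmt.Println(BucketIndex(zero, peer)) // prints 255
}
```

Far-away peers fill the high-index buckets, which is why most of the ID space maps to bucket 255 and only exponentially closer peers occupy the low buckets.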
Bucket refresh for discovery:
func RefreshBuckets() {
for i := 0; i < 256; i++ {
bucket := routingTable.GetBucket(i)
if bucket.LastLookup.Add(1 * time.Hour).Before(time.Now()) {
// Generate random ID in bucket's range
randomID := GenerateIDInBucket(i)
// Lookup discovers new nodes in this distance range
FindNode(randomID)
}
}
}
Gossip-Based Peer Exchange
Nodes continuously exchange peer information with connected neighbors.
Peer exchange protocol:
Node A                        Node B
   |                             |
   |---- PEER_EXCHANGE_REQ ----->|
   |      (my known peers)       |
   |                             |
   |<--- PEER_EXCHANGE_RESP -----|
   |    (peers you might want)   |
   |                             |

Exchange message structure:
type PeerExchangeMessage struct {
Peers []PeerInfo `json:"peers"`
TTL int `json:"ttl"` // Prevent infinite propagation
}
type PeerInfo struct {
NodeID [32]byte `json:"node_id"`
Address string `json:"address"`
PublicKey []byte `json:"public_key"`
Capabilities uint64 `json:"capabilities"`
LastSeen time.Time `json:"last_seen"`
Score float64 `json:"score"`
}
Gossip frequency:
| Condition | Exchange Interval |
|---|---|
| Normal operation | Every 30 seconds |
| Low peer count (<10) | Every 10 seconds |
| Network partition detected | Every 5 seconds |
| Stable, well-connected | Every 60 seconds |
DNS Seed Discovery
Fallback mechanism when bootstrap nodes are unreachable.
DNS seed records:
seeds.zentalk.io TXT "node1.zentalk.io:9000"
seeds.zentalk.io TXT "node2.zentalk.io:9000"
seeds.zentalk.io TXT "192.0.2.1:9000"
DNS resolution process:
func DNSDiscovery() ([]string, error) {
records, err := net.LookupTXT("seeds.zentalk.io")
if err != nil {
return nil, err
}
var peers []string
for _, record := range records {
// Validate format: host:port
if isValidPeerAddress(record) {
peers = append(peers, record)
}
}
return peers, nil
}
Advantages of DNS seeds:
- Works even if hardcoded IPs change
- Can be updated without client software changes
- Distributed via global DNS infrastructure
- Harder to block than specific IPs
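The `isValidPeerAddress` check referenced in `DNSDiscovery` is not defined in this document; a minimal sketch using the standard library (the exact validation rules are an assumption):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// isValidPeerAddress accepts host:port strings where the host is a
// non-empty hostname or IP literal and the port is in 1-65535.
func isValidPeerAddress(addr string) bool {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil || host == "" {
		return false
	}
	port, err := strconv.Atoi(portStr)
	return err == nil && port > 0 && port <= 65535
}

func main() {
	fmt.Println(isValidPeerAddress("node1.zentalk.io:9000")) // true
	fmt.Println(isValidPeerAddress("no-port"))               // false
}
```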
Peer Scoring and Reputation
Scoring Algorithm
Each node maintains scores for connected peers to prioritize reliable connections.
type PeerScore struct {
NodeID [32]byte
BaseScore float64 // Starting score: 50.0
UptimeScore float64 // 0-25 points
LatencyScore float64 // 0-20 points
BandwidthScore float64 // 0-15 points
BehaviorScore float64 // -50 to +40 points
LastUpdated time.Time
}
func (s *PeerScore) Total() float64 {
total := s.BaseScore + s.UptimeScore + s.LatencyScore +
s.BandwidthScore + s.BehaviorScore
return math.Max(0, math.Min(100, total))
}
Scoring Factors
| Factor | Weight | Measurement |
|---|---|---|
| Uptime | 0-25 | Continuous connection duration |
| Latency | 0-20 | Average response time |
| Bandwidth | 0-15 | Throughput capacity |
| Behavior | -50 to +40 | Protocol compliance, helpfulness |
Uptime scoring:
func CalculateUptimeScore(connectionDuration time.Duration) float64 {
hours := connectionDuration.Hours()
switch {
case hours < 1:
return 0
case hours < 24:
return 5
case hours < 168: // 1 week
return 10
case hours < 720: // 1 month
return 20
default:
return 25
}
}
Latency scoring:
func CalculateLatencyScore(avgLatency time.Duration) float64 {
ms := avgLatency.Milliseconds()
switch {
case ms < 50:
return 20
case ms < 100:
return 15
case ms < 200:
return 10
case ms < 500:
return 5
default:
return 0
}
}
Behavior scoring:
| Behavior | Score Impact |
|---|---|
| Valid responses | +0.1 per response |
| Forwarded messages successfully | +0.5 per message |
| Failed to respond | -1 per timeout |
| Invalid message format | -5 per violation |
| Suspected spam/flood | -10 per incident |
| Eclipse attack behavior | -50 (immediate) |
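Applying this table amounts to adding the per-event delta and clamping to the documented -50 to +40 range. A sketch (event names and the map-based lookup are illustrative):

```go
package main

import "fmt"

// Behavior score deltas from the table above.
var behaviorDelta = map[string]float64{
	"valid_response":   +0.1,
	"message_relayed":  +0.5,
	"timeout":          -1,
	"invalid_format":   -5,
	"spam":             -10,
	"eclipse_behavior": -50,
}

// ApplyBehaviorEvent adds the event's delta to the current behavior
// score, clamped to the documented [-50, +40] range.
func ApplyBehaviorEvent(score float64, event string) float64 {
	score += behaviorDelta[event]
	if score > 40 {
		score = 40
	}
	if score < -50 {
		score = -50
	}
	return score
}

func main() {
	fmt.Println(ApplyBehaviorEvent(0, "eclipse_behavior")) // -50
}
```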
Score Decay
Scores decay over time to ensure fresh evaluation:
func ApplyScoreDecay(score *PeerScore) {
timeSinceUpdate := time.Since(score.LastUpdated)
// Decay 1% per hour of inactivity
decayFactor := math.Pow(0.99, timeSinceUpdate.Hours())
// Apply decay to volatile components
score.BehaviorScore *= decayFactor
score.LatencyScore *= decayFactor
// Uptime score doesn't decay (measured directly)
score.LastUpdated = time.Now()
}
Banning and Blocking
Peers exhibiting malicious behavior are banned:
type BanPolicy struct {
ScoreThreshold float64 // Below this = banned
BanDuration time.Duration // How long the ban lasts
MaxViolations int // Violations before permanent ban
}
var DefaultBanPolicy = BanPolicy{
ScoreThreshold: 10.0,
BanDuration: 24 * time.Hour,
MaxViolations: 3,
}
func CheckBan(peer *PeerScore) BanDecision {
if peer.Total() < DefaultBanPolicy.ScoreThreshold {
return BanDecision{
Banned: true,
Duration: DefaultBanPolicy.BanDuration,
Reason: "Score below threshold",
}
}
return BanDecision{Banned: false}
}
Ban escalation:
| Violation Count | Ban Duration |
|---|---|
| 1 | 1 hour |
| 2 | 24 hours |
| 3+ | 7 days |
| Severe (attack) | Permanent |
Network Topology Formation
Target Peer Count
Nodes maintain a target number of connections for optimal operation:
| Connection Type | Min | Target | Max |
|---|---|---|---|
| Total peers | 8 | 25 | 50 |
| Outbound | 8 | 15 | 25 |
| Inbound | 0 | 10 | 25 |
| Bootstrap | 1 | 2 | 3 |
Connection management:
type ConnectionManager struct {
MinPeers int
TargetPeers int
MaxPeers int
outbound map[NodeID]*Connection
inbound map[NodeID]*Connection
}
func (cm *ConnectionManager) NeedsMorePeers() bool {
return len(cm.outbound) + len(cm.inbound) < cm.TargetPeers
}
func (cm *ConnectionManager) CanAcceptInbound() bool {
return len(cm.inbound) < cm.MaxPeers - cm.MinPeers
}
Peer Selection Strategy
When selecting new peers, nodes balance multiple factors:
type PeerSelector struct {
scoreWeight float64 // 0.4 - prefer high-scoring peers
diversityWeight float64 // 0.3 - prefer diverse network positions
latencyWeight float64 // 0.2 - prefer low-latency peers
randomWeight float64 // 0.1 - some randomness
}
func (ps *PeerSelector) SelectPeers(candidates []PeerInfo, count int) []PeerInfo {
// Score each candidate
scored := make([]ScoredCandidate, len(candidates))
for i, c := range candidates {
scored[i] = ScoredCandidate{
Peer: c,
Score: ps.calculateScore(c),
}
}
// Sort by score and select top candidates
sort.Slice(scored, func(i, j int) bool {
return scored[i].Score > scored[j].Score
})
	// Guard against asking for more peers than exist
	if count > len(scored) {
		count = len(scored)
	}
	return scored[:count]
}
Geographic Diversity
Nodes actively seek connections to geographically diverse peers:
type GeoDistribution struct {
Regions map[string]int // Region -> connection count
Target map[string]int // Region -> target count
}
func (gd *GeoDistribution) NeedsRegion(region string) bool {
current := gd.Regions[region]
target := gd.Target[region]
return current < target
}
// Target distribution example
var TargetGeoDistribution = map[string]int{
"europe": 5,
"north-america": 5,
"asia": 5,
"south-america": 3,
"oceania": 2,
"africa": 2,
}
Why geographic diversity matters:
- Reduces latency for global message delivery
- Resists regional network partitions
- Prevents geographic censorship
- Improves network resilience
Node Capabilities Advertisement
Capability Flags
Nodes advertise their capabilities using a bitfield:
type Capabilities uint64
const (
CapRelay Capabilities = 1 << 0 // Can relay messages
CapStorage Capabilities = 1 << 1 // Offers mesh storage
CapBootstrap Capabilities = 1 << 2 // Can serve as bootstrap
CapGuardRelay Capabilities = 1 << 3 // Guard relay for 3-hop relay routing
CapMiddleRelay Capabilities = 1 << 4 // Middle relay
CapExitRelay Capabilities = 1 << 5 // Exit relay
CapHighBandwidth Capabilities = 1 << 6 // High bandwidth available
CapIPv6 Capabilities = 1 << 7 // IPv6 support
CapWebSocket Capabilities = 1 << 8 // WebSocket support
CapValidator Capabilities = 1 << 9 // Network validator
)
Capability advertisement message:
type NodeAnnouncement struct {
NodeID [32]byte `json:"node_id"`
PublicKey []byte `json:"public_key"`
Address string `json:"address"`
Capabilities Capabilities `json:"capabilities"`
Version string `json:"version"`
Timestamp int64 `json:"timestamp"`
Signature []byte `json:"signature"`
}
Service Discovery
Nodes can query for peers with specific capabilities:
// Find nodes with specific capabilities
func FindCapableNodes(required Capabilities) []PeerInfo {
var results []PeerInfo
for _, peer := range routingTable.AllPeers() {
if peer.Capabilities & required == required {
results = append(results, peer)
}
}
return results
}
// Example: Find guard relays
guardRelays := FindCapableNodes(CapGuardRelay | CapRelay)
DHT-based service registry:
// Announce service availability
func AnnounceService(serviceType string, nodeInfo NodeAnnouncement) {
key := SHA256("service:" + serviceType)
Store(key, nodeInfo)
}
// Discover service providers
func DiscoverService(serviceType string) []NodeAnnouncement {
key := SHA256("service:" + serviceType)
return FindValue(key)
}
Joining the Network
Complete Bootstrap Sequence
Detailed step-by-step process for a new node joining:
PHASE 1: Key Generation (Local)
================================
1. Generate Ed25519 keypair
2. Compute node_id = SHA256(public_key)
3. Initialize empty routing table
PHASE 2: Bootstrap Connection
==============================
4. Load bootstrap node list
5. Select bootstrap node (weighted random)
6. Establish TLS connection to bootstrap
7. Send HELLO with node_id and public_key
8. Receive WELCOME with k initial peers
9. Add bootstrap to routing table
PHASE 3: Self-Lookup
=====================
10. Query bootstrap: FIND_NODE(own_node_id)
11. Receive k closest nodes to self
12. Query those nodes: FIND_NODE(own_node_id)
13. Continue until no closer nodes found
14. Result: Know all nodes closest to self
PHASE 4: Routing Table Population
==================================
15. For each bucket i (0 to 255):
a. Generate random ID in bucket's range
b. Perform FIND_NODE(random_id)
c. Add discovered nodes to routing table
16. Result: Know nodes at all distance ranges
PHASE 5: Peer Connection
=========================
17. Select outbound peers from routing table
18. Establish persistent connections
19. Begin peer exchange protocol
20. Start message relay operations
PHASE 6: Network Participation
===============================
21. Respond to incoming FIND_NODE queries
22. Participate in gossip protocol
23. Announce capabilities
24. Begin normal operation
Routing Table Population
func PopulateRoutingTable(bootstrap *Connection) error {
// Phase 1: Self-lookup
selfLookup := NewLookup(ownNodeID)
selfLookup.Query(bootstrap)
selfLookup.IterateUntilConverged()
// Phase 2: Bucket refresh
for i := 0; i < 256; i++ {
if routingTable.Bucket(i).IsEmpty() {
randomID := GenerateIDInBucket(i)
lookup := NewLookup(randomID)
lookup.IterateUntilConverged()
}
}
return nil
}
Connection Establishment
type ConnectionState int
const (
StateDisconnected ConnectionState = iota
StateConnecting
StateHandshaking
StateConnected
)
func EstablishConnection(peer PeerInfo) (*Connection, error) {
conn := &Connection{
PeerID: peer.NodeID,
State: StateConnecting,
}
// 1. TCP connection
tcpConn, err := net.DialTimeout("tcp", peer.Address, 10*time.Second)
if err != nil {
return nil, err
}
// 2. TLS handshake
conn.State = StateHandshaking
tlsConn := tls.Client(tcpConn, tlsConfig)
if err := tlsConn.Handshake(); err != nil {
return nil, err
}
// 3. Protocol handshake
if err := protocolHandshake(tlsConn, peer); err != nil {
return nil, err
}
conn.State = StateConnected
return conn, nil
}
Network Health Metrics
Connection Monitoring
Nodes continuously monitor connection health:
type ConnectionHealth struct {
PeerID [32]byte
Connected bool
Latency time.Duration
LastPing time.Time
LastPong time.Time
BytesSent uint64
BytesReceived uint64
MessagesRelayed uint64
ErrorCount int
}
func MonitorConnection(conn *Connection) {
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop()
for range ticker.C {
// Send ping
conn.Health.LastPing = time.Now()
conn.SendPing()
// Wait for pong (with timeout)
select {
case <-conn.PongReceived:
conn.Health.LastPong = time.Now()
conn.Health.Latency = conn.Health.LastPong.Sub(conn.Health.LastPing)
case <-time.After(5 * time.Second):
conn.Health.ErrorCount++
if conn.Health.ErrorCount > 3 {
conn.Close()
}
}
}
}
Health metrics tracked:
| Metric | Threshold | Action |
|---|---|---|
| Ping latency | > 2 seconds | Mark degraded |
| Failed pings | > 3 consecutive | Disconnect |
| Error rate | > 10% | Reduce score |
| Bandwidth | < 1 KB/s sustained | Mark slow |
Peer Churn Handling
The network handles nodes joining and leaving:
func HandlePeerDisconnect(peer *Connection) {
// 1. Remove from active connections
connectionManager.Remove(peer.NodeID)
// 2. Keep in routing table (may reconnect)
routingTable.MarkInactive(peer.NodeID)
// 3. Find replacement if below minimum
if connectionManager.NeedsMorePeers() {
candidates := routingTable.GetCandidates(10)
for _, candidate := range candidates {
if connectionManager.Connect(candidate) {
break
}
}
}
// 4. Trigger peer discovery if critically low
if connectionManager.PeerCount() < MinPeers {
triggerEmergencyDiscovery()
}
}
func triggerEmergencyDiscovery() {
// Accelerated discovery process
// - Query all k-buckets
// - Try cached peers
// - Reconnect to bootstrap if needed
}
Churn resilience mechanisms:
| Mechanism | Purpose |
|---|---|
| k-bucket redundancy | Multiple peers per distance range |
| Connection pooling | Reserve connections for stability |
| Cached peer list | Remember previously-seen peers |
| Periodic refresh | Keep routing table current |
| Lazy eviction | Don’t immediately remove disconnected peers |
Security Considerations
Eclipse Attack Prevention
An eclipse attack isolates a node by surrounding it with attacker-controlled peers.
Mitigations implemented:
| Mitigation | How It Works |
|---|---|
| IP diversity | Max 2 peers per /24 subnet |
| Outbound preference | Prioritize self-initiated connections |
| Anchor connections | Maintain long-term “anchor” peers |
| Fresh peer injection | Periodically add random new peers |
| Bootstrap rotation | Randomly reconnect to bootstraps |
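The subnet cap in the first row relies on an `extractSubnet` helper that is referenced in `ValidateNewPeer` below but not defined in this document. A sketch for IPv4 `host:port` addresses using the standard library (hostname addresses would need resolution first, which is omitted here):

```go
package main

import (
	"fmt"
	"net"
)

// extractSubnet returns the /bits network (e.g. "203.0.113.0/24") for an
// IPv4 host:port address, or "" if the address cannot be parsed.
func extractSubnet(addr string, bits int) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return ""
	}
	ip := net.ParseIP(host)
	if ip == nil || ip.To4() == nil {
		return "" // not an IPv4 literal
	}
	mask := net.CIDRMask(bits, 32)
	return (&net.IPNet{IP: ip.To4().Mask(mask), Mask: mask}).String()
}

func main() {
	fmt.Println(extractSubnet("203.0.113.7:9000", 24)) // 203.0.113.0/24
}
```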
type EclipseProtection struct {
MaxPeersPerSubnet int // 2
MinOutboundRatio float64 // 0.6 (60% outbound)
AnchorPeerCount int // 3
AnchorRotationPeriod time.Duration // 24 hours
}
func (ep *EclipseProtection) ValidateNewPeer(peer PeerInfo) bool {
subnet := extractSubnet(peer.Address, 24)
// Check subnet diversity
if subnetCount[subnet] >= ep.MaxPeersPerSubnet {
return false
}
// Verify peer isn't claiming suspicious node ID
expectedID := SHA256(peer.PublicKey)
if !bytes.Equal(expectedID, peer.NodeID[:]) {
return false
}
return true
}
Sybil Attack Mitigation
Sybil attacks create many fake identities to gain disproportionate influence.
Mitigations:
| Defense | Implementation |
|---|---|
| Cryptographic node IDs | node_id = SHA256(public_key) |
| Proof of work (optional) | Computational cost for ID generation |
| Rate limiting | Max new peers per time window |
| Reputation over time | Long-term nodes trusted more |
| Resource verification | Verify claimed bandwidth/storage |
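The `hasLeadingZeros` predicate used by the proof-of-work functions below is not defined in this document. A bit-level sketch, assuming difficulty is counted in bits of leading zeros:

```go
package main

import (
	"fmt"
	"math/bits"
)

// hasLeadingZeros reports whether hash has at least `difficulty`
// leading zero bits, i.e. whether the proof-of-work target is met.
func hasLeadingZeros(hash []byte, difficulty int) bool {
	zeros := 0
	for _, b := range hash {
		if b == 0 {
			zeros += 8
			continue
		}
		zeros += bits.LeadingZeros8(b)
		break
	}
	return zeros >= difficulty
}

func main() {
	fmt.Println(hasLeadingZeros([]byte{0x00, 0x0F}, 12)) // true: 12 leading zero bits
}
```

With the configured default of `pow_difficulty: 16`, an ID generator would need on average 2^16 hash attempts, which is cheap for a legitimate node but costly for an attacker minting thousands of identities.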
// Node ID generation with proof of work
func GenerateNodeID(publicKey []byte, difficulty int) (NodeID, uint64) {
for nonce := uint64(0); ; nonce++ {
data := append(publicKey, uint64ToBytes(nonce)...)
hash := SHA256(data)
// Check if hash meets difficulty (leading zeros)
if hasLeadingZeros(hash, difficulty) {
return hash, nonce
}
}
}
// Verification
func VerifyNodeID(publicKey []byte, nodeID NodeID, nonce uint64, difficulty int) bool {
data := append(publicKey, uint64ToBytes(nonce)...)
hash := SHA256(data)
	return bytes.Equal(hash, nodeID[:]) && hasLeadingZeros(hash, difficulty)
}
Additional Security Measures
| Threat | Countermeasure |
|---|---|
| Man-in-the-middle | TLS with certificate pinning |
| Routing table poisoning | Verify node IDs cryptographically |
| Bootstrap poisoning | Multiple independent bootstraps |
| DNS poisoning | DNSSEC, fallback to hardcoded |
| Traffic analysis | Peer connection padding |
| Node impersonation | Signed announcements |
// Signed node announcement
func CreateAnnouncement(privateKey ed25519.PrivateKey, info NodeInfo) *SignedAnnouncement {
data := serializeNodeInfo(info)
signature := ed25519.Sign(privateKey, data)
return &SignedAnnouncement{
Info: info,
Signature: signature,
Timestamp: time.Now().Unix(),
}
}
func VerifyAnnouncement(announcement *SignedAnnouncement) bool {
// Check signature
data := serializeNodeInfo(announcement.Info)
if !ed25519.Verify(announcement.Info.PublicKey, data, announcement.Signature) {
return false
}
// Check timestamp freshness (prevent replay)
age := time.Since(time.Unix(announcement.Timestamp, 0))
if age > 24*time.Hour {
return false
}
return true
}
Configuration Reference
Node Discovery Settings
# discovery.yaml
discovery:
# Bootstrap configuration
bootstrap:
nodes:
- "boot1.zentalk.io:9000"
- "boot2.zentalk.io:9000"
- "boot3.zentalk.io:9000"
retry_interval: "30s"
max_retries: 10
# DHT settings
dht:
bucket_size: 20
alpha: 3
refresh_interval: "1h"
lookup_timeout: "5s"
# Peer management
peers:
min_connections: 8
target_connections: 25
max_connections: 50
max_per_subnet: 2
# Gossip protocol
gossip:
interval: "30s"
max_peers_per_exchange: 20
ttl: 3
# Security
security:
require_proof_of_work: false
pow_difficulty: 16
max_new_peers_per_hour: 50
    ban_threshold: 10.0
Related Documentation
- DHT and Kademlia - Detailed DHT protocol specification
- Run a Node - Operating a Zentalk node
- Architecture - System overview
- Threat Model - Security analysis
- Onion Routing - Relay node discovery