Node Discovery

How nodes find and connect to each other in the Zentalk peer-to-peer network.


Overview

Node discovery is the process by which new nodes join the network and existing nodes maintain connections with peers. Zentalk uses a multi-layered discovery approach combining:

| Mechanism | Purpose | When Used |
|---|---|---|
| Bootstrap nodes | Initial network entry | First connection |
| DHT (Kademlia) | Deterministic peer finding | Ongoing discovery |
| Gossip protocol | Peer information exchange | Continuous |
| DNS seeds | Fallback discovery | Bootstrap failure |

Key goals:

  • Enable any node to join without central coordination
  • Maintain network connectivity despite churn
  • Resist attacks that manipulate peer discovery
  • Achieve geographic and topological diversity

Bootstrap Process

Hardcoded Bootstrap Nodes

Every Zentalk node ships with a list of hardcoded bootstrap nodes. These are well-known, stable nodes that serve as entry points to the network.

```go
// Bootstrap node configuration
var BootstrapNodes = []string{
	"boot1.zentalk.io:9000",
	"boot2.zentalk.io:9000",
	"boot3.zentalk.io:9000",
	"boot-eu.zentalk.io:9000",
	"boot-asia.zentalk.io:9000",
}
```

Bootstrap node requirements:

| Requirement | Value | Rationale |
|---|---|---|
| Uptime | 99.9%+ | Must be reliably available |
| Bandwidth | 100 Mbps+ | Handle many simultaneous connections |
| Geographic distribution | 3+ continents | Reduce latency, resist regional censorship |
| Operator diversity | 3+ operators | No single point of control |
| Static IP/DNS | Required | Nodes must be able to find them |

Initial Connection Sequence

When a new node starts, it follows this sequence:

1. Load bootstrap node list from configuration
2. Shuffle the list (randomize order)
3. For each bootstrap node (until success):
   a. Establish TLS connection
   b. Send HELLO message with own node ID
   c. Receive WELCOME with initial peer list
   d. Add peers to routing table
4. If all bootstraps fail:
   a. Try DNS seed discovery
   b. Try cached peers from previous session
5. Begin self-lookup to populate routing table

Connection handshake:

```
New Node                    Bootstrap Node
   |                              |
   |-------- TLS HELLO ---------->|
   |                              |
   |<----- WELCOME + PEERS -------|
   |                              |
   |------- FIND_NODE(self) ----->|
   |                              |
   |<---- K_CLOSEST_NODES --------|
   |                              |
```

Bootstrap Node Selection

Nodes select bootstrap nodes using weighted random selection:

```go
type BootstrapNode struct {
	Address     string
	Weight      int // Higher = more likely to be selected
	LastSuccess time.Time
	FailCount   int
}

// effectiveWeight reduces the configured base weight by recent
// failures; geographic proximity could also be factored in here.
func effectiveWeight(n BootstrapNode) int {
	w := n.Weight - n.FailCount
	if w < 1 {
		w = 1
	}
	return w
}

// SelectBootstrap performs weighted random selection,
// distributing load across the bootstrap set.
func SelectBootstrap(nodes []BootstrapNode) *BootstrapNode {
	total := 0
	for _, n := range nodes {
		total += effectiveWeight(n)
	}
	if total == 0 {
		return nil
	}
	r := rand.Intn(total)
	for i := range nodes {
		r -= effectiveWeight(nodes[i])
		if r < 0 {
			return &nodes[i]
		}
	}
	return nil
}
```

Peer Discovery Mechanisms

DHT-Based Discovery (Kademlia)

The primary peer discovery mechanism uses Kademlia DHT lookups.

How DHT discovery works:

1. New node generates its ID: node_id = SHA256(public_key)
2. Perform self-lookup: FIND_NODE(node_id)
3. Each response returns the k closest nodes
4. Recursively query closer nodes
5. Result: routing table filled with nearby peers

Lookup parameters:

| Parameter | Value | Purpose |
|---|---|---|
| k (bucket size) | 20 | Nodes per distance bucket |
| alpha (parallelism) | 3 | Concurrent queries |
| Lookup timeout | 5 seconds | Per-query timeout |
| Max hops | 20 | Prevent infinite loops |

Bucket refresh for discovery:

```go
func RefreshBuckets() {
	for i := 0; i < 256; i++ {
		bucket := routingTable.GetBucket(i)
		if bucket.LastLookup.Add(1 * time.Hour).Before(time.Now()) {
			// Generate random ID in bucket's range
			randomID := GenerateIDInBucket(i)
			// Lookup discovers new nodes in this distance range
			FindNode(randomID)
		}
	}
}
```

Gossip-Based Peer Exchange

Nodes continuously exchange peer information with connected neighbors.

Peer exchange protocol:

```
Node A                         Node B
   |                              |
   |---- PEER_EXCHANGE_REQ ------>|
   |      (my known peers)        |
   |                              |
   |<--- PEER_EXCHANGE_RESP ------|
   |    (peers you might want)    |
   |                              |
```

Exchange message structure:

```go
type PeerExchangeMessage struct {
	Peers []PeerInfo `json:"peers"`
	TTL   int        `json:"ttl"` // Prevent infinite propagation
}

type PeerInfo struct {
	NodeID       [32]byte  `json:"node_id"`
	Address      string    `json:"address"`
	PublicKey    []byte    `json:"public_key"`
	Capabilities uint64    `json:"capabilities"`
	LastSeen     time.Time `json:"last_seen"`
	Score        float64   `json:"score"`
}
```

Gossip frequency:

| Condition | Exchange Interval |
|---|---|
| Normal operation | Every 30 seconds |
| Low peer count (<10) | Every 10 seconds |
| Network partition detected | Every 5 seconds |
| Stable, well-connected | Every 60 seconds |

DNS Seed Discovery

Fallback mechanism when bootstrap nodes are unreachable.

DNS seed records:

```
seeds.zentalk.io  TXT  "node1.zentalk.io:9000"
seeds.zentalk.io  TXT  "node2.zentalk.io:9000"
seeds.zentalk.io  TXT  "192.0.2.1:9000"
```

DNS resolution process:

```go
func DNSDiscovery() ([]string, error) {
	records, err := net.LookupTXT("seeds.zentalk.io")
	if err != nil {
		return nil, err
	}
	var peers []string
	for _, record := range records {
		// Validate format: host:port
		if isValidPeerAddress(record) {
			peers = append(peers, record)
		}
	}
	return peers, nil
}
```

Advantages of DNS seeds:

  • Works even if hardcoded IPs change
  • Can be updated without client software changes
  • Distributed via global DNS infrastructure
  • Harder to block than specific IPs

Peer Scoring and Reputation

Scoring Algorithm

Each node maintains scores for connected peers to prioritize reliable connections.

```go
type PeerScore struct {
	NodeID         [32]byte
	BaseScore      float64 // Starting score: 50.0
	UptimeScore    float64 // 0-25 points
	LatencyScore   float64 // 0-20 points
	BandwidthScore float64 // 0-15 points
	BehaviorScore  float64 // -50 to +40 points
	LastUpdated    time.Time
}

func (s *PeerScore) Total() float64 {
	total := s.BaseScore + s.UptimeScore + s.LatencyScore +
		s.BandwidthScore + s.BehaviorScore
	return math.Max(0, math.Min(100, total))
}
```

Scoring Factors

| Factor | Weight | Measurement |
|---|---|---|
| Uptime | 0-25 | Continuous connection duration |
| Latency | 0-20 | Average response time |
| Bandwidth | 0-15 | Throughput capacity |
| Behavior | -50 to +40 | Protocol compliance, helpfulness |

Uptime scoring:

```go
func CalculateUptimeScore(connectionDuration time.Duration) float64 {
	hours := connectionDuration.Hours()
	switch {
	case hours < 1:
		return 0
	case hours < 24:
		return 5
	case hours < 168: // 1 week
		return 10
	case hours < 720: // 1 month
		return 20
	default:
		return 25
	}
}
```

Latency scoring:

```go
func CalculateLatencyScore(avgLatency time.Duration) float64 {
	ms := avgLatency.Milliseconds()
	switch {
	case ms < 50:
		return 20
	case ms < 100:
		return 15
	case ms < 200:
		return 10
	case ms < 500:
		return 5
	default:
		return 0
	}
}
```

Behavior scoring:

| Behavior | Score Impact |
|---|---|
| Valid responses | +0.1 per response |
| Forwarded messages successfully | +0.5 per message |
| Failed to respond | -1 per timeout |
| Invalid message format | -5 per violation |
| Suspected spam/flood | -10 per incident |
| Eclipse attack behavior | -50 (immediate) |

Score Decay

Scores decay over time to ensure fresh evaluation:

```go
func ApplyScoreDecay(score *PeerScore) {
	timeSinceUpdate := time.Since(score.LastUpdated)
	// Decay 1% per hour of inactivity
	decayFactor := math.Pow(0.99, timeSinceUpdate.Hours())
	// Apply decay to volatile components
	score.BehaviorScore *= decayFactor
	score.LatencyScore *= decayFactor
	// Uptime score doesn't decay (measured directly)
	score.LastUpdated = time.Now()
}
```

Banning and Blocking

Peers exhibiting malicious behavior are banned:

```go
type BanPolicy struct {
	ScoreThreshold float64       // Below this = banned
	BanDuration    time.Duration // How long the ban lasts
	MaxViolations  int           // Violations before permanent ban
}

var DefaultBanPolicy = BanPolicy{
	ScoreThreshold: 10.0,
	BanDuration:    24 * time.Hour,
	MaxViolations:  3,
}

func CheckBan(peer *PeerScore) BanDecision {
	if peer.Total() < DefaultBanPolicy.ScoreThreshold {
		return BanDecision{
			Banned:   true,
			Duration: DefaultBanPolicy.BanDuration,
			Reason:   "Score below threshold",
		}
	}
	return BanDecision{Banned: false}
}
```

Ban escalation:

| Violation Count | Ban Duration |
|---|---|
| 1 | 1 hour |
| 2 | 24 hours |
| 3+ | 7 days |
| Severe (attack) | Permanent |

Network Topology Formation

Target Peer Count

Nodes maintain a target number of connections for optimal operation:

| Connection Type | Min | Target | Max |
|---|---|---|---|
| Total peers | 8 | 25 | 50 |
| Outbound | 8 | 15 | 25 |
| Inbound | 0 | 10 | 25 |
| Bootstrap | 1 | 2 | 3 |

Connection management:

```go
type ConnectionManager struct {
	MinPeers    int
	TargetPeers int
	MaxPeers    int
	outbound    map[NodeID]*Connection
	inbound     map[NodeID]*Connection
}

func (cm *ConnectionManager) NeedsMorePeers() bool {
	return len(cm.outbound)+len(cm.inbound) < cm.TargetPeers
}

func (cm *ConnectionManager) CanAcceptInbound() bool {
	return len(cm.inbound) < cm.MaxPeers-cm.MinPeers
}
```

Peer Selection Strategy

When selecting new peers, nodes balance multiple factors:

```go
type PeerSelector struct {
	scoreWeight     float64 // 0.4 - prefer high-scoring peers
	diversityWeight float64 // 0.3 - prefer diverse network positions
	latencyWeight   float64 // 0.2 - prefer low-latency peers
	randomWeight    float64 // 0.1 - some randomness
}

func (ps *PeerSelector) SelectPeers(candidates []PeerInfo, count int) []PeerInfo {
	// Score each candidate
	scored := make([]ScoredCandidate, len(candidates))
	for i, c := range candidates {
		scored[i] = ScoredCandidate{Peer: c, Score: ps.calculateScore(c)}
	}
	// Sort by score, best first
	sort.Slice(scored, func(i, j int) bool {
		return scored[i].Score > scored[j].Score
	})
	// Select the top candidates, guarding against short candidate lists
	if count > len(scored) {
		count = len(scored)
	}
	selected := make([]PeerInfo, count)
	for i := range selected {
		selected[i] = scored[i].Peer
	}
	return selected
}
```

Geographic Diversity

Nodes actively seek connections to geographically diverse peers:

```go
type GeoDistribution struct {
	Regions map[string]int // Region -> connection count
	Target  map[string]int // Region -> target count
}

func (gd *GeoDistribution) NeedsRegion(region string) bool {
	current := gd.Regions[region]
	target := gd.Target[region]
	return current < target
}

// Target distribution example
var TargetGeoDistribution = map[string]int{
	"europe":        5,
	"north-america": 5,
	"asia":          5,
	"south-america": 3,
	"oceania":       2,
	"africa":        2,
}
```

Why geographic diversity matters:

  • Reduces latency for global message delivery
  • Resists regional network partitions
  • Prevents geographic censorship
  • Improves network resilience

Node Capabilities Advertisement

Capability Flags

Nodes advertise their capabilities using a bitfield:

```go
type Capabilities uint64

const (
	CapRelay         Capabilities = 1 << 0 // Can relay messages
	CapStorage       Capabilities = 1 << 1 // Offers mesh storage
	CapBootstrap     Capabilities = 1 << 2 // Can serve as bootstrap
	CapGuardRelay    Capabilities = 1 << 3 // Guard relay for 3-hop relay routing
	CapMiddleRelay   Capabilities = 1 << 4 // Middle relay
	CapExitRelay     Capabilities = 1 << 5 // Exit relay
	CapHighBandwidth Capabilities = 1 << 6 // High bandwidth available
	CapIPv6          Capabilities = 1 << 7 // IPv6 support
	CapWebSocket     Capabilities = 1 << 8 // WebSocket support
	CapValidator     Capabilities = 1 << 9 // Network validator
)
```

Capability advertisement message:

```go
type NodeAnnouncement struct {
	NodeID       [32]byte     `json:"node_id"`
	PublicKey    []byte       `json:"public_key"`
	Address      string       `json:"address"`
	Capabilities Capabilities `json:"capabilities"`
	Version      string       `json:"version"`
	Timestamp    int64        `json:"timestamp"`
	Signature    []byte       `json:"signature"`
}
```

Service Discovery

Nodes can query for peers with specific capabilities:

```go
// Find nodes with specific capabilities
func FindCapableNodes(required Capabilities) []PeerInfo {
	var results []PeerInfo
	for _, peer := range routingTable.AllPeers() {
		if peer.Capabilities&required == required {
			results = append(results, peer)
		}
	}
	return results
}

// Example: Find guard relays
guardRelays := FindCapableNodes(CapGuardRelay | CapRelay)
```

DHT-based service registry:

```go
// Announce service availability
func AnnounceService(serviceType string, nodeInfo NodeAnnouncement) {
	key := SHA256("service:" + serviceType)
	Store(key, nodeInfo)
}

// Discover service providers
func DiscoverService(serviceType string) []NodeAnnouncement {
	key := SHA256("service:" + serviceType)
	return FindValue(key)
}
```

Joining the Network

Complete Bootstrap Sequence

Detailed step-by-step process for a new node joining:

```
PHASE 1: Key Generation (Local)
================================
 1. Generate Ed25519 keypair
 2. Compute node_id = SHA256(public_key)
 3. Initialize empty routing table

PHASE 2: Bootstrap Connection
==============================
 4. Load bootstrap node list
 5. Select bootstrap node (weighted random)
 6. Establish TLS connection to bootstrap
 7. Send HELLO with node_id and public_key
 8. Receive WELCOME with k initial peers
 9. Add bootstrap to routing table

PHASE 3: Self-Lookup
=====================
10. Query bootstrap: FIND_NODE(own_node_id)
11. Receive k closest nodes to self
12. Query those nodes: FIND_NODE(own_node_id)
13. Continue until no closer nodes found
14. Result: Know all nodes closest to self

PHASE 4: Routing Table Population
==================================
15. For each bucket i (0 to 255):
    a. Generate random ID in bucket's range
    b. Perform FIND_NODE(random_id)
    c. Add discovered nodes to routing table
16. Result: Know nodes at all distance ranges

PHASE 5: Peer Connection
=========================
17. Select outbound peers from routing table
18. Establish persistent connections
19. Begin peer exchange protocol
20. Start message relay operations

PHASE 6: Network Participation
===============================
21. Respond to incoming FIND_NODE queries
22. Participate in gossip protocol
23. Announce capabilities
24. Begin normal operation
```

Routing Table Population

```go
func PopulateRoutingTable(bootstrap *Connection) error {
	// Phase 1: Self-lookup
	selfLookup := NewLookup(ownNodeID)
	selfLookup.Query(bootstrap)
	selfLookup.IterateUntilConverged()

	// Phase 2: Bucket refresh
	for i := 0; i < 256; i++ {
		if routingTable.Bucket(i).IsEmpty() {
			randomID := GenerateIDInBucket(i)
			lookup := NewLookup(randomID)
			lookup.IterateUntilConverged()
		}
	}
	return nil
}
```

Connection Establishment

```go
type ConnectionState int

const (
	StateDisconnected ConnectionState = iota
	StateConnecting
	StateHandshaking
	StateConnected
)

func EstablishConnection(peer PeerInfo) (*Connection, error) {
	conn := &Connection{
		PeerID: peer.NodeID,
		State:  StateConnecting,
	}

	// 1. TCP connection
	tcpConn, err := net.DialTimeout("tcp", peer.Address, 10*time.Second)
	if err != nil {
		return nil, err
	}

	// 2. TLS handshake
	conn.State = StateHandshaking
	tlsConn := tls.Client(tcpConn, tlsConfig)
	if err := tlsConn.Handshake(); err != nil {
		return nil, err
	}

	// 3. Protocol handshake
	if err := protocolHandshake(tlsConn, peer); err != nil {
		return nil, err
	}

	conn.State = StateConnected
	return conn, nil
}
```

Network Health Metrics

Connection Monitoring

Nodes continuously monitor connection health:

```go
type ConnectionHealth struct {
	PeerID          [32]byte
	Connected       bool
	Latency         time.Duration
	LastPing        time.Time
	LastPong        time.Time
	BytesSent       uint64
	BytesReceived   uint64
	MessagesRelayed uint64
	ErrorCount      int
}

func MonitorConnection(conn *Connection) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		// Send ping
		conn.Health.LastPing = time.Now()
		conn.SendPing()

		// Wait for pong (with timeout)
		select {
		case <-conn.PongReceived:
			conn.Health.LastPong = time.Now()
			conn.Health.Latency = conn.Health.LastPong.Sub(conn.Health.LastPing)
		case <-time.After(5 * time.Second):
			conn.Health.ErrorCount++
			if conn.Health.ErrorCount > 3 {
				conn.Close()
			}
		}
	}
}
```

Health metrics tracked:

| Metric | Threshold | Action |
|---|---|---|
| Ping latency | > 2 seconds | Mark degraded |
| Failed pings | > 3 consecutive | Disconnect |
| Error rate | > 10% | Reduce score |
| Bandwidth | < 1 KB/s sustained | Mark slow |

Peer Churn Handling

The network handles nodes joining and leaving:

```go
func HandlePeerDisconnect(peer *Connection) {
	// 1. Remove from active connections
	connectionManager.Remove(peer.NodeID)

	// 2. Keep in routing table (may reconnect)
	routingTable.MarkInactive(peer.NodeID)

	// 3. Find replacement if below minimum
	if connectionManager.NeedsMorePeers() {
		candidates := routingTable.GetCandidates(10)
		for _, candidate := range candidates {
			if connectionManager.Connect(candidate) {
				break
			}
		}
	}

	// 4. Trigger peer discovery if critically low
	if connectionManager.PeerCount() < MinPeers {
		triggerEmergencyDiscovery()
	}
}

func triggerEmergencyDiscovery() {
	// Accelerated discovery process:
	// - Query all k-buckets
	// - Try cached peers
	// - Reconnect to bootstrap if needed
}
```

Churn resilience mechanisms:

| Mechanism | Purpose |
|---|---|
| k-bucket redundancy | Multiple peers per distance range |
| Connection pooling | Reserve connections for stability |
| Cached peer list | Remember previously-seen peers |
| Periodic refresh | Keep routing table current |
| Lazy eviction | Don’t immediately remove disconnected peers |

Security Considerations

Eclipse Attack Prevention

An eclipse attack isolates a node by surrounding it with attacker-controlled peers.

Mitigations implemented:

| Mitigation | How It Works |
|---|---|
| IP diversity | Max 2 peers per /24 subnet |
| Outbound preference | Prioritize self-initiated connections |
| Anchor connections | Maintain long-term “anchor” peers |
| Fresh peer injection | Periodically add random new peers |
| Bootstrap rotation | Randomly reconnect to bootstraps |
```go
type EclipseProtection struct {
	MaxPeersPerSubnet    int           // 2
	MinOutboundRatio     float64       // 0.6 (60% outbound)
	AnchorPeerCount      int           // 3
	AnchorRotationPeriod time.Duration // 24 hours
}

func (ep *EclipseProtection) ValidateNewPeer(peer PeerInfo) bool {
	subnet := extractSubnet(peer.Address, 24)

	// Check subnet diversity
	if subnetCount[subnet] >= ep.MaxPeersPerSubnet {
		return false
	}

	// Verify peer isn't claiming a suspicious node ID
	expectedID := SHA256(peer.PublicKey)
	if !bytes.Equal(expectedID, peer.NodeID[:]) {
		return false
	}
	return true
}
```

Sybil Attack Mitigation

Sybil attacks create many fake identities to gain disproportionate influence.

Mitigations:

| Defense | Implementation |
|---|---|
| Cryptographic node IDs | node_id = SHA256(public_key) |
| Proof of work (optional) | Computational cost for ID generation |
| Rate limiting | Max new peers per time window |
| Reputation over time | Long-term nodes trusted more |
| Resource verification | Verify claimed bandwidth/storage |
```go
// Node ID generation with proof of work
func GenerateNodeID(publicKey []byte, difficulty int) (NodeID, uint64) {
	for nonce := uint64(0); ; nonce++ {
		data := append(publicKey, uint64ToBytes(nonce)...)
		hash := SHA256(data)
		// Check if hash meets difficulty (leading zero bits)
		if hasLeadingZeros(hash, difficulty) {
			return hash, nonce
		}
	}
}

// Verification
func VerifyNodeID(publicKey []byte, nodeID NodeID, nonce uint64, difficulty int) bool {
	data := append(publicKey, uint64ToBytes(nonce)...)
	hash := SHA256(data)
	return bytes.Equal(hash[:], nodeID[:]) && hasLeadingZeros(hash, difficulty)
}
```

Additional Security Measures

| Threat | Countermeasure |
|---|---|
| Man-in-the-middle | TLS with certificate pinning |
| Routing table poisoning | Verify node IDs cryptographically |
| Bootstrap poisoning | Multiple independent bootstraps |
| DNS poisoning | DNSSEC, fallback to hardcoded |
| Traffic analysis | Peer connection padding |
| Node impersonation | Signed announcements |
```go
// Signed node announcement
func CreateAnnouncement(privateKey ed25519.PrivateKey, info NodeInfo) *SignedAnnouncement {
	data := serializeNodeInfo(info)
	signature := ed25519.Sign(privateKey, data)
	return &SignedAnnouncement{
		Info:      info,
		Signature: signature,
		Timestamp: time.Now().Unix(),
	}
}

func VerifyAnnouncement(announcement *SignedAnnouncement) bool {
	// Check signature
	data := serializeNodeInfo(announcement.Info)
	if !ed25519.Verify(announcement.Info.PublicKey, data, announcement.Signature) {
		return false
	}
	// Check timestamp freshness (prevent replay)
	age := time.Since(time.Unix(announcement.Timestamp, 0))
	if age > 24*time.Hour {
		return false
	}
	return true
}
```

Configuration Reference

Node Discovery Settings

```yaml
# discovery.yaml
discovery:
  # Bootstrap configuration
  bootstrap:
    nodes:
      - "boot1.zentalk.io:9000"
      - "boot2.zentalk.io:9000"
      - "boot3.zentalk.io:9000"
    retry_interval: "30s"
    max_retries: 10

  # DHT settings
  dht:
    bucket_size: 20
    alpha: 3
    refresh_interval: "1h"
    lookup_timeout: "5s"

  # Peer management
  peers:
    min_connections: 8
    target_connections: 25
    max_connections: 50
    max_per_subnet: 2

  # Gossip protocol
  gossip:
    interval: "30s"
    max_peers_per_exchange: 20
    ttl: 3

  # Security
  security:
    require_proof_of_work: false
    pow_difficulty: 16
    max_new_peers_per_hour: 50
    ban_threshold: 10.0
```
