Routing Strategies

Prism's intelligent request routing system with consensus validation, hedging, scoring, and failover.

Overview

Prism implements a SmartRouter that automatically selects the best routing strategy based on the method and system state:

Request → SmartRouter
           ├─ Critical method? ──────────→ CONSENSUS (3+ upstreams)
           ├─ Scoring enabled? ──────────→ SCORING (best upstream)
           ├─ Hedging enabled? ──────────→ HEDGING (parallel requests)
           └─ Fallback ─────────────────→ LOAD BALANCER (round-robin)

Routing Priority

  1. Consensus (if method requires validation)

  2. Scoring (if enabled)

  3. Hedging (if enabled and scoring disabled)

  4. Load Balancer (fallback)
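
This priority order can be sketched as a simple dispatch. A minimal sketch in Python (not Prism's actual code; the strategy functions and config flags are hypothetical stand-ins):

# Minimal sketch of SmartRouter's priority order.
# consensus/scoring/hedging/load_balance are hypothetical stand-ins.

def consensus(req):    return f"consensus({req['method']})"
def scoring(req):      return f"scoring({req['method']})"
def hedging(req):      return f"hedging({req['method']})"
def load_balance(req): return f"load_balance({req['method']})"

CONSENSUS_METHODS = {"eth_getBlockByNumber", "eth_getBlockByHash"}

def route(req, *, consensus_on=True, scoring_on=True, hedging_on=True):
    # 1. Consensus for critical methods
    if consensus_on and req["method"] in CONSENSUS_METHODS:
        return consensus(req)
    # 2. Scoring if enabled
    if scoring_on:
        return scoring(req)
    # 3. Hedging if enabled and scoring disabled
    if hedging_on:
        return hedging(req)
    # 4. Round-robin fallback
    return load_balance(req)

print(route({"method": "eth_call"}, scoring_on=False))  # -> hedging(eth_call)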


Consensus Validation

Multi-upstream validation for critical methods requiring data integrity guarantees.

How It Works

  1. Send to multiple upstreams: Request sent to 3+ providers in parallel

  2. Collect responses: Wait for responses (with timeout)

  3. Compare results: Check if responses match

  4. Consensus achieved?

    • YES: Return agreed-upon response

    • NO: Return error or fall back
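
These steps amount to a parallel fan-out followed by a majority vote. A minimal sketch in Python, assuming a hypothetical call_upstream client and exact-equality response comparison:

import concurrent.futures
from collections import Counter

def call_upstream(name, method, params):
    # Hypothetical stand-in for a real JSON-RPC call to one upstream.
    raise NotImplementedError

def consensus_call(upstreams, method, params, min_count=2, timeout=10.0):
    """Query several upstreams in parallel and return the majority response."""
    with concurrent.futures.ThreadPoolExecutor(len(upstreams)) as pool:
        futures = [pool.submit(call_upstream, u, method, params) for u in upstreams]
        # Collect whatever finishes within the timeout; stragglers are abandoned.
        done, _ = concurrent.futures.wait(futures, timeout=timeout)

    results = [f.result() for f in done if f.exception() is None]
    if len(results) < min_count:
        raise RuntimeError("not enough responses for consensus")

    # Majority vote: the most common response wins if enough upstreams agree.
    value, votes = Counter(map(repr, results)).most_common(1)[0]
    if votes < min_count:
        raise RuntimeError("consensus dispute: responses disagree")
    return next(r for r in results if repr(r) == value)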

Configuration

[consensus]
enabled = true
max_count = 3                    # Maximum upstreams to query
min_count = 2                    # Minimum upstreams for consensus
timeout_seconds = 10             # Timeout for consensus requests
dispute_behavior = "PreferBlockHeadLeader"  # How to resolve disputes
methods = [
  "eth_getBlockByNumber",
  "eth_getBlockByHash",
  "eth_getTransactionByHash",
  "eth_getTransactionReceipt",
  "eth_getLogs"
]
Option            Type     Default                      Description
enabled           boolean  false                        Enable consensus validation
max_count         integer  3                            Maximum upstreams to query simultaneously
min_count         integer  2                            Minimum upstreams required for consensus
timeout_seconds   integer  10                           Timeout for consensus requests
dispute_behavior  string   "PreferBlockHeadLeader"      How to resolve disputes
methods           array    [eth_getBlockByNumber, ...]  Methods requiring consensus

Use Cases

When to use consensus:

  • Critical data integrity requirements

  • Financial applications (MEV, arbitrage)

  • Cross-chain bridge validation

  • Reorg detection

Trade-offs:

  • Higher latency: the response can only be returned once enough upstreams agree, so it is bounded by the slower responders

  • Higher cost: up to max_count× upstream API usage per request

  • Better reliability: detects data inconsistencies

  • Reorg protection: identifies chain forks early

Example

[consensus]
enabled = true
max_count = 3
min_count = 2
methods = ["eth_getBlockByNumber", "eth_getBlockByHash"]

# Need at least 3 upstreams
[[upstreams.providers]]
name = "alchemy"
https_url = "..."

[[upstreams.providers]]
name = "infura"
https_url = "..."

[[upstreams.providers]]
name = "quicknode"
https_url = "..."

Request: eth_getBlockByNumber("latest")

Behavior:

  1. Send to alchemy, infura, quicknode in parallel

  2. Receive responses:

    • Alchemy: block 18500000

    • Infura: block 18500000

    • QuickNode: block 18499999 (lagging)

  3. Consensus: 2/3 agree on 18500000

  4. Return: block 18500000

  5. Warning logged about QuickNode lag

Metrics

# Consensus requests
rpc_consensus_requests_total{result="success"} 1250
rpc_consensus_requests_total{result="failure"} 3

# Agreement rate
rpc_consensus_agreement_rate 0.996

# Consensus latency
rpc_consensus_duration_seconds_bucket{le="0.1"} 450
rpc_consensus_duration_seconds_bucket{le="0.5"} 1200

Hedging for Tail Latency

Parallel request execution to reduce P99 latency.

The Problem: Tail Latency

Scenario: You have 3 upstreams with these latencies:

  • Alchemy: P50=50ms, P95=120ms, P99=800ms

  • Infura: P50=45ms, P95=100ms, P99=600ms

  • QuickNode: P50=55ms, P95=150ms, P99=1200ms

Without hedging: your P99 inherits an upstream's slow tail (~600-1200ms, depending on which upstream serves the request)

With hedging: your P99 drops toward the upstreams' P95 (~100-150ms), because the first response wins

How It Works

  1. Send primary request to best upstream (by score)

  2. Wait the computed hedge delay (derived from the method's latency quantile and clamped to min_delay_ms/max_delay_ms; e.g., 50ms)

  3. If no response yet:

    • Send hedge request to next-best upstream

  4. Return first successful response

  5. Cancel other requests
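
A minimal sketch of this flow in Python asyncio, again assuming a hypothetical call_upstream client and a precomputed hedge delay (error handling, such as falling through on a fast primary failure, is omitted for brevity):

import asyncio

async def call_upstream(name, method, params):
    # Hypothetical stand-in for an async JSON-RPC call.
    raise NotImplementedError

async def hedged_call(primary, backup, method, params, hedge_delay=0.05):
    """Send to the primary upstream; after hedge_delay, also try the backup.
    Return the first result and cancel the loser."""
    tasks = [asyncio.create_task(call_upstream(primary, method, params))]
    try:
        # If the primary answers within the hedge delay, we are done.
        done, _ = await asyncio.wait(tasks, timeout=hedge_delay)
        if done:
            return done.pop().result()
        # Otherwise launch the hedge and race both requests.
        tasks.append(asyncio.create_task(call_upstream(backup, method, params)))
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        for t in tasks:
            t.cancel()  # cancelling a finished task is a no-op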

Configuration

[hedging]
enabled = true
latency_quantile = 0.95          # Use P95 latency for hedge delay calculation
min_delay_ms = 50                # Minimum delay before hedging
max_delay_ms = 2000              # Maximum delay before hedging
max_parallel = 2                 # Maximum parallel requests including primary
Option            Type     Default  Description
enabled           boolean  false    Enable request hedging
latency_quantile  float    0.95     Latency percentile for hedge trigger (0.0-1.0)
min_delay_ms      integer  50       Minimum delay before sending hedge
max_delay_ms      integer  2000     Maximum delay before sending hedge
max_parallel      integer  2        Maximum parallel requests including primary

Example Timeline

T=0ms:   Send request to Alchemy
T=50ms:  No response yet, send hedge to Infura
T=65ms:  Infura responds ✓ (return this)
T=120ms: Alchemy responds (ignored, already returned)

Result: 65ms latency instead of waiting for Alchemy's 120ms

Hedging Decision Tree

Request arrives
├─ Hedging enabled?  NO ──→ single request
│  YES
├─ Get latency percentile for method
│  (example: eth_getLogs P95 = 150ms)
├─ Calculate hedge delay
│  (example: hedge_delay = P95 × 0.5 = 75ms)
├─ Send primary request
├─ Wait hedge_delay
├─ Response received?  YES ──→ return it
│  NO
├─ Send hedge request
└─ Return first response
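
The delay calculation above can be sketched as a clamp over the tracked latency quantile (an assumed formula; Prism's exact calculation may differ):

def hedge_delay_ms(observed_quantile_ms, min_delay_ms=50, max_delay_ms=2000, scale=0.5):
    """Clamp the scaled latency quantile into [min_delay_ms, max_delay_ms]."""
    return min(max(observed_quantile_ms * scale, min_delay_ms), max_delay_ms)

print(hedge_delay_ms(150))   # 75.0  (matches the eth_getLogs example above)
print(hedge_delay_ms(60))    # 50    (floored at min_delay_ms)
print(hedge_delay_ms(9000))  # 2000  (capped at max_delay_ms)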

Use Cases

Good for hedging:

  • High-variance latency (P99 >> P50)

  • Latency-sensitive applications

  • User-facing queries

  • Real-time data needs

Bad for hedging:

  • Cost-sensitive (2× API calls)

  • Low latency variance (P99 ≈ P50)

  • Batch processing

  • Background jobs

Metrics

# Hedged requests
rpc_hedged_requests_total{primary="alchemy",hedged="infura"} 340

# Hedge wins (which request finished first)
rpc_hedge_wins_total{upstream="alchemy",type="primary"} 180
rpc_hedge_wins_total{upstream="infura",type="hedged"} 160

# Hedge delay distribution
rpc_hedge_delay_ms_bucket{le="50"} 200
rpc_hedge_delay_ms_bucket{le="100"} 320

Scoring Algorithm

Multi-factor upstream ranking for intelligent selection.

Scoring Factors

Each upstream receives a composite score based on:

  1. Latency (weight: 0.4): How fast it responds

  2. Error Rate (weight: 0.3): How often it fails

  3. Throttle Rate (weight: 0.2): How often it rate-limits

  4. Block Lag (weight: 0.1): How far behind chain tip

The fractions above are the simplified normalized weights used in this section's worked examples; the [scoring.weights] configuration below expresses relative weights instead (and includes an additional total_requests factor).

Configuration

[scoring]
enabled = true
window_seconds = 1800            # 30 minute metric window
min_samples = 10                 # Minimum samples before scoring
max_block_lag = 5                # Max block lag before heavy penalty
top_n = 3                        # Consider top 3 upstreams

[scoring.weights]
latency = 8.0                    # Latency factor weight (highest priority)
error_rate = 4.0                 # Error rate factor weight
throttle_rate = 3.0              # Throttle rate factor weight
block_head_lag = 2.0             # Block head lag factor weight
total_requests = 1.0             # Total requests factor weight

Score Calculation

Score = (latency_factor × 0.4) +
        (error_rate_factor × 0.3) +
        (throttle_factor × 0.2) +
        (block_lag_factor × 0.1)

Factors (all normalized 0-1, higher is better):

Latency Factor:

latency_factor = 1 - (upstream_latency / worst_latency)

Error Rate Factor:

error_rate = errors / total_requests
error_rate_factor = 1 - error_rate

Throttle Factor:

throttle_rate = throttles / total_requests
throttle_factor = 1 - throttle_rate

Block Lag Factor:

block_lag = max_block_number - upstream_block_number
block_lag_factor = 1 - (block_lag / max_lag)
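
A direct transcription of these formulas in Python, assuming per-upstream stats have already been aggregated over the scoring window:

def composite_score(latency_ms, error_rate, throttle_rate, block_lag,
                    worst_latency_ms, max_lag,
                    weights=(0.4, 0.3, 0.2, 0.1)):
    """Composite score from the four normalized factors (higher is better)."""
    latency_factor  = 1 - latency_ms / worst_latency_ms
    error_factor    = 1 - error_rate
    throttle_factor = 1 - throttle_rate
    lag_factor      = 1 - block_lag / max_lag
    w_lat, w_err, w_thr, w_lag = weights
    return (latency_factor * w_lat + error_factor * w_err +
            throttle_factor * w_thr + lag_factor * w_lag)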

Example

Upstreams:

  • Alchemy: 50ms latency, 1% errors, 0% throttles, block 18500000

  • Infura: 45ms latency, 2% errors, 5% throttles, block 18499998

  • QuickNode: 60ms latency, 0.5% errors, 0% throttles, block 18500000

Scoring (assuming worst_latency=60ms, max_lag=2):

Alchemy:

latency_factor = 1 - (50/60) = 0.167
error_factor = 1 - 0.01 = 0.99
throttle_factor = 1 - 0 = 1.0
lag_factor = 1 - (0/2) = 1.0

score = (0.167 × 0.4) + (0.99 × 0.3) + (1.0 × 0.2) + (1.0 × 0.1)
      = 0.067 + 0.297 + 0.2 + 0.1
      = 0.664

Infura:

latency_factor = 1 - (45/60) = 0.25
error_factor = 1 - 0.02 = 0.98
throttle_factor = 1 - 0.05 = 0.95
lag_factor = 1 - (2/2) = 0.0

score = (0.25 × 0.4) + (0.98 × 0.3) + (0.95 × 0.2) + (0.0 × 0.1)
      = 0.1 + 0.294 + 0.19 + 0.0
      = 0.584

QuickNode:

latency_factor = 1 - (60/60) = 0.0
error_factor = 1 - 0.005 = 0.995
throttle_factor = 1 - 0 = 1.0
lag_factor = 1 - (0/2) = 1.0

score = (0.0 × 0.4) + (0.995 × 0.3) + (1.0 × 0.2) + (1.0 × 0.1)
      = 0.0 + 0.299 + 0.2 + 0.1
      = 0.599

Result: Alchemy (0.664) > QuickNode (0.599) > Infura (0.584)

Selection: Alchemy chosen for next request
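
The worked example can be reproduced with the composite_score sketch from the previous subsection:

# (latency_ms, error_rate, throttle_rate, block_lag) per upstream
stats = {
    "alchemy":   (50, 0.01,  0.00, 0),
    "infura":    (45, 0.02,  0.05, 2),
    "quicknode": (60, 0.005, 0.00, 0),
}
scores = {name: composite_score(*s, worst_latency_ms=60, max_lag=2)
          for name, s in stats.items()}
print(scores)                      # alchemy ~0.664, infura 0.584, quicknode ~0.599
print(max(scores, key=scores.get)) # alchemy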

Metrics

# Composite scores
rpc_upstream_composite_score{upstream="alchemy"} 0.664
rpc_upstream_composite_score{upstream="infura"} 0.584
rpc_upstream_composite_score{upstream="quicknode"} 0.599

# Individual factors
rpc_upstream_latency_factor{upstream="alchemy"} 0.167
rpc_upstream_error_rate_factor{upstream="alchemy"} 0.99
rpc_upstream_throttle_factor{upstream="alchemy"} 1.0
rpc_upstream_block_lag_factor{upstream="alchemy"} 1.0

Load Balancing

Fallback strategy when advanced routing is disabled.

Round-Robin

Simple round-robin distribution among healthy upstreams.

Request 1 → Upstream A
Request 2 → Upstream B
Request 3 → Upstream C
Request 4 → Upstream A
...

Weighted Round-Robin

Distribution based on upstream weights.

[[upstreams.providers]]
name = "primary"
weight = 3  # Gets 3× traffic

[[upstreams.providers]]
name = "backup"
weight = 1  # Gets 1× traffic

Result: Primary gets 75% traffic, backup gets 25%
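
A minimal sketch of weighted round-robin in Python: expand each provider by its weight and cycle through the pool.

import itertools

providers = [("primary", 3), ("backup", 1)]

# Expand by weight: primary appears 3 times, backup once.
pool = [name for name, weight in providers for _ in range(weight)]
rr = itertools.cycle(pool)

picks = [next(rr) for _ in range(8)]
print(picks)  # ['primary', 'primary', 'primary', 'backup', ...] -> 75% / 25%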

Response-Time Based

Select upstream with best recent response time.

Recent latencies:
  Alchemy: [45ms, 50ms, 48ms, 52ms] → avg 48.75ms
  Infura: [60ms, 55ms, 58ms, 62ms] → avg 58.75ms

Next request → Alchemy (faster)
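
Sketched the same way: pick the upstream with the lowest mean over its recent samples.

recent = {
    "alchemy": [45, 50, 48, 52],
    "infura":  [60, 55, 58, 62],
}

def pick_fastest(latencies):
    # Choose the upstream with the lowest mean recent latency.
    return min(latencies, key=lambda u: sum(latencies[u]) / len(latencies[u]))

print(pick_fastest(recent))  # alchemy (48.75ms vs 58.75ms)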

Circuit Breaker

Automatic isolation of failing upstreams.

State Machine

           ┌────────────┐
           │   CLOSED   │ (Normal operation)
           └─────┬──────┘
                 │ Failures ≥ threshold
                 ▼
           ┌────────────┐
           │    OPEN    │ (Isolated, requests fail fast)
           └─────┬──────┘
                 │ Timeout elapsed
                 ▼
           ┌────────────┐
           │ HALF-OPEN  │ (Trial request)
           └─────┬──────┘
    Success ◄────┴────► Failure
       │                   │
       ▼                   ▼
   [CLOSED]            [OPEN]
Configuration

[[upstreams.providers]]
name = "provider"
circuit_breaker_threshold = 5        # Open after 5 failures
circuit_breaker_timeout_seconds = 60 # Retry after 60 seconds

Behavior

CLOSED (normal):

  • All requests go through

  • Track consecutive failures

  • If failures ≥ threshold → OPEN

OPEN (isolated):

  • Requests fail immediately with CircuitBreakerOpen error

  • Wait timeout_seconds

  • After timeout → HALF-OPEN

HALF-OPEN (testing):

  • Allow one test request

  • Success → CLOSED (restore upstream)

  • Failure → OPEN (continue isolation)
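
These transitions map onto a small state machine. A minimal sketch in Python (thresholds mirror the config above; the clock is injected so the timeout is testable):

import time

class CircuitBreaker:
    """CLOSED -> OPEN after `threshold` consecutive failures;
    OPEN -> HALF_OPEN after `timeout` seconds; HALF_OPEN resolves on one trial."""

    def __init__(self, threshold=5, timeout=60.0, clock=time.monotonic):
        self.threshold, self.timeout, self.clock = threshold, timeout, clock
        self.state, self.failures, self.opened_at = "CLOSED", 0, 0.0

    def allow_request(self):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.timeout:
                self.state = "HALF_OPEN"   # let one trial request through
                return True
            return False                   # fail fast: CircuitBreakerOpen
        return True                        # CLOSED or HALF_OPEN

    def record_success(self):
        self.state, self.failures = "CLOSED", 0

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state, self.opened_at = "OPEN", self.clock()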

Metrics

# Circuit breaker state (0=closed, 0.5=half-open, 1=open)
rpc_circuit_breaker_state{upstream="alchemy"} 0

# State transitions
rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="open"} 3
rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="closed"} 3

# Failure count
rpc_circuit_breaker_failure_count{upstream="alchemy"} 0

Routing Decision Flow

Complete request routing logic:

┌─────────────────────────────────────────────────────────┐
│                   SmartRouter.route()                   │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
                  Is method critical?
                YES │             │ NO
                    ▼             │
            Consensus enabled?    │
            YES │      NO │       │
                ▼         └──────►│
          [CONSENSUS]             ▼
                │          Scoring enabled?
                │         YES │         │ NO
                │             ▼         ▼
                │        [SCORING]  Hedging enabled?
                │             │    YES │        │ NO
                │             │        ▼        ▼
                │             │   [HEDGING] [LOAD BALANCER]
                │             │        │         │
                └─────────────┴────────┴─────────┘
                              │
                              ▼
                       ┌──────────────┐
                       │   Response   │
                       └──────────────┘

Next: Learn about Authentication or Monitoring.
