Routing Strategies
Prism's intelligent request routing system with consensus validation, hedging, scoring, and failover.
Overview
Prism implements a SmartRouter that automatically selects the best routing strategy based on the method and system state:
Request → SmartRouter
             │
             ├─ Critical method? ──────────→ CONSENSUS (3+ upstreams)
             │
             ├─ Scoring enabled? ──────────→ SCORING (best upstream)
             │
             ├─ Hedging enabled? ──────────→ HEDGING (parallel requests)
             │
             └─ Fallback ─────────────────→ LOAD BALANCER (round-robin)

Routing Priority
1. Consensus (if method requires validation)
2. Scoring (if enabled)
3. Hedging (if enabled and scoring disabled)
4. Load Balancer (fallback)
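In pseudocode, the selection order looks roughly like the sketch below. The function and flag names (select_strategy, cfg keys) are illustrative assumptions, not Prism's internal API; the point is the priority order, nothing more.

# Minimal sketch of the routing priority order described above.
# Function and flag names are illustrative, not Prism's internal API.
def select_strategy(method: str, cfg: dict, critical_methods: set) -> str:
    if cfg.get("consensus_enabled") and method in critical_methods:
        return "CONSENSUS"       # multi-upstream validation for critical methods
    if cfg.get("scoring_enabled"):
        return "SCORING"         # route to the highest-scored upstream
    if cfg.get("hedging_enabled"):
        return "HEDGING"         # primary request plus delayed hedge
    return "LOAD_BALANCER"       # round-robin fallback

# Example: a critical method with consensus enabled routes to CONSENSUS.
print(select_strategy(
    "eth_getBlockByNumber",
    {"consensus_enabled": True, "scoring_enabled": True, "hedging_enabled": False},
    {"eth_getBlockByNumber", "eth_getBlockByHash"},
))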
Consensus Validation
Multi-upstream validation for critical methods requiring data integrity guarantees.
How It Works
1. Send to multiple upstreams: request sent to 3+ providers in parallel
2. Collect responses: wait for responses (with timeout)
3. Compare results: check if responses match
4. Consensus achieved?
   YES: Return agreed-upon response
   NO: Return error or fall back
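As a rough illustration of the compare step, the sketch below groups identical responses and accepts the largest group once it reaches min_count. It is a simplified model for illustration only; real dispute behaviors such as PreferBlockHeadLeader are not represented.

import json
from collections import Counter

def check_consensus(responses: list[dict], min_count: int = 2):
    """Simplified consensus check: group identical results and accept the
    largest group if it has at least `min_count` members."""
    groups = Counter(json.dumps(r, sort_keys=True) for r in responses)
    winner, votes = groups.most_common(1)[0]
    if votes >= min_count:
        return json.loads(winner), votes
    return None, votes   # no consensus -> caller returns an error or falls back

# 2 of 3 upstreams agree on block 18500000 -> consensus achieved
agreed, votes = check_consensus(
    [{"number": 18500000}, {"number": 18500000}, {"number": 18499999}],
    min_count=2,
)
print(agreed, votes)  # {'number': 18500000} 2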
Configuration
[consensus]
enabled = true
max_count = 3 # Maximum upstreams to query
min_count = 2 # Minimum upstreams for consensus
timeout_seconds = 10 # Timeout for consensus requests
dispute_behavior = "PreferBlockHeadLeader" # How to resolve disputes
methods = [
"eth_getBlockByNumber",
"eth_getBlockByHash",
"eth_getTransactionByHash",
"eth_getTransactionReceipt",
"eth_getLogs"
]

Parameter          Type      Default                        Description
enabled            boolean   false                          Enable consensus validation
max_count          integer   3                              Maximum upstreams to query simultaneously
min_count          integer   2                              Minimum upstreams required for consensus
timeout_seconds    integer   10                             Timeout for consensus requests
dispute_behavior   string    "PreferBlockHeadLeader"        How to resolve disputes
methods            array     [eth_getBlockByNumber, ...]    Methods requiring consensus
Use Cases
When to use consensus:
Critical data integrity requirements
Financial applications (MEV, arbitrage)
Cross-chain bridge validation
Reorg detection
Trade-offs:
Higher latency: 3× upstream calls
Higher cost: More upstream API usage
Better reliability: Detect data inconsistencies
Reorg protection: Identify chain forks early
Example
[consensus]
enabled = true
max_count = 3               # Query all three providers
min_count = 2               # Two matching responses are enough
methods = ["eth_getBlockByNumber", "eth_getBlockByHash"]
# Need at least 3 upstreams
[[upstreams.providers]]
name = "alchemy"
https_url = "..."
[[upstreams.providers]]
name = "infura"
https_url = "..."
[[upstreams.providers]]
name = "quicknode"
https_url = "..."

Request: eth_getBlockByNumber("latest")
Behavior:
Send to alchemy, infura, quicknode in parallel
Receive responses:
Alchemy: block 18500000
Infura: block 18500000
QuickNode: block 18499999 (lagging)
Consensus: 2/3 agree on 18500000
Return: block 18500000
Warning logged about QuickNode lag
Metrics
# Consensus requests
rpc_consensus_requests_total{result="success"} 1250
rpc_consensus_requests_total{result="failure"} 3
# Agreement rate
rpc_consensus_agreement_rate 0.996
# Consensus latency
rpc_consensus_duration_seconds_bucket{le="0.1"} 450
rpc_consensus_duration_seconds_bucket{le="0.5"} 1200Hedging for Tail Latency
Parallel request execution to reduce P99 latency.
The Problem: Tail Latency
Scenario: You have 3 upstreams with these latencies:
Alchemy: P50=50ms, P95=120ms, P99=800ms
Infura: P50=45ms, P95=100ms, P99=600ms
QuickNode: P50=55ms, P95=150ms, P99=1200ms
Without hedging: Your P99 = ~800ms (slowest typical response)
With hedging: Your P99 = ~100-150ms (first response wins)
How It Works
1. Send primary request to best upstream (by score)
2. Wait for the computed hedge delay (e.g., 50ms)
3. If no response yet: send hedge request to next-best upstream
4. Return first successful response
5. Cancel other requests
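The race between the primary and the hedge can be sketched with asyncio as below. The call coroutine and the hard-coded latencies are stand-ins for real upstream RPC calls; this shows the primary-then-hedge pattern, not Prism's actual implementation.

import asyncio

async def call(upstream: str, delay_s: float) -> str:
    """Stand-in for a real RPC call; sleeps to simulate upstream latency."""
    await asyncio.sleep(delay_s)
    return f"response from {upstream}"

async def hedged_request(primary: str, hedge: str, hedge_delay_s: float) -> str:
    """Send the primary request; if it is still pending after hedge_delay_s,
    race it against a hedge request to the next-best upstream."""
    primary_task = asyncio.create_task(call(primary, 0.120))  # simulate a slow 120 ms primary
    done, _ = await asyncio.wait({primary_task}, timeout=hedge_delay_s)
    if done:
        return primary_task.result()          # primary answered within the hedge delay
    hedge_task = asyncio.create_task(call(hedge, 0.015))      # hedge answers in ~15 ms
    done, pending = await asyncio.wait(
        {primary_task, hedge_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()                          # cancel the slower request
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

# Primary (Alchemy) is slow this time, so the hedge to Infura wins at ~65 ms.
print(asyncio.run(hedged_request("alchemy", "infura", hedge_delay_s=0.050)))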
Configuration
[hedging]
enabled = true
latency_quantile = 0.95 # Use P95 latency for hedge delay calculation
min_delay_ms = 50 # Minimum delay before hedging
max_delay_ms = 2000 # Maximum delay before hedging
max_parallel = 2          # Maximum parallel requests including primary

Parameter          Type      Default   Description
enabled            boolean   false     Enable request hedging
latency_quantile   float     0.95      Latency percentile for hedge trigger (0.0-1.0)
min_delay_ms       integer   50        Minimum delay before sending hedge
max_delay_ms       integer   2000      Maximum delay before sending hedge
max_parallel       integer   2         Maximum parallel requests including primary
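One plausible way to read these settings: the hedge delay is the tracked latency quantile for the method, clamped into [min_delay_ms, max_delay_ms]. The helper below sketches that interpretation only; the exact formula Prism applies (e.g., the ×0.5 scaling shown in the decision tree further down) may differ.

def hedge_delay_ms(observed_quantile_ms: float,
                   min_delay_ms: int = 50,
                   max_delay_ms: int = 2000) -> float:
    """Clamp the observed latency quantile (e.g. the method's P95) into
    [min_delay_ms, max_delay_ms]. Illustrative only."""
    return max(min_delay_ms, min(observed_quantile_ms, max_delay_ms))

print(hedge_delay_ms(150))    # 150 ms  (within bounds)
print(hedge_delay_ms(10))     # 50 ms   (raised to min_delay_ms)
print(hedge_delay_ms(5000))   # 2000 ms (capped at max_delay_ms)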
Example Timeline
T=0ms: Send request to Alchemy
T=50ms: No response yet, send hedge to Infura
T=65ms: Infura responds ✓ (return this)
T=120ms: Alchemy responds (ignored, already returned)

Result: 65ms latency instead of waiting for Alchemy's 120ms
Hedging Decision Tree
Request arrives
   │
   ├─ Is hedging enabled?  NO ──→ Single request
   │          YES
   │
   ├─ Get latency percentile for method
   │     Example: eth_getLogs P95 = 150ms
   │
   ├─ Calculate hedge delay
   │     hedge_delay = P95 × 0.5 = 75ms
   │
   ├─ Send primary request
   │
   ├─ Wait hedge_delay
   │
   ├─ Response received?  YES ──→ Return
   │          NO
   │
   ├─ Send hedge request
   │
   └─ Return first response

Use Cases
Good for hedging:
High-variance latency (P99 >> P50)
Latency-sensitive applications
User-facing queries
Real-time data needs
Bad for hedging:
Cost-sensitive (2× API calls)
Low latency variance (P99 ≈ P50)
Batch processing
Background jobs
Metrics
# Hedged requests
rpc_hedged_requests_total{primary="alchemy",hedged="infura"} 340
# Hedge wins (which request finished first)
rpc_hedge_wins_total{upstream="alchemy",type="primary"} 180
rpc_hedge_wins_total{upstream="infura",type="hedged"} 160
# Hedge delay distribution
rpc_hedge_delay_ms_bucket{le="50"} 200
rpc_hedge_delay_ms_bucket{le="100"} 320Scoring Algorithm
Multi-factor upstream ranking for intelligent selection.
Scoring Factors
Each upstream receives a composite score based on:
Latency (weight: 0.4): How fast it responds
Error Rate (weight: 0.3): How often it fails
Throttle Rate (weight: 0.2): How often it rate-limits
Block Lag (weight: 0.1): How far behind chain tip
Configuration
[scoring]
enabled = true
window_seconds = 1800 # 30 minute metric window
min_samples = 10 # Minimum samples before scoring
max_block_lag = 5 # Max block lag before heavy penalty
top_n = 3 # Consider top 3 upstreams
[scoring.weights]
latency = 8.0 # Latency factor weight (highest priority)
error_rate = 4.0 # Error rate factor weight
throttle_rate = 3.0 # Throttle rate factor weight
block_head_lag = 2.0 # Block head lag factor weight
total_requests = 1.0 # Total requests factor weight

Score Calculation
Score = (latency_factor    × 0.4) +
        (error_rate_factor × 0.3) +
        (throttle_factor   × 0.2) +
        (block_lag_factor  × 0.1)

Factors (all normalized 0-1, higher is better):
Latency Factor:

latency_factor = 1 - (upstream_latency / worst_latency)

Error Rate Factor:

error_rate = errors / total_requests
error_rate_factor = 1 - error_rate

Throttle Factor:

throttle_rate = throttles / total_requests
throttle_factor = 1 - throttle_rate

Block Lag Factor:

block_lag = max_block_number - upstream_block_number
block_lag_factor = 1 - (block_lag / max_lag)

Example
Upstreams:
Alchemy: 50ms latency, 1% errors, 0% throttles, block 18500000
Infura: 45ms latency, 2% errors, 5% throttles, block 18499998
QuickNode: 60ms latency, 0.5% errors, 0% throttles, block 18500000
Scoring (assuming worst_latency=60ms, max_lag=2):
Alchemy:
  latency_factor  = 1 - (50/60) = 0.167
  error_factor    = 1 - 0.01    = 0.99
  throttle_factor = 1 - 0       = 1.0
  lag_factor      = 1 - (0/2)   = 1.0

  score = (0.167 × 0.4) + (0.99 × 0.3) + (1.0 × 0.2) + (1.0 × 0.1)
        = 0.067 + 0.297 + 0.2 + 0.1
        = 0.664

Infura:
  latency_factor  = 1 - (45/60) = 0.25
  error_factor    = 1 - 0.02    = 0.98
  throttle_factor = 1 - 0.05    = 0.95
  lag_factor      = 1 - (2/2)   = 0.0

  score = (0.25 × 0.4) + (0.98 × 0.3) + (0.95 × 0.2) + (0.0 × 0.1)
        = 0.1 + 0.294 + 0.19 + 0.0
        = 0.584

QuickNode:
  latency_factor  = 1 - (60/60) = 0.0
  error_factor    = 1 - 0.005   = 0.995
  throttle_factor = 1 - 0       = 1.0
  lag_factor      = 1 - (0/2)   = 1.0

  score = (0.0 × 0.4) + (0.995 × 0.3) + (1.0 × 0.2) + (1.0 × 0.1)
        = 0.0 + 0.299 + 0.2 + 0.1
        = 0.599

Result: Alchemy (0.664) > QuickNode (0.599) > Infura (0.584)
Selection: Alchemy chosen for next request
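The same arithmetic as a small script, using the normalized 0.4/0.3/0.2/0.1 weights from the formula above (the raw values under [scoring.weights] are assumed here to be normalized before use); it reproduces the ranking from the worked example.

# Reproduces the worked example using the normalized weights from the
# Score Calculation formula. Upstream stats: (latency_ms, error_rate,
# throttle_rate, block_lag).
UPSTREAMS = {
    "alchemy":   (50, 0.010, 0.00, 0),
    "infura":    (45, 0.020, 0.05, 2),
    "quicknode": (60, 0.005, 0.00, 0),
}
WEIGHTS = (0.4, 0.3, 0.2, 0.1)   # latency, error rate, throttle rate, block lag

def score(latency, error_rate, throttle_rate, block_lag, worst_latency, max_lag):
    factors = (
        1 - latency / worst_latency,       # latency_factor
        1 - error_rate,                    # error_rate_factor
        1 - throttle_rate,                 # throttle_factor
        1 - block_lag / max_lag,           # block_lag_factor
    )
    return sum(w * f for w, f in zip(WEIGHTS, factors))

worst_latency = max(s[0] for s in UPSTREAMS.values())   # 60 ms
max_lag = 2
ranked = sorted(
    ((score(*stats, worst_latency, max_lag), name) for name, stats in UPSTREAMS.items()),
    reverse=True,
)
for s, name in ranked:
    print(f"{name}: {s:.3f}")   # alchemy ≈ 0.664 > quicknode ≈ 0.599 > infura ≈ 0.584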
Metrics
# Composite scores
rpc_upstream_composite_score{upstream="alchemy"} 0.664
rpc_upstream_composite_score{upstream="infura"} 0.584
rpc_upstream_composite_score{upstream="quicknode"} 0.599
# Individual factors
rpc_upstream_latency_factor{upstream="alchemy"} 0.167
rpc_upstream_error_rate_factor{upstream="alchemy"} 0.99
rpc_upstream_throttle_factor{upstream="alchemy"} 1.0
rpc_upstream_block_lag_factor{upstream="alchemy"} 1.0

Load Balancing
Fallback strategy when advanced routing is disabled.
Round-Robin
Simple round-robin distribution among healthy upstreams.
Request 1 → Upstream A
Request 2 → Upstream B
Request 3 → Upstream C
Request 4 → Upstream A
...

Weighted Round-Robin
Distribution based on upstream weights.
[[upstreams.providers]]
name = "primary"
weight = 3 # Gets 3× traffic
[[upstreams.providers]]
name = "backup"
weight = 1 # Gets 1× traffic

Result: Primary gets 75% traffic, backup gets 25%
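A tiny sketch of how a weighted rotation produces that 75/25 split, built by repeating each provider weight-many times in the cycle. This is illustrative only, not necessarily how Prism builds its schedule.

from itertools import cycle

# Expand each provider into the rotation `weight` times: with weights 3 and 1,
# "primary" receives 3 of every 4 requests (75%) and "backup" receives 1 (25%).
providers = {"primary": 3, "backup": 1}
rotation = cycle([name for name, weight in providers.items() for _ in range(weight)])

for request_id in range(8):
    print(request_id, "→", next(rotation))
# 0-2 → primary, 3 → backup, 4-6 → primary, 7 → backup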
Response-Time Based
Select upstream with best recent response time.
Recent latencies:
Alchemy: [45ms, 50ms, 48ms, 52ms] → avg 48.75ms
Infura: [60ms, 55ms, 58ms, 62ms] → avg 58.75ms
Next request → Alchemy (faster)

Circuit Breaker
Automatic isolation of failing upstreams.
State Machine
┌────────────┐
│   CLOSED   │  (Normal operation)
└─────┬──────┘
      │  Failures ≥ threshold
      ▼
┌────────────┐
│    OPEN    │  (Isolated, requests fail fast)
└─────┬──────┘
      │  Timeout elapsed
      ▼
┌────────────┐
│ HALF-OPEN  │  (Trial request)
└─────┬──────┘
      │
      ├── Success ──► [CLOSED]
      │
      └── Failure ──► [OPEN]

Configuration
[[upstreams.providers]]
name = "provider"
circuit_breaker_threshold = 5 # Open after 5 failures
circuit_breaker_timeout_seconds = 60 # Retry after 60 seconds

Behavior
CLOSED (normal):
All requests go through
Track consecutive failures
If failures ≥ threshold → OPEN
OPEN (isolated):
Requests fail immediately with CircuitBreakerOpen error
Wait timeout_seconds
After timeout → HALF-OPEN
HALF-OPEN (testing):
Allow one test request
Success → CLOSED (restore upstream)
Failure → OPEN (continue isolation)
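The three states map onto a small state machine. The sketch below follows the documented semantics (consecutive-failure threshold, timed recovery, trial request); it is a simplified model, not Prism's code.

import time

class CircuitBreaker:
    """Minimal CLOSED / OPEN / HALF-OPEN breaker following the behavior
    described above."""

    def __init__(self, threshold: int = 5, timeout_seconds: float = 60.0):
        self.threshold = threshold
        self.timeout_seconds = timeout_seconds
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.timeout_seconds:
                self.state = "HALF_OPEN"      # timeout elapsed -> allow a trial request
                return True
            return False                      # fail fast while isolated
        return True                           # CLOSED or HALF_OPEN

    def record_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"                 # trial succeeded -> restore upstream

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"               # isolate (again)
            self.opened_at = time.monotonic()

# Five consecutive failures trip the breaker; further requests fail fast.
cb = CircuitBreaker(threshold=5, timeout_seconds=60)
for _ in range(5):
    cb.record_failure()
print(cb.state, cb.allow_request())   # OPEN False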
Metrics
# Circuit breaker state (0=closed, 0.5=half-open, 1=open)
rpc_circuit_breaker_state{upstream="alchemy"} 0
# State transitions
rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="open"} 3
rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="closed"} 3
# Failure count
rpc_circuit_breaker_failure_count{upstream="alchemy"} 0

Routing Decision Flow
Complete request routing logic:
┌─────────────────────────────────────────────────────────┐
│                   SmartRouter.route()                    │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
                  ┌──────────────────────┐
                  │ Is method critical?  │
                  └────┬────────────┬────┘
                      YES           NO
                       │             │
                       ▼             ▼
              ┌──────────────┐  ┌──────────────┐
              │  Consensus   │  │   Scoring    │
              │  enabled?    │  │   enabled?   │
              └──┬────────┬──┘  └──┬────────┬──┘
                YES       NO      YES       NO
                 │        │        │        │
                 ▼        │        ▼        ▼
           [CONSENSUS]    │   [SCORING]  ┌──────────────┐
                 │        │        │     │   Hedging    │
                 │        │        │     │   enabled?   │
                 │        │        │     └──┬────────┬──┘
                 │        │        │       YES       NO
                 │        │        │        │        │
                 │        │        │        ▼        ▼
                 │        │        │   [HEDGING]  [LOAD BALANCER]
                 │        │        │        │        │
                 │        └────────┴────────┴────────┘
                 │                 │
                 └─────────────────┘
                             │
                             ▼
                     ┌──────────────┐
                     │   Response   │
                     └──────────────┘

Next: Learn about Authentication or Monitoring.