Routing Strategies

Prism's intelligent request routing system with consensus validation, hedging, scoring, and failover.

Overview

Prism implements a SmartRouter that automatically selects the best routing strategy based on the method and system state:

Request → SmartRouter
           ├─ Critical method? ──────────→ CONSENSUS (3+ upstreams)
           ├─ Scoring enabled? ──────────→ SCORING (best upstream)
           ├─ Hedging enabled? ──────────→ HEDGING (parallel requests)
           └─ Fallback ─────────────────→ LOAD BALANCER (round-robin)

Routing Priority

  1. Consensus (if method requires validation)

  2. Scoring (if enabled)

  3. Hedging (if enabled and scoring disabled)

  4. Load Balancer (fallback)
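
This priority order can be sketched as a simple dispatch. A minimal sketch in Python (not Prism's actual code; the strategy functions and config flags are hypothetical stand-ins):

# Minimal sketch of SmartRouter's priority order.
# consensus/scoring/hedging/load_balance are hypothetical stand-ins.

def consensus(req):    return f"consensus({req['method']})"
def scoring(req):      return f"scoring({req['method']})"
def hedging(req):      return f"hedging({req['method']})"
def load_balance(req): return f"load_balance({req['method']})"

CONSENSUS_METHODS = {"eth_getBlockByNumber", "eth_getBlockByHash"}

def route(req, *, consensus_on=True, scoring_on=True, hedging_on=True):
    # 1. Consensus for critical methods
    if consensus_on and req["method"] in CONSENSUS_METHODS:
        return consensus(req)
    # 2. Scoring if enabled
    if scoring_on:
        return scoring(req)
    # 3. Hedging if enabled and scoring disabled
    if hedging_on:
        return hedging(req)
    # 4. Round-robin fallback
    return load_balance(req)

print(route({"method": "eth_call"}, scoring_on=False))  # -> hedging(eth_call)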


Consensus Validation

Multi-upstream validation for critical methods requiring data integrity guarantees.

How It Works

  1. Send to multiple upstreams: Request sent to 3+ providers in parallel

  2. Collect responses: Wait for responses (with timeout)

  3. Compare results: Check if responses match

  4. Consensus achieved?

    • YES: Return agreed-upon response

    • NO: Return error or fall back
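
These steps amount to a parallel fan-out followed by a majority vote. A minimal sketch in Python, assuming a hypothetical call_upstream client and exact-equality response comparison:

import concurrent.futures
from collections import Counter

def call_upstream(name, method, params):
    # Hypothetical stand-in for a real JSON-RPC call to one upstream.
    raise NotImplementedError

def consensus_call(upstreams, method, params, min_count=2, timeout=10.0):
    """Query several upstreams in parallel and return the majority response."""
    with concurrent.futures.ThreadPoolExecutor(len(upstreams)) as pool:
        futures = [pool.submit(call_upstream, u, method, params) for u in upstreams]
        # Collect whatever finishes within the timeout; stragglers are abandoned.
        done, _ = concurrent.futures.wait(futures, timeout=timeout)

    results = [f.result() for f in done if f.exception() is None]
    if len(results) < min_count:
        raise RuntimeError("not enough responses for consensus")

    # Majority vote: the most common response wins if enough upstreams agree.
    value, votes = Counter(map(repr, results)).most_common(1)[0]
    if votes < min_count:
        raise RuntimeError("consensus dispute: responses disagree")
    return next(r for r in results if repr(r) == value)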

Configuration

[consensus]
enabled = true
max_count = 3                    # Maximum upstreams to query
min_count = 2                    # Minimum upstreams for consensus
timeout_seconds = 10             # Timeout for consensus requests
dispute_behavior = "PreferBlockHeadLeader"  # How to resolve disputes
methods = [
  "eth_getBlockByNumber",
  "eth_getBlockByHash",
  "eth_getTransactionByHash",
  "eth_getTransactionReceipt",
  "eth_getLogs"
]
Option            Type     Default                      Description
enabled           boolean  false                        Enable consensus validation
max_count         integer  3                            Maximum upstreams to query simultaneously
min_count         integer  2                            Minimum upstreams required for consensus
timeout_seconds   integer  10                           Timeout for consensus requests
dispute_behavior  string   "PreferBlockHeadLeader"      How to resolve disputes
methods           array    [eth_getBlockByNumber, ...]  Methods requiring consensus

Use Cases

When to use consensus:

  • Critical data integrity requirements

  • Financial applications (MEV, arbitrage)

  • Cross-chain bridge validation

  • Reorg detection

Trade-offs:

  • Higher latency: the response can only be returned once enough upstreams agree, so it is bounded by the slower responders

  • Higher cost: up to max_count× upstream API usage per request

  • Better reliability: detects data inconsistencies

  • Reorg protection: identifies chain forks early

Example

[consensus]
enabled = true
max_count = 3
min_count = 2
methods = ["eth_getBlockByNumber", "eth_getBlockByHash"]

# Need at least 3 upstreams
[[upstreams.providers]]
name = "alchemy"
https_url = "..."

[[upstreams.providers]]
name = "infura"
https_url = "..."

[[upstreams.providers]]
name = "quicknode"
https_url = "..."

Request: eth_getBlockByNumber("latest")

Behavior:

  1. Send to alchemy, infura, quicknode in parallel

  2. Receive responses:

    • Alchemy: block 18500000

    • Infura: block 18500000

    • QuickNode: block 18499999 (lagging)

  3. Consensus: 2/3 agree on 18500000

  4. Return: block 18500000

  5. Warning logged about QuickNode lag

Metrics

# Consensus requests
rpc_consensus_requests_total{result="success"} 1250
rpc_consensus_requests_total{result="failure"} 3

# Agreement rate
rpc_consensus_agreement_rate 0.996

# Consensus latency
rpc_consensus_duration_seconds_bucket{le="0.1"} 450
rpc_consensus_duration_seconds_bucket{le="0.5"} 1200

Hedging for Tail Latency

Parallel request execution to reduce P99 latency.

The Problem: Tail Latency

Scenario: You have 3 upstreams with these latencies:

  • Alchemy: P50=50ms, P95=120ms, P99=800ms

  • Infura: P50=45ms, P95=100ms, P99=600ms

  • QuickNode: P50=55ms, P95=150ms, P99=1200ms

Without hedging: your P99 inherits an upstream's slow tail (~600-1200ms, depending on which upstream serves the request)

With hedging: your P99 drops toward the upstreams' P95 (~100-150ms), because the first response wins

How It Works

  1. Send primary request to best upstream (by score)

  2. Wait the computed hedge delay (derived from the method's latency quantile and clamped to min_delay_ms/max_delay_ms; e.g., 50ms)

  3. If no response yet:

    • Send hedge request to next-best upstream

  4. Return first successful response

  5. Cancel other requests
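
A minimal sketch of this flow in Python asyncio, again assuming a hypothetical call_upstream client and a precomputed hedge delay (error handling, such as falling through on a fast primary failure, is omitted for brevity):

import asyncio

async def call_upstream(name, method, params):
    # Hypothetical stand-in for an async JSON-RPC call.
    raise NotImplementedError

async def hedged_call(primary, backup, method, params, hedge_delay=0.05):
    """Send to the primary upstream; after hedge_delay, also try the backup.
    Return the first result and cancel the loser."""
    tasks = [asyncio.create_task(call_upstream(primary, method, params))]
    try:
        # If the primary answers within the hedge delay, we are done.
        done, _ = await asyncio.wait(tasks, timeout=hedge_delay)
        if done:
            return done.pop().result()
        # Otherwise launch the hedge and race both requests.
        tasks.append(asyncio.create_task(call_upstream(backup, method, params)))
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        for t in tasks:
            t.cancel()  # cancelling a finished task is a no-op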

Configuration

[hedging]
enabled = true
latency_quantile = 0.95          # Use P95 latency for hedge delay calculation
min_delay_ms = 50                # Minimum delay before hedging
max_delay_ms = 2000              # Maximum delay before hedging
max_parallel = 2                 # Maximum parallel requests including primary
Option            Type     Default  Description
enabled           boolean  false    Enable request hedging
latency_quantile  float    0.95     Latency percentile for hedge trigger (0.0-1.0)
min_delay_ms      integer  50       Minimum delay before sending hedge
max_delay_ms      integer  2000     Maximum delay before sending hedge
max_parallel      integer  2        Maximum parallel requests including primary

Example Timeline

T=0ms:   Send request to Alchemy
T=50ms:  No response yet, send hedge to Infura
T=65ms:  Infura responds ✓ (return this)
T=120ms: Alchemy responds (ignored, already returned)

Result: 65ms latency instead of waiting for Alchemy's 120ms

Hedging Decision Tree

Request arrives
├─ Hedging enabled?  NO ──→ single request
│  YES
├─ Get latency percentile for method
│  (example: eth_getLogs P95 = 150ms)
├─ Calculate hedge delay
│  (example: hedge_delay = P95 × 0.5 = 75ms)
├─ Send primary request
├─ Wait hedge_delay
├─ Response received?  YES ──→ return it
│  NO
├─ Send hedge request
└─ Return first response
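
The delay calculation above can be sketched as a clamp over the tracked latency quantile (an assumed formula; Prism's exact calculation may differ):

def hedge_delay_ms(observed_quantile_ms, min_delay_ms=50, max_delay_ms=2000, scale=0.5):
    """Clamp the scaled latency quantile into [min_delay_ms, max_delay_ms]."""
    return min(max(observed_quantile_ms * scale, min_delay_ms), max_delay_ms)

print(hedge_delay_ms(150))   # 75.0  (matches the eth_getLogs example above)
print(hedge_delay_ms(60))    # 50    (floored at min_delay_ms)
print(hedge_delay_ms(9000))  # 2000  (capped at max_delay_ms)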

Use Cases

Good for hedging:

  • High-variance latency (P99 >> P50)

  • Latency-sensitive applications

  • User-facing queries

  • Real-time data needs

Bad for hedging:

  • Cost-sensitive (2× API calls)

  • Low latency variance (P99 ≈ P50)

  • Batch processing

  • Background jobs

Metrics

# Hedged requests
rpc_hedged_requests_total{primary="alchemy",hedged="infura"} 340

# Hedge wins (which request finished first)
rpc_hedge_wins_total{upstream="alchemy",type="primary"} 180
rpc_hedge_wins_total{upstream="infura",type="hedged"} 160

# Hedge delay distribution
rpc_hedge_delay_ms_bucket{le="50"} 200
rpc_hedge_delay_ms_bucket{le="100"} 320

Scoring Algorithm

Multi-factor upstream ranking for intelligent selection.

Scoring Factors

Each upstream receives a composite score based on:

  1. Latency (weight: 0.4): How fast it responds

  2. Error Rate (weight: 0.3): How often it fails

  3. Throttle Rate (weight: 0.2): How often it rate-limits

  4. Block Lag (weight: 0.1): How far behind chain tip

The fractions above are the simplified normalized weights used in this section's worked examples; the [scoring.weights] configuration below expresses relative weights instead (and includes an additional total_requests factor).

Configuration

[scoring]
enabled = true
window_seconds = 1800            # 30 minute metric window
min_samples = 10                 # Minimum samples before scoring
max_block_lag = 5                # Max block lag before heavy penalty
top_n = 3                        # Consider top 3 upstreams

[scoring.weights]
latency = 8.0                    # Latency factor weight (highest priority)
error_rate = 4.0                 # Error rate factor weight
throttle_rate = 3.0              # Throttle rate factor weight
block_head_lag = 2.0             # Block head lag factor weight
total_requests = 1.0             # Total requests factor weight

Score Calculation

Score = (latency_factor × 0.4) +
        (error_rate_factor × 0.3) +
        (throttle_factor × 0.2) +
        (block_lag_factor × 0.1)

Factors (all normalized 0-1, higher is better):

Latency Factor:

latency_factor = 1 - (upstream_latency / worst_latency)

Error Rate Factor:

error_rate = errors / total_requests
error_rate_factor = 1 - error_rate

Throttle Factor:

throttle_rate = throttles / total_requests
throttle_factor = 1 - throttle_rate

Block Lag Factor:

block_lag = max_block_number - upstream_block_number
block_lag_factor = 1 - (block_lag / max_lag)
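
A direct transcription of these formulas in Python, assuming per-upstream stats have already been aggregated over the scoring window:

def composite_score(latency_ms, error_rate, throttle_rate, block_lag,
                    worst_latency_ms, max_lag,
                    weights=(0.4, 0.3, 0.2, 0.1)):
    """Composite score from the four normalized factors (higher is better)."""
    latency_factor  = 1 - latency_ms / worst_latency_ms
    error_factor    = 1 - error_rate
    throttle_factor = 1 - throttle_rate
    lag_factor      = 1 - block_lag / max_lag
    w_lat, w_err, w_thr, w_lag = weights
    return (latency_factor * w_lat + error_factor * w_err +
            throttle_factor * w_thr + lag_factor * w_lag)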

Example

Upstreams:

  • Alchemy: 50ms latency, 1% errors, 0% throttles, block 18500000

  • Infura: 45ms latency, 2% errors, 5% throttles, block 18499998

  • QuickNode: 60ms latency, 0.5% errors, 0% throttles, block 18500000

Scoring (assuming worst_latency=60ms, max_lag=2):

Alchemy:

latency_factor = 1 - (50/60) = 0.167
error_factor = 1 - 0.01 = 0.99
throttle_factor = 1 - 0 = 1.0
lag_factor = 1 - (0/2) = 1.0

score = (0.167 × 0.4) + (0.99 × 0.3) + (1.0 × 0.2) + (1.0 × 0.1)
      = 0.067 + 0.297 + 0.2 + 0.1
      = 0.664

Infura:

latency_factor = 1 - (45/60) = 0.25
error_factor = 1 - 0.02 = 0.98
throttle_factor = 1 - 0.05 = 0.95
lag_factor = 1 - (2/2) = 0.0

score = (0.25 × 0.4) + (0.98 × 0.3) + (0.95 × 0.2) + (0.0 × 0.1)
      = 0.1 + 0.294 + 0.19 + 0.0
      = 0.584

QuickNode:

latency_factor = 1 - (60/60) = 0.0
error_factor = 1 - 0.005 = 0.995
throttle_factor = 1 - 0 = 1.0
lag_factor = 1 - (0/2) = 1.0

score = (0.0 × 0.4) + (0.995 × 0.3) + (1.0 × 0.2) + (1.0 × 0.1)
      = 0.0 + 0.299 + 0.2 + 0.1
      = 0.599

Result: Alchemy (0.664) > QuickNode (0.599) > Infura (0.584)

Selection: Alchemy chosen for next request
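
The worked example can be reproduced with the composite_score sketch from the previous subsection:

# (latency_ms, error_rate, throttle_rate, block_lag) per upstream
stats = {
    "alchemy":   (50, 0.01,  0.00, 0),
    "infura":    (45, 0.02,  0.05, 2),
    "quicknode": (60, 0.005, 0.00, 0),
}
scores = {name: composite_score(*s, worst_latency_ms=60, max_lag=2)
          for name, s in stats.items()}
print(scores)                      # alchemy ~0.664, infura 0.584, quicknode ~0.599
print(max(scores, key=scores.get)) # alchemy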

Metrics

# Composite scores
rpc_upstream_composite_score{upstream="alchemy"} 0.664
rpc_upstream_composite_score{upstream="infura"} 0.584
rpc_upstream_composite_score{upstream="quicknode"} 0.599

# Individual factors
rpc_upstream_latency_factor{upstream="alchemy"} 0.167
rpc_upstream_error_rate_factor{upstream="alchemy"} 0.99
rpc_upstream_throttle_factor{upstream="alchemy"} 1.0
rpc_upstream_block_lag_factor{upstream="alchemy"} 1.0

Load Balancing

Fallback strategy when advanced routing is disabled.

Round-Robin

Simple round-robin distribution among healthy upstreams.

Request 1 → Upstream A
Request 2 → Upstream B
Request 3 → Upstream C
Request 4 → Upstream A
...

Weighted Round-Robin

Distribution based on upstream weights.

[[upstreams.providers]]
name = "primary"
weight = 3  # Gets 3× traffic

[[upstreams.providers]]
name = "backup"
weight = 1  # Gets 1× traffic

Result: Primary gets 75% traffic, backup gets 25%
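
A minimal sketch of weighted round-robin in Python: expand each provider by its weight and cycle through the pool.

import itertools

providers = [("primary", 3), ("backup", 1)]

# Expand by weight: primary appears 3 times, backup once.
pool = [name for name, weight in providers for _ in range(weight)]
rr = itertools.cycle(pool)

picks = [next(rr) for _ in range(8)]
print(picks)  # ['primary', 'primary', 'primary', 'backup', ...] -> 75% / 25%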

Response-Time Based

Select upstream with best recent response time.

Recent latencies:
  Alchemy: [45ms, 50ms, 48ms, 52ms] → avg 48.75ms
  Infura: [60ms, 55ms, 58ms, 62ms] → avg 58.75ms

Next request → Alchemy (faster)
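
Sketched the same way: pick the upstream with the lowest mean over its recent samples.

recent = {
    "alchemy": [45, 50, 48, 52],
    "infura":  [60, 55, 58, 62],
}

def pick_fastest(latencies):
    # Choose the upstream with the lowest mean recent latency.
    return min(latencies, key=lambda u: sum(latencies[u]) / len(latencies[u]))

print(pick_fastest(recent))  # alchemy (48.75ms vs 58.75ms)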

Circuit Breaker

Automatic isolation of failing upstreams.

State Machine

           ┌────────────┐
           │   CLOSED   │ (Normal operation)
           └─────┬──────┘
                 │ Failures ≥ threshold
                 ▼
           ┌────────────┐
           │    OPEN    │ (Isolated, requests fail fast)
           └─────┬──────┘
                 │ Timeout elapsed
                 ▼
           ┌────────────┐
           │ HALF-OPEN  │ (Trial request)
           └─────┬──────┘
    Success ◄────┴────► Failure
       │                   │
       ▼                   ▼
   [CLOSED]            [OPEN]
Configuration

[[upstreams.providers]]
name = "provider"
circuit_breaker_threshold = 5        # Open after 5 failures
circuit_breaker_timeout_seconds = 60 # Retry after 60 seconds

Behavior

CLOSED (normal):

  • All requests go through

  • Track consecutive failures

  • If failures ≥ threshold → OPEN

OPEN (isolated):

  • Requests fail immediately with CircuitBreakerOpen error

  • Wait timeout_seconds

  • After timeout → HALF-OPEN

HALF-OPEN (testing):

  • Allow one test request

  • Success → CLOSED (restore upstream)

  • Failure → OPEN (continue isolation)
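
These transitions map onto a small state machine. A minimal sketch in Python (thresholds mirror the config above; the clock is injected so the timeout is testable):

import time

class CircuitBreaker:
    """CLOSED -> OPEN after `threshold` consecutive failures;
    OPEN -> HALF_OPEN after `timeout` seconds; HALF_OPEN resolves on one trial."""

    def __init__(self, threshold=5, timeout=60.0, clock=time.monotonic):
        self.threshold, self.timeout, self.clock = threshold, timeout, clock
        self.state, self.failures, self.opened_at = "CLOSED", 0, 0.0

    def allow_request(self):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.timeout:
                self.state = "HALF_OPEN"   # let one trial request through
                return True
            return False                   # fail fast: CircuitBreakerOpen
        return True                        # CLOSED or HALF_OPEN

    def record_success(self):
        self.state, self.failures = "CLOSED", 0

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state, self.opened_at = "OPEN", self.clock()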

Metrics

# Circuit breaker state (0=closed, 0.5=half-open, 1=open)
rpc_circuit_breaker_state{upstream="alchemy"} 0

# State transitions
rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="open"} 3
rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="closed"} 3

# Failure count
rpc_circuit_breaker_failure_count{upstream="alchemy"} 0

Routing Decision Flow

Complete request routing logic:

┌─────────────────────────────────────────────────────────┐
│                   SmartRouter.route()                   │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
                  Is method critical?
                YES │             │ NO
                    ▼             │
            Consensus enabled?    │
            YES │      NO │       │
                ▼         └──────►│
          [CONSENSUS]             ▼
                │          Scoring enabled?
                │         YES │         │ NO
                │             ▼         ▼
                │        [SCORING]  Hedging enabled?
                │             │    YES │        │ NO
                │             │        ▼        ▼
                │             │   [HEDGING] [LOAD BALANCER]
                │             │        │         │
                └─────────────┴────────┴─────────┘
                              │
                              ▼
                       ┌──────────────┐
                       │   Response   │
                       └──────────────┘

Next: Learn about Authentication or Monitoring.
