Troubleshooting & FAQ
Comprehensive troubleshooting guide for diagnosing and resolving common Prism issues.
Connection & Upstream Issues
No Healthy Upstreams Available
Error: "No healthy upstreams available" (Error code: -32050)
Symptoms:
All requests fail with -32050 error
/health endpoint returns "status": "unhealthy"
Metrics show rpc_healthy_upstreams == 0
Causes & Solutions:
1. All Upstreams Failing Health Checks
Check health status:
curl http://localhost:3030/health | jq '.upstreams'
Check metrics:
curl -s http://localhost:3030/metrics | grep rpc_upstream_health
# rpc_upstream_health{upstream="alchemy"} 0
# rpc_upstream_health{upstream="infura"} 0
Solutions:
Verify upstream URLs are correct in configuration
Check network connectivity to upstream providers
Verify API keys are valid and not rate-limited
Check upstream provider status pages
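To isolate whether the problem is Prism or the providers, you can also probe each upstream directly. A minimal Node 18+ sketch (run as an ES module; the URLs and keys below are placeholders for your own endpoints):
```javascript
// Probe each configured upstream directly with eth_blockNumber.
// URLs are placeholders -- substitute your own endpoints and API keys.
const upstreams = [
  { name: 'alchemy-mainnet', url: 'https://eth-mainnet.g.alchemy.com/v2/YOUR_API_KEY' },
  { name: 'infura-mainnet', url: 'https://mainnet.infura.io/v3/YOUR_API_KEY' },
];

for (const { name, url } of upstreams) {
  try {
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ jsonrpc: '2.0', method: 'eth_blockNumber', params: [], id: 1 }),
    });
    const body = await res.json();
    console.log(name, res.status, body.result ?? body.error);
  } catch (err) {
    console.log(name, 'unreachable:', err.message);
  }
}
```
If an upstream answers here but Prism still marks it unhealthy, the problem is likely in Prism's configuration rather than the provider.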
2. Incorrect Configuration
Check configuration:
[[upstreams]]
name = "alchemy-mainnet"
url = "https://eth-mainnet.g.alchemy.com/v2/YOUR_API_KEY" # Verify this URL
chain_id = 1
Common mistakes:
Missing or invalid API key in URL
Wrong chain ID (e.g., mainnet vs testnet mismatch)
HTTP instead of HTTPS
Incorrect endpoint path
3. Circuit Breakers All Open
Check circuit breaker state:
curl -s http://localhost:3030/metrics | grep rpc_circuit_breaker_state
# rpc_circuit_breaker_state{upstream="alchemy"} 1 # 1 = open (bad)
Solution: Wait for circuit breakers to reset, or restart Prism to reset state:
# Circuit breakers will automatically transition to half-open after timeout
# Default timeout: 60 seconds
Reduce sensitivity:
[upstreams.circuit_breaker]
failure_threshold = 10 # Increase from default 5
timeout_seconds = 30 # Decrease recovery time
Upstream Connection Timeouts
Error: "Request timeout" or "Connection failed: connection timeout"
Symptoms:
Requests take 30+ seconds and then timeout
High P99 latency in metrics
Upstream error metrics show error_type="timeout"
Check timeout metrics:
curl -s http://localhost:3030/metrics | grep timeout
# rpc_upstream_errors_total{upstream="alchemy",error_type="timeout"} 125
Solutions:
1. Increase Timeout Values
[upstreams]
timeout_seconds = 60 # Increase from default 30
max_retries = 3 # Increase retry attempts
2. Check Network Latency
# Test direct connection to upstream
time curl -s -o /dev/null -w "%{time_total}\n" \
https://eth-mainnet.g.alchemy.com/v2/YOUR_KEY \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'If latency > 5 seconds:
Network issue between your server and upstream provider
Try different upstream providers
Use providers geographically closer to your server
3. Reduce Concurrent Load
[upstreams]
max_concurrent_requests = 50 # Reduce from default 100
HTTP 429 Rate Limiting
Error: "HTTP error: 429" or RPC error -32005: "Limit exceeded"
Symptoms:
Intermittent failures during high traffic
Error rate spikes in metrics
Upstream provider returns "Too Many Requests"
Check rate limit errors:
curl -s http://localhost:3030/metrics | grep rate_limit
# rpc_upstream_errors_total{upstream="alchemy",error_type="rpc_rate_limit"} 450
# rpc_jsonrpc_errors_total{upstream="alchemy",code="-32005"} 450
Solutions:
1. Add More Upstreams
# Distribute load across multiple providers
[[upstreams]]
name = "alchemy-mainnet"
url = "https://eth-mainnet.g.alchemy.com/v2/KEY1"
[[upstreams]]
name = "infura-mainnet"
url = "https://mainnet.infura.io/v3/KEY2"
[[upstreams]]
name = "quicknode-mainnet"
url = "https://your-endpoint.quiknode.pro/KEY3/"2. Enable Caching to Reduce Upstream Calls
[cache]
enabled = true
[cache.block_cache]
hot_window_size = 500 # Cache more recent blocks
[cache.log_cache]
chunk_size = 100 # Larger chunks = better cache hit rate
max_chunks = 200
Verify cache effectiveness:
# Check cache hit rate
curl -s http://localhost:3030/metrics | grep cache_hits
3. Implement Rate Limiting at Prism Level
[rate_limiting]
enabled = true
requests_per_second = 50 # Limit client request rate
Cache Problems
Low Cache Hit Rate
Symptoms:
Cache hit rate < 70% consistently
Most requests show X-Cache-Status: MISS
High upstream request counts
Check cache hit rate:
curl -s http://localhost:3030/metrics | grep -E 'cache_(hits|misses)'
# Calculate: hits / (hits + misses)
PromQL query:
rate(rpc_cache_hits_total[5m]) /
(rate(rpc_cache_hits_total[5m]) + rate(rpc_cache_misses_total[5m]))
Solutions:
1. Increase Cache Sizes
[cache.block_cache]
hot_window_size = 1000 # Increase from default 200
max_headers = 50000 # Increase from default 10000
max_bodies = 20000 # Increase from default 5000
[cache.log_cache]
max_chunks = 500 # Increase from default 100
[cache.transaction_cache]
max_transactions = 100000 # Increase from default 50000
max_receipts = 100000
2. Adjust Chunk Size for Log Cache
[cache.log_cache]
chunk_size = 100 # Smaller = more granular, better hit rate
# Larger = fewer chunks, less memory overhead
Trade-off:
Smaller chunk_size (10-50): Better partial cache hits, more memory
Larger chunk_size (100-500): Less memory, may miss partial ranges
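To see why chunk_size matters, here is an illustrative sketch of range-chunked caching. Prism's internal chunk keying may differ; this only demonstrates the trade-off:
```javascript
// Illustrative chunking: a getLogs range is split into fixed-size chunks,
// each cached independently. Smaller chunks mean a query that partially
// overlaps cached data can still be served without a full upstream fetch.
const CHUNK_SIZE = 100; // mirrors cache.log_cache.chunk_size

function chunksForRange(fromBlock, toBlock, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  const start = Math.floor(fromBlock / chunkSize) * chunkSize;
  for (let b = start; b <= toBlock; b += chunkSize) {
    chunks.push({ from: b, to: b + chunkSize - 1 });
  }
  return chunks;
}

// A query for 18_500_050..18_500_250 touches three 100-block chunks;
// any chunk already cached counts as a partial hit.
console.log(chunksForRange(18_500_050, 18_500_250));
```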
3. Enable Cache Warming
Warm cache with recent data on startup:
[cache]
warm_on_startup = true
warm_blocks_count = 1000 # Warm last 1000 blocks
4. Check Request Patterns
Problem: Random historical queries bypass cache
Example: Querying random old blocks
# These all miss cache if not in hot window
eth_getBlockByNumber("0x500000", false)
eth_getBlockByNumber("0x600000", false)
eth_getBlockByNumber("0x750000", false)Solution:
Use recent blocks when possible (last 200 blocks have highest hit rate)
For historical queries, batch sequential ranges to benefit from cache
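For example, a backfill client can walk a historical range in chunk-aligned windows so repeated or overlapping runs reuse the same cached chunks. A sketch (rpcUrl and the 100-block window are assumptions mirroring the config above):
```javascript
// Walk a historical range in chunk-aligned windows so overlapping
// backfills hit the same cached log chunks instead of the upstream.
const rpcUrl = 'http://localhost:3030/';
const CHUNK = 100;

async function getLogsRange(fromBlock, toBlock, address) {
  const results = [];
  for (let start = fromBlock; start <= toBlock; start += CHUNK) {
    const end = Math.min(start + CHUNK - 1, toBlock);
    const res = await fetch(rpcUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        jsonrpc: '2.0', method: 'eth_getLogs', id: 1,
        params: [{ fromBlock: '0x' + start.toString(16), toBlock: '0x' + end.toString(16), address }],
      }),
    });
    // Sketch only: a production client should also check body.error here.
    results.push(...(await res.json()).result);
  }
  return results;
}
```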
High Cache Eviction Rate
Symptoms:
rpc_cache_evictions_total increasing rapidly
Cache hit rate declining over time
Memory pressure on system
Check eviction metrics:
curl -s http://localhost:3030/metrics | grep evictions
# rpc_cache_evictions_total{cache_type="block"} 15000
# rpc_cache_evictions_total{cache_type="log"} 32000Solutions:
1. Increase Cache Memory Limits
[cache.block_cache]
max_headers = 100000 # Allow more entries before eviction
max_bodies = 50000
[cache.log_cache]
max_chunks = 1000 # More chunks = less eviction
2. Adjust Hot Window Size
[cache.block_cache]
hot_window_size = 500 # Keep more recent blocks in fast cache
3. Check Memory Usage
# Monitor Prism process memory
ps aux | grep prism
# Check system memory
free -h
If memory constrained:
Reduce cache sizes
Add more RAM to server
Enable cache compression (if available)
Cache Invalidation After Reorgs
Symptoms:
Cache hit rate drops suddenly
Logs show "reorg detected" messages
Blocks being refetched repeatedly
Check reorg metrics:
curl -s http://localhost:3030/metrics | grep reorg
# rpc_reorgs_detected_total 8
# rpc_last_reorg_block 18499995
Expected behavior: Cache invalidates blocks during reorgs to maintain consistency
Solutions:
1. Adjust Safety Depth
[reorg_manager]
safety_depth = 20 # Increase from default 12
# Blocks beyond this depth are "safe" from reorgs
Trade-off:
Larger safety_depth: Less cache invalidation, but stale data during deep reorgs
Smaller safety_depth: More aggressive invalidation, always fresh data
2. Check for Reorg Storms
Frequent reorgs indicate:
Network issues with upstreams
Misconfigured chain_id (wrong network)
Upstream providers disagreeing on chain state
Verify chain consistency:
# Check all upstreams report same chain tip
curl http://localhost:3030/health | jq '.upstreams[] | {name, latest_block}'
If upstreams disagree by > 10 blocks:
One or more upstreams may be syncing
Consider removing slow/stale upstreams
3. Enable Reorg Coalescing
[reorg_manager]
coalesce_window_ms = 100 # Batch rapid reorgs within 100ms window
Authentication Errors
Invalid or Missing API Key
Error: -32054: "Authentication failed" or -32600: "Invalid Request"
Symptoms:
All requests rejected with authentication error
No X-API-Key header in requests
Solutions:
1. Include API Key in Request
Header method:
curl -X POST http://localhost:3030/ \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key-here" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'Query parameter method:
curl -X POST "http://localhost:3030/?api_key=your-api-key-here" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'2. Verify API Key Configuration
[authentication]
enabled = true
[[authentication.keys]]
key = "prod-key-12345"
name = "production-api"
allowed_methods = ["*"] # Or specific methods
rate_limit_per_second = 100
3. Check Authentication Metrics
curl -s http://localhost:3030/metrics | grep auth
# rpc_auth_success_total{key_id="production-api"} 125000
# rpc_auth_failure_total{key_id="unknown"} 45
Method Not Allowed for API Key
Error: -32055: "Method not allowed"
Symptoms:
Some methods work, others return permission error
Authentication succeeds but specific RPC calls fail
Check key permissions:
[[authentication.keys]]
key = "logs-only-key"
allowed_methods = [
"eth_getLogs",
"eth_blockNumber"
]
# Calling eth_getBlockByNumber with this key will fail
Solution: Update key permissions or use an appropriate key:
[[authentication.keys]]
key = "full-access-key"
allowed_methods = ["*"] # Allow all methodsRate Limit Exceeded
Error: -32053: "Rate limit exceeded"
Symptoms:
Requests succeed initially, then fail during high traffic
Rate limit metrics show rejections
Check rate limit metrics:
curl -s http://localhost:3030/metrics | grep rate_limit
# rpc_rate_limit_rejected_total{key="production-api"} 500
Solutions:
1. Increase Rate Limit
[[authentication.keys]]
key = "production-api"
rate_limit_per_second = 200 # Increase from 100
2. Implement Client-Side Rate Limiting
// Use rate limiting library on client side
const Bottleneck = require('bottleneck');
const limiter = new Bottleneck({
minTime: 20, // 50 requests per second
maxConcurrent: 10
});
limiter.schedule(() => makeRpcRequest());
3. Use Multiple API Keys
Distribute load across multiple API keys:
const keys = ['key1', 'key2', 'key3'];
const keyIndex = requestCount % keys.length; // requestCount: a per-client request counter
const apiKey = keys[keyIndex];
Performance Problems
High Request Latency (P99 > 1s)
Symptoms:
Slow response times for clients
High P99 latency in metrics
Timeouts during peak traffic
Check latency metrics:
curl -s http://localhost:3030/metrics | grep duration_seconds
PromQL query:
histogram_quantile(0.99, rate(rpc_request_duration_seconds_bucket[5m]))
Solutions:
1. Enable Caching
[cache]
enabled = true
Verify cache is working:
# Check cache status header
curl -D - http://localhost:3030/ \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' | grep -i cache
# X-Cache-Status: FULL
2. Add Faster Upstreams
[[upstreams]]
name = "quicknode-mainnet"
url = "https://your-endpoint.quiknode.pro/KEY/" # Often lower latency
priority = 10 # Higher priority for faster upstream
3. Enable Request Hedging
[routing]
strategy = "hedge"
[routing.hedge]
enabled = true
hedge_delay_ms = 100 # Send backup request after 100ms
How hedging helps:
Sends request to primary upstream
If no response in 100ms, sends to backup upstream
Returns whichever responds first
Reduces tail latency
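Client-side, the same pattern looks roughly like this (a general sketch of hedging, not Prism's internal implementation):
```javascript
// Hedged request: fire the primary, and if it hasn't answered within
// hedgeDelayMs, fire a backup; resolve with whichever settles first.
async function hedgedFetch(primaryUrl, backupUrl, init, hedgeDelayMs = 100) {
  let timer;
  const primary = fetch(primaryUrl, init).then((res) => {
    clearTimeout(timer); // primary answered: cancel the pending backup
    return res;
  });
  const backup = new Promise((resolve, reject) => {
    timer = setTimeout(() => fetch(backupUrl, init).then(resolve, reject), hedgeDelayMs);
  });
  // A primary that errors out leaves the backup in flight as the fallback.
  return Promise.race([primary.catch(() => backup), backup]);
}
```
The cost of hedging is extra upstream traffic for slow requests, which is why the delay is tuned rather than set to zero.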
4. Optimize Connection Pooling
[upstreams]
max_concurrent_requests = 200 # Allow more concurrent requests
timeout_seconds = 30 # Reduce from 60 if upstreams are fast
Low Throughput (RPS < Expected)
Symptoms:
Cannot achieve desired requests per second
Concurrent requests queue up
CPU or network not saturated
Check throughput metrics:
curl -s http://localhost:3030/metrics | grep rpc_requests_total
PromQL query:
rate(rpc_requests_total[1m]) # Requests per second
Solutions:
1. Increase Concurrency Limits
[upstreams]
max_concurrent_requests = 500 # Increase from default 100
[server]
max_connections = 10000 # Increase connection pool
2. Add More Upstreams
# Distribute load across more providers
[[upstreams]]
name = "alchemy"
# ... config
[[upstreams]]
name = "infura"
# ... config
[[upstreams]]
name = "quicknode"
# ... config
3. Enable Load Balancing
[routing]
strategy = "least_loaded" # Distribute evenly across upstreams4. Use Batch Requests
Instead of:
// 100 separate HTTP requests
for (let i = 0; i < 100; i++) {
await fetch('/rpc', { method: 'POST', body: singleRequest });
}
Do this:
// 1 batched HTTP request
const batch = requests.map((req, i) => ({ ...req, id: i }));
await fetch('/rpc', { method: 'POST', body: JSON.stringify(batch) });
Circuit Breaker Issues
Circuit Breaker Stuck Open
Symptoms:
Upstream marked healthy but circuit breaker stays open
Requests continue to fail with "Circuit breaker is open"
Circuit breaker state metric shows 1 (open) for an extended period
Check circuit breaker state:
curl -s http://localhost:3030/metrics | grep circuit_breaker_state
# rpc_circuit_breaker_state{upstream="alchemy"} 1 # 1 = open
Check transition history:
curl -s http://localhost:3030/metrics | grep circuit_breaker_transitions
# rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="open"} 3
# rpc_circuit_breaker_transitions_total{upstream="alchemy",to_state="half_open"} 0
Solutions:
1. Wait for Automatic Recovery
Circuit breaker will transition to half-open after timeout:
[upstreams.circuit_breaker]
timeout_seconds = 60 # Default timeout
Check when circuit will recover:
# Look for last failure time in logs
# Circuit opens at: T
# Will try recovery at: T + 60 seconds
2. Reduce Circuit Breaker Sensitivity
[upstreams.circuit_breaker]
failure_threshold = 10 # Increase from default 5
timeout_seconds = 30 # Decrease recovery time from 60
3. Restart Prism (Last Resort)
# Restarting resets all circuit breaker state
systemctl restart prism
# or
docker restart prism-container
Circuit Breaker Opens Too Easily
Symptoms:
Circuit breaker opens during normal operation
Temporary errors trigger circuit breaker
Frequent open/close cycles
Check failure threshold:
curl -s http://localhost:3030/metrics | grep circuit_breaker_failure_count
# rpc_circuit_breaker_failure_count{upstream="alchemy"} 5 # At threshold
Solutions:
1. Increase Failure Threshold
[upstreams.circuit_breaker]
failure_threshold = 10 # Increase from default 5
# Requires 10 consecutive failures before opening
2. Add Retry Logic
[upstreams]
max_retries = 3 # Retry failed requests
retry_delay_ms = 100 # Wait 100ms between retries
3. Adjust Error Classification
Review logs to identify error types:
journalctl -u prism | grep "circuit breaker"
Error types that trigger circuit breaker:
Provider errors (-32603: Internal error)
Parse errors (malformed responses)
Timeouts (upstream unresponsive)
Error types that DON'T trigger circuit breaker:
Client errors (-32600: Invalid Request)
Rate limits (-32005: Limit exceeded)
Execution errors (transaction reverts)
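Expressed as a sketch (the exact classification rules are Prism's; this only illustrates the split between penalized and non-penalized errors):
```javascript
// Classify a JSON-RPC error: only upstream-fault errors should count
// against a circuit breaker; client mistakes and rate limits should not.
function penalizesUpstream(error) {
  const transient = new Set([-32005]);                  // rate limit: retry elsewhere
  const clientFault = new Set([-32600, -32601, -32602]); // caller's problem
  if (transient.has(error.code) || clientFault.has(error.code)) return false;
  if (/execution reverted|out of gas|insufficient funds|nonce too low/i.test(error.message ?? '')) {
    return false; // execution errors are the transaction's fault, not the node's
  }
  return error.code === -32603 || error.code === -32000; // provider faults
}
```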
Reorg Handling Issues
Cache Contains Stale Data After Reorg
Symptoms:
Queries return different data than upstream
Block hashes don't match expected values
Transactions show incorrect status
Check reorg detection:
curl -s http://localhost:3030/metrics | grep reorg
# rpc_reorgs_detected_total 5
# rpc_last_reorg_block 18499995
Verify cache invalidation:
# Check logs for invalidation messages
journalctl -u prism | grep "invalidating block"Solutions:
1. Verify Reorg Detection Is Working
Check health endpoint:
curl http://localhost:3030/health | jq '.upstreams[] | {name, latest_block, finalized_block}'
If all upstreams show the same tip: Reorg detection is working
If upstreams disagree: May indicate reorg in progress
2. Reduce Safety Depth
[reorg_manager]
safety_depth = 6 # Reduce from default 12
# More aggressive cache invalidation
3. Clear Cache Manually
Restart Prism to clear all caches:
systemctl restart prism
Or use the cache clear endpoint (if implemented):
curl -X POST http://localhost:3030/admin/clear-cache \
-H "X-Admin-Key: your-admin-key"Too Many Reorg Detections
Symptoms:
rpc_reorgs_detected_total increasing rapidly
Frequent cache invalidation
Low cache hit rate due to constant invalidation
Check reorg frequency:
curl -s http://localhost:3030/metrics | grep reorgs_detected
# rpc_reorgs_detected_total 150 # Very high
PromQL query:
rate(rpc_reorgs_detected_total[1h]) # Per-second reorg rate, averaged over 1h
Causes & Solutions:
1. Upstreams Reporting Different Chain States
Symptom: Upstreams disagree on current tip or block hashes
Check upstream consistency:
# Query all upstreams directly
for upstream in alchemy infura quicknode; do
echo "=== $upstream ==="
curl -s https://$upstream.url/... \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
| jq .result
done
If blocks differ by > 5: One or more upstreams may be syncing or stale
Solution: Remove stale upstreams from configuration
2. WebSocket Reconnections Causing False Reorgs
Check WebSocket metrics:
curl -s http://localhost:3030/metrics | grep websocket
# rpc_websocket_disconnections_total 45
Solution: Improve WebSocket stability
[upstreams.websocket]
enabled = true
reconnect_delay_ms = 5000 # Wait longer before reconnect
max_reconnect_attempts = 10
3. Network Issues Between Prism and Upstreams
Symptom: Intermittent connectivity causes missed block notifications
Solution:
Move Prism closer to upstream providers (same region)
Use more reliable network connection
Enable WebSocket fallback to HTTP polling
WebSocket Connection Failures
WebSocket Disconnects Repeatedly
Symptoms:
Frequent "WebSocket disconnected" in logs
High reconnection count in metrics
Missing block notifications
Check WebSocket status:
curl -s http://localhost:3030/metrics | grep websocket
# rpc_websocket_active_connections 0 # Should be > 0
# rpc_websocket_disconnections_total 125
Solutions:
1. Increase Reconnection Delay
[upstreams.websocket]
reconnect_delay_ms = 5000 # Increase from default 1000
max_reconnect_attempts = 20
2. Check Upstream WebSocket Support
Test WebSocket connection manually:
# Install wscat: npm install -g wscat
wscat -c wss://eth-mainnet.g.alchemy.com/v2/YOUR_KEY
# Subscribe to new heads
> {"jsonrpc":"2.0","method":"eth_subscribe","params":["newHeads"],"id":1}
# Should receive: {"jsonrpc":"2.0","result":"0x...","id":1}
# Then block notifications every ~12 seconds
If connection fails:
Upstream may not support WebSocket
API key may not have WebSocket access
Firewall blocking WebSocket connections
3. Disable WebSocket (Fallback to HTTP)
[upstreams.websocket]
enabled = false # Disable WebSocket, use HTTP polling only
Note: HTTP polling is less efficient but more reliable
Missing Block Notifications
Symptoms:
Chain tip not updating in Prism
/health shows a stale latest_block
Reorg detection not working
Check chain tip updates:
# Monitor health endpoint
watch -n 5 'curl -s http://localhost:3030/health | jq .upstreams[0].latest_block'
Solutions:
1. Verify WebSocket Subscription
Check logs for subscription confirmation:
journalctl -u prism | grep "subscribed to newHeads"If no subscription message: WebSocket not connected
2. Enable HTTP Polling Fallback
[upstreams.health_check]
enabled = true
interval_seconds = 30 # Poll every 30 seconds
timeout_seconds = 10
How it helps:
Health checker polls eth_blockNumber periodically
Updates chain tip even if WebSocket fails
Detects rollbacks and reorgs
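Conceptually, the polling fallback boils down to something like this sketch (the URL and 30-second interval mirror the config above; the rollback check is illustrative, not Prism's code):
```javascript
// Poll the tip periodically and flag a rollback (possible reorg) when
// the reported block number goes backwards.
let lastTip = 0n;

setInterval(async () => {
  const res = await fetch('http://localhost:3030/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ jsonrpc: '2.0', method: 'eth_blockNumber', params: [], id: 1 }),
  });
  const tip = BigInt((await res.json()).result);
  if (tip < lastTip) console.warn(`rollback: ${lastTip} -> ${tip} (possible reorg)`);
  lastTip = tip;
}, 30_000); // mirrors health_check.interval_seconds = 30
```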
3. Check Firewall Rules
WebSocket requires outbound connections:
# Allow outbound HTTPS/WSS (port 443)
sudo ufw allow out 443/tcp
# Test connectivity
curl -v https://eth-mainnet.g.alchemy.com/v2/YOUR_KEY
# Should see "Connected to eth-mainnet.g.alchemy.com"Error Codes Reference
JSON-RPC Standard Errors
| Code | Error | Description | Resolution |
| --- | --- | --- | --- |
| -32700 | Parse error | Invalid JSON received | Check request syntax; ensure valid JSON |
| -32600 | Invalid Request | Request object malformed | Verify jsonrpc: "2.0", method, id fields |
| -32601 | Method not found | Method doesn't exist or unsupported | Check method name spelling; see supported methods |
| -32602 | Invalid params | Invalid method parameters | Verify parameter types and count |
| -32603 | Internal error | Internal server error | Check Prism logs; may be upstream issue |
Ethereum JSON-RPC Errors
Ethereum JSON-RPC Errors
| Code | Error | Description | Resolution |
| --- | --- | --- | --- |
| -32000 | Server error | Generic server error | Check error message for details |
| -32001 | Resource not found | Requested resource doesn't exist | Block/transaction may not exist yet |
| -32002 | Resource unavailable | Resource temporarily unavailable | Retry after delay; upstream may be syncing |
| -32003 | Transaction rejected | Transaction wouldn't be accepted | Check transaction parameters |
| -32004 | Method not supported | Method not implemented | Method not supported by upstream |
| -32005 | Limit exceeded | Request exceeds defined limit | Reduce query range; add more upstreams |
Prism-Specific Errors
| Code | Error | Description | Resolution |
| --- | --- | --- | --- |
| -32050 | No healthy upstreams | All upstreams unavailable | Check upstream configuration; verify API keys |
| -32051 | Circuit breaker open | Upstream circuit breaker open | Wait for recovery or restart; check upstream health |
| -32052 | Consensus failure | Upstreams disagree on response | Check upstream consistency; remove stale upstreams |
| -32053 | Rate limit exceeded | Request rate limit exceeded | Implement client rate limiting; increase limits |
| -32054 | Authentication failed | Invalid or missing API key | Include X-API-Key header; verify key is valid |
| -32055 | Method not allowed | API key lacks method permission | Update key permissions or use different key |
Upstream Provider Errors
Execution Errors (Client's fault, NOT penalized):
"execution reverted"- Smart contract reverted"out of gas"- Transaction ran out of gas"insufficient funds"- Account balance too low"nonce too low"- Transaction nonce already used"gas too low"- Gas limit too low for transaction
Provider Errors (Upstream's fault, triggers circuit breaker):
"Internal error"(-32603) - Upstream server error"server error"(-32000) - Generic upstream error
Rate Limit Errors (Transient, retry on different upstream):
"Limit exceeded"(-32005) - Upstream rate limitHTTP 429 - Too many requests
Debug Strategies
General Debugging Workflow
Check Health Endpoint
curl http://localhost:3030/health | jq .
Verify status: "healthy"
Check upstream health and response times
Verify cache statistics
Check Metrics
curl http://localhost:3030/metrics | grep -E '(error|unhealthy|circuit|reorg)'
Look for error rate spikes
Check circuit breaker states
Review reorg activity
Enable Debug Logging
[logging] level = "debug" # or "trace" for extreme verbosity format = "pretty"Monitor Real-Time Logs
# Systemd
journalctl -u prism -f
# Docker
docker logs -f prism-container
# Direct binary
tail -f /var/log/prism/prism.log
Debugging Specific Issues
Debug Cache Misses
1. Enable verbose logging:
[logging]
level = "debug"2. Look for cache decision logs:
journalctl -u prism | grep "cache"
# DEBUG cache hit method=eth_getBlockByNumber block=18500000
# DEBUG cache miss method=eth_getLogs range=18400000-18500000
3. Check cache status headers:
curl -D - http://localhost:3030/ \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x11a6e3b",false],"id":1}' \
| grep -i cache
Possible values:
FULL: Complete cache hit
PARTIAL: Some data from cache, rest from upstream
MISS: No cached data, all from upstream
EMPTY: Cached empty result (e.g., no logs in range)
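The same check can be done programmatically; a small Node sketch (ES module) that reads the header alongside the result:
```javascript
// Read the X-Cache-Status header alongside the JSON-RPC result.
const res = await fetch('http://localhost:3030/', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    jsonrpc: '2.0', method: 'eth_getBlockByNumber',
    params: ['0x11a6e3b', false], id: 1,
  }),
});
console.log('cache status:', res.headers.get('x-cache-status'));
console.log('block hash:', (await res.json()).result?.hash);
```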
Debug Upstream Selection
1. Enable routing logs:
journalctl -u prism | grep "upstream selected"2. Check upstream scoring:
curl -s http://localhost:3030/metrics | grep composite_score
# rpc_upstream_composite_score{upstream="alchemy"} 0.875
# rpc_upstream_composite_score{upstream="infura"} 0.7203. Review selection reasons:
curl -s http://localhost:3030/metrics | grep upstream_selections
# rpc_upstream_selections_total{upstream="alchemy",reason="best_score"} 85000
# rpc_upstream_selections_total{upstream="infura",reason="fallback"} 15000Debug Authentication Issues
1. Check authentication metrics:
curl -s http://localhost:3030/metrics | grep auth
# rpc_auth_success_total{key_id="production-api"} 125000
# rpc_auth_failure_total{key_id="unknown"} 452. Test with known valid key:
curl -X POST http://localhost:3030/ \
-H "Content-Type: application/json" \
-H "X-API-Key: test-key-12345" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'3. Check key permissions:
# Review configuration
cat /etc/prism/config.toml | grep -A 10 "authentication.keys"
Debug Performance Issues
1. Profile request latency:
# Single request timing
time curl -X POST http://localhost:3030/ \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'2. Check P50/P95/P99 latency:
histogram_quantile(0.50, rate(rpc_request_duration_seconds_bucket[5m]))
histogram_quantile(0.95, rate(rpc_request_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(rpc_request_duration_seconds_bucket[5m]))
3. Identify slow methods:
curl -s http://localhost:3030/metrics | grep duration_seconds_sum
# Calculate average: sum / count for each method
4. Check upstream latency:
curl -s http://localhost:3030/metrics | grep upstream_latency
# rpc_upstream_latency_p99_ms{upstream="alchemy"} 350
# rpc_upstream_latency_p99_ms{upstream="infura"} 420FAQ
General Questions
Q: What methods does Prism cache?
A: Prism caches the following methods:
eth_getBlockByHash - Block cache
eth_getBlockByNumber - Block cache
eth_getLogs - Log cache (with partial-range support)
eth_getTransactionByHash - Transaction cache
eth_getTransactionReceipt - Transaction cache
Not cached (forwarded to upstream):
eth_blockNumber - Always latest value
eth_chainId - Static value
eth_gasPrice - Changes frequently
eth_getBalance - Account-specific, changes with every transaction
eth_call - Depends on state, not safely cacheable
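If a client wants to reason about this policy (for example, to route cacheable calls to a cached Prism instance), a simple lookup mirroring the lists above is enough (illustrative sketch, not a Prism API):
```javascript
// Mirror of the caching policy above: which cache (if any) serves a method.
const CACHE_FOR_METHOD = {
  eth_getBlockByHash: 'block',
  eth_getBlockByNumber: 'block',
  eth_getLogs: 'log',
  eth_getTransactionByHash: 'transaction',
  eth_getTransactionReceipt: 'transaction',
};

const isCacheable = (method) => method in CACHE_FOR_METHOD;
console.log(isCacheable('eth_getLogs')); // true
console.log(isCacheable('eth_call'));    // false
```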
Q: How long does cached data stay valid?
A: Cache validity depends on block finality:
Finalized blocks (past finalized checkpoint): Cached forever
Safe blocks (beyond safety_depth from tip): Cached until reorg detected
Unsafe blocks (within safety_depth of tip): May be invalidated during reorgs
Default safety_depth: 12 blocks (~2.4 minutes)
Configuration:
[reorg_manager]
safety_depth = 12 # Blocks beyond this are "safe"
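In code terms the classification is a comparison against the tip and the finalized checkpoint — an illustrative sketch, not Prism's internals:
```javascript
// Classify a block relative to the chain tip, mirroring the rules above.
function classifyBlock(blockNumber, tip, finalized, safetyDepth = 12) {
  if (blockNumber <= finalized) return 'finalized';    // cached forever
  if (blockNumber <= tip - safetyDepth) return 'safe'; // cached until a reorg is detected
  return 'unsafe';                                     // may be invalidated during reorgs
}

console.log(classifyBlock(995, 1000, 900)); // 'unsafe' (within 12 of the tip)
console.log(classifyBlock(950, 1000, 900)); // 'safe'
console.log(classifyBlock(880, 1000, 900)); // 'finalized'
```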
Q: Can I disable caching for specific methods?
A: Currently, caching is method-specific and cannot be selectively disabled. However, you can:
Disable all caching:
[cache]
enabled = false
Reduce cache sizes (effectively disables caching):
[cache.block_cache]
max_headers = 0
max_bodies = 0
[cache.log_cache]
max_chunks = 0
Use separate Prism instances (cached vs non-cached)
Q: How many upstreams should I configure?
A: Recommended:
Minimum: 2 upstreams for redundancy
Optimal: 3-4 upstreams for best reliability and performance
Maximum: No hard limit, but 5+ may have diminishing returns
Why multiple upstreams:
Redundancy: If one fails, others handle requests
Load distribution: Spread requests across providers
Rate limit mitigation: Switch to different upstream when one is rate-limited
Latency optimization: Route requests to fastest upstream
Q: What happens during a blockchain reorg?
A: Prism handles reorgs automatically:
Detection: WebSocket or health check detects chain tip changed
Invalidation: Affected blocks removed from cache
Refetch: Next request fetches fresh data from upstream
Consistency: Clients always receive canonical chain data
Example:
Before: Block 1000 (hash 0xAAA) cached
Reorg: Block 1000 now has hash 0xBBB
Action: Block 1000 invalidated from cache
Result: Next request fetches new block 1000 (0xBBB) from upstream
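Conceptually, the detection/invalidation step reduces to a hash comparison, as in this sketch (not Prism's actual code; the in-memory Map stands in for the block cache):
```javascript
// Conceptual reorg check: compare a cached block hash against the hash
// the upstream currently reports for the same height.
async function hashAt(url, blockNumber) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0', method: 'eth_getBlockByNumber',
      params: ['0x' + blockNumber.toString(16), false], id: 1,
    }),
  });
  return (await res.json()).result.hash;
}

const cache = new Map(); // blockNumber -> hash

async function checkForReorg(url, blockNumber) {
  const fresh = await hashAt(url, blockNumber);
  if (cache.has(blockNumber) && cache.get(blockNumber) !== fresh) {
    cache.delete(blockNumber); // invalidate: the block was reorged out
    return true;
  }
  cache.set(blockNumber, fresh);
  return false;
}
```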
Configuration Questions
Q: What's the difference between safety_depth and max_reorg_depth?
A:
safety_depth: Blocks beyond this from tip are considered "safe" from reorgs
Default: 12 blocks
Used for: Cache retention decisions, finality classification
Example: With safety_depth=12 at tip 1000, blocks <= 988 are "safe"
max_reorg_depth: Maximum blocks to search backwards for reorg divergence
Default: 100 blocks
Used for: Limiting reorg detection scope, preventing excessive cache invalidation
Example: A reorg affecting 150 blocks will only invalidate the last 100
Configuration:
[reorg_manager]
safety_depth = 12 # Finality threshold
max_reorg_depth = 100 # Reorg search limit
Q: How do I optimize for lowest latency?
A: Latency optimization checklist:
Enable caching:
[cache]
enabled = true
[cache.block_cache]
hot_window_size = 500 # Cache recent blocks
Use fast upstreams:
[[upstreams]]
name = "quicknode"
url = "https://your-endpoint.quiknode.pro/KEY/"
priority = 10 # Higher priority
Enable hedging:
[routing]
strategy = "hedge"
[routing.hedge]
hedge_delay_ms = 50 # Send backup request after 50ms
Geographic proximity: Deploy Prism in same region as upstreams
Connection pooling:
[upstreams]
max_concurrent_requests = 200
Monitor latency:
histogram_quantile(0.99, rate(rpc_request_duration_seconds_bucket[5m]))
Q: How do I optimize for highest throughput?
A: Throughput optimization checklist:
Add more upstreams:
# 4-5 upstreams for load distribution
[[upstreams]]
name = "alchemy"
# ...
[[upstreams]]
name = "infura"
# ...
Increase concurrency:
[upstreams]
max_concurrent_requests = 500 # Per upstream
[server]
max_connections = 10000
Use load balancing:
[routing]
strategy = "least_loaded"Enable batch requests: Use JSON-RPC batch format
Maximize caching:
[cache.block_cache]
hot_window_size = 1000
max_headers = 100000
Troubleshooting Questions
Q: Why is my cache hit rate so low?
A: Common causes:
Random historical queries: Queries to random old blocks bypass hot window cache
Solution: Focus queries on recent blocks (last 200 blocks)
Cache size too small: Frequent evictions reduce hit rate
Solution: Increase cache sizes in configuration
Wide log query ranges: Large block ranges cause partial misses
Solution: Use smaller ranges or increase chunk_size
Reorgs invalidating cache: Frequent reorgs clear cached blocks
Solution: Increase safety_depth, check for reorg storms
Cache disabled: Check cache.enabled = true
Debug:
# Check cache hit rate
curl -s http://localhost:3030/metrics | grep -E 'cache_(hits|misses)_total'
# PromQL
rate(rpc_cache_hits_total[5m]) /
(rate(rpc_cache_hits_total[5m]) + rate(rpc_cache_misses_total[5m]))
Q: How do I know if an upstream is slow or failing?
A: Check these metrics:
Health status:
curl http://localhost:3030/health | jq '.upstreams[] | select(.name=="alchemy")'
Latency metrics:
curl -s http://localhost:3030/metrics | grep 'upstream_latency.*alchemy'
# rpc_upstream_latency_p99_ms{upstream="alchemy"} 850 # High latencyError counts:
curl -s http://localhost:3030/metrics | grep 'upstream_errors.*alchemy'
# rpc_upstream_errors_total{upstream="alchemy",error_type="timeout"} 125
Circuit breaker state:
curl -s http://localhost:3030/metrics | grep 'circuit_breaker_state.*alchemy'
# rpc_circuit_breaker_state{upstream="alchemy"} 1 # Open = failingWarning signs:
P99 latency > 1 second
Error rate > 5%
Circuit breaker open
Block lag > 10 blocks behind tip
Q: Why do I see "Circuit breaker is open" errors?
A: Circuit breaker opens when upstream has too many consecutive failures.
Immediate fix: Wait 60 seconds for automatic recovery to half-open state
Long-term solutions:
Fix upstream issues:
Check API key validity
Verify network connectivity
Check provider status page
Adjust circuit breaker settings:
[upstreams.circuit_breaker]
failure_threshold = 10 # Increase from 5
timeout_seconds = 30 # Decrease from 60
Add redundant upstreams: More upstreams reduce impact of one failing
Monitor recovery:
# Watch for half-open transition
journalctl -u prism -f | grep "circuit breaker"
Q: How can I reduce API costs?
A: Cost reduction strategies:
Maximize caching:
[cache]
enabled = true
[cache.block_cache]
hot_window_size = 1000
max_headers = 100000
[cache.log_cache]
max_chunks = 500
Use tiered upstreams:
# Free tier for cacheable queries
[[upstreams]]
name = "public-rpc"
url = "https://eth.public-rpc.com"
priority = 1 # Low priority
# Paid tier for non-cacheable queries
[[upstreams]]
name = "alchemy-paid"
url = "https://eth-mainnet.g.alchemy.com/v2/KEY"
priority = 10 # High priority
Implement client-side batching: Batch requests to reduce HTTP overhead
Rate limit clients: Prevent abuse
[[authentication.keys]]
rate_limit_per_second = 50
Monitor request distribution:
# Check which methods use most requests
curl -s http://localhost:3030/metrics | grep 'rpc_requests_total' | sort -t= -k2 -n
Still having issues? Join our community:
GitHub Issues: https://github.com/your-org/prism/issues
Discord: https://discord.gg/prism
Documentation: https://docs.prism.sh
Next: Explore Monitoring & Observability for proactive issue detection.