Architecture · 10 min read
Multi-chain indexing explained
A multi-chain indexer ingests onchain data from many blockchains into a single queryable database. This guide covers what changes when you go from one chain to many: the three architectural patterns teams use, the validation problem that none of them get for free, and how to harmonize heterogeneous chain data into a useful schema.
1. What is multi-chain indexing?
Multi-chain indexing is the practice of ingesting data from more than one blockchain into a single database, with a schema that lets the same query answer questions across chains. The per-chain work an indexer does (extract, decode, store, serve) happens once per chain. What changes is everything above that: how the schema represents data from chains with different models, how finality and reorgs are tracked when each chain has its own rules, and how queries that span chains stay performant.
Multi-chain is not the same as cross-chain. Cross-chain refers to value or messages moving between chains via bridges, message-passing protocols, or shared sequencers. Multi-chain refers to the data architecture that lets one product see all chains at once. A multi-chain indexer is what you build cross-chain analytics on top of; the two concepts often co-occur but solve different problems.
2. Why teams need it
The pressure to go multi-chain comes from the product side, not the data side. A wallet that supports five chains needs balance history for all five. A DEX aggregator that routes across L2s needs swap data from every chain it routes to. An institutional analytics platform tracking stablecoin flows needs every chain those stablecoins live on. None of these products can answer their core question with a single-chain indexer.
The naive approach scales poorly: one indexer process per chain, one database per chain, one API per chain. The application has to fan out queries, merge results, and reconcile differences in how each chain represents the same logical action. Each new chain adds operational surface (archive nodes, monitoring, on-call) and data-shape work (per-chain decoders, per-chain schemas). At three chains it is annoying. At thirty it stops working.
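The fan-out problem above can be sketched in a few lines. This is an illustrative mock, not a real API: each per-chain endpoint is a plain callable, and the application pays for the merge, shape reconciliation, and sort itself.

```python
# Hypothetical sketch of the naive fan-out: the application queries one
# API per chain and merges results itself. Names and shapes are illustrative.

def fan_out(chain_apis, address):
    merged = []
    for chain_id, api in chain_apis.items():
        for row in api(address):
            # Reconcile per-chain shape differences in application code.
            merged.append(dict(row, chain_id=chain_id))
    # The application, not the indexer, pays for the merge and sort.
    return sorted(merged, key=lambda r: r["block_time"])

# Two mocked per-chain APIs; every new chain adds another entry here
# plus another decoder to make its rows comparable.
apis = {
    "ethereum": lambda addr: [{"block_time": 170, "amount": 5}],
    "polygon":  lambda addr: [{"block_time": 120, "amount": 9}],
}
rows = fan_out(apis, "0xabc")
```

At three chains this dictionary is manageable; at thirty, every query path in the application carries this merge logic.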
Multi-chain indexing centralizes the per-chain work behind a single interface. The application asks one question and gets one answer, whether the answer involves one chain or twenty. The complexity moves into the indexer, where it is paid once rather than per product.
3. Three architectural patterns
Most multi-chain indexers fall into one of three architectures. The choice has consequences for cost, latency, and how easy it is to add a new chain.
Pattern 1: Federated single-chain indexers. One indexer process per chain, each writing to its own database. A query layer in front of the databases routes by chain or merges results. This is the easiest to start with because each chain is an isolated build, but it does not produce a unified schema; cross-chain queries are aggregated at the API layer, often slowly. Most early-stage analytics products start here.
Pattern 2: Shared schema, per-chain pipelines. Each chain has its own ingestion pipeline, but all pipelines write into a shared database with a chain-agnostic schema (e.g. a single transfers table with a chain_id column). The decode step normalizes per-chain primitives into the shared shape. Cross-chain queries become single-database queries. The cost is that adding a new chain requires writing a new normalizer that maps that chain's primitives into the shared schema.
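A minimal sketch of Pattern 2's shared schema, using an in-memory SQLite database for illustration (column names are assumptions, not a standard): each per-chain pipeline writes normalized rows into one table keyed by `chain_id`, and a cross-chain question becomes a single SQL statement.

```python
import sqlite3

# One chain-agnostic transfers table shared by all per-chain pipelines.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE transfers (
        chain_id   TEXT    NOT NULL,
        block_time INTEGER NOT NULL,
        sender     TEXT    NOT NULL,
        recipient  TEXT    NOT NULL,
        token      TEXT    NOT NULL,
        amount     INTEGER NOT NULL
    )
""")

# Each chain's normalizer maps native primitives into this shared shape.
db.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ethereum", 100, "0xa", "0xb", "USDC", 500),
        ("solana",   101, "So1", "So2", "USDC", 250),
    ],
)

# A cross-chain query is one database query, not an API-layer merge.
total = db.execute(
    "SELECT SUM(amount) FROM transfers WHERE token = 'USDC'"
).fetchone()[0]
```

The cost mentioned above lives in the `executemany` input: someone has to write the normalizer that produces those rows for each new chain.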
Pattern 3: Shared data source, application-layer indexers. A single data source (a network, a data lake, or a vendor API) serves decoded chain data for many chains through one interface. Applications run their own indexer logic against that data source rather than running per-chain ingestion themselves. Adding a chain is a configuration change on the data source; the application picks it up by querying differently. This pattern is what hosted services and decentralized data networks (like SQD Network) provide.
A team that owns its data infrastructure end-to-end usually ends up at Pattern 2 (more control, more ops). A team that wants to focus on application logic usually ends up at Pattern 3 (less control, less ops). Pattern 1 is a temporary state on the way to one of the others.
4. The validation problem
A single-chain indexer can trust its source RPC or archive node and inherit that source's view of finality. A multi-chain indexer cannot, because each chain has different rules and the indexer has to be correct under all of them simultaneously.
Specifically:
- Ethereum mainnet finalizes blocks roughly every 12.8 minutes (two epochs). Polygon PoS uses Heimdall milestones for deterministic finality on the order of seconds. Solana finalizes after about 12.8 seconds (~32 slots). Bitcoin's practical finality is probabilistic, conventionally six blocks (~60 minutes).
- Reorg depth varies by chain. Ethereum reorgs are typically 1-2 blocks. Some L2s can reorg deeper if the underlying sequencer or batch poster has issues. Bitcoin reorgs of more than three blocks are rare but observable.
- Some chains have probabilistic finality, some have deterministic finality, and some have a separate "safe" head distinct from "finalized".
A naive multi-chain indexer that treats all chains the same will ingest data that later reorgs out on one chain while reporting it as final, or hold back data that has finalized on another chain because its watermark is too conservative. The fix is per-chain finality tracking: each chain has its own watermark for "last finalized block", reorgs are detected and rolled back per chain, and the served schema exposes finality state so consumers know which rows are settled.
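Per-chain finality tracking can be sketched as follows. This is a simplified model under the assumption that finality can be approximated by a fixed depth behind the head (real chains like Ethereum expose explicit finalized/safe heads); all names are illustrative, not a real SQD API.

```python
# Each chain keeps its own finality watermark; a reorg rolls back only
# that chain's unfinalized rows, and served rows carry finality state.

class ChainTracker:
    def __init__(self, finality_depth):
        self.finality_depth = finality_depth  # blocks behind head treated as final
        self.head = 0
        self.rows = []  # (block_number, payload)

    def ingest(self, block_number, payload):
        self.head = max(self.head, block_number)
        self.rows.append((block_number, payload))

    @property
    def finalized_watermark(self):
        return self.head - self.finality_depth

    def handle_reorg(self, fork_block):
        # Finalized rows must never reorg; a fork below the watermark
        # would be a consistency violation, so fail loudly.
        assert fork_block > self.finalized_watermark
        self.rows = [(n, p) for n, p in self.rows if n < fork_block]

    def serve(self):
        # Expose finality state so consumers know which rows are settled.
        return [
            {"block": n, "payload": p, "final": n <= self.finalized_watermark}
            for n, p in self.rows
        ]

# Different chains, different rules: the depths below are rough stand-ins.
eth = ChainTracker(finality_depth=64)   # ~two epochs of slots
sol = ChainTracker(finality_depth=32)   # ~32 slots

eth.ingest(1000, "a")
eth.ingest(1050, "b")
eth.handle_reorg(1050)  # shallow reorg: block 1050 is rolled back
```

A uniform `finality_depth` across all chains is exactly the naive design the paragraph above warns against; the point is that each `ChainTracker` carries its own.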
The deeper validation problem is correctness against the chain itself. Reading from a single RPC means trusting that one node. Multi-source consensus (compare the same query against multiple RPCs or full nodes and only accept agreed results) reduces this risk but adds latency and infrastructure cost. Decentralized data networks attempt to solve this at the protocol layer with cryptographic validation across worker nodes.
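Multi-source consensus can be sketched as a majority vote over independent sources. The sources here are plain callables standing in for RPC endpoints; a production version would also handle timeouts and per-source weighting.

```python
from collections import Counter

def consensus_query(sources, query, quorum):
    """Ask every source the same question; accept the answer only if at
    least `quorum` sources agree."""
    answers = Counter(src(query) for src in sources)
    answer, votes = answers.most_common(1)[0]
    if votes >= quorum:
        return answer
    raise RuntimeError(f"no quorum: {dict(answers)}")

# Two honest sources agree; one lagging or faulty node disagrees.
sources = [
    lambda q: "0xfeed",
    lambda q: "0xfeed",
    lambda q: "0xdead",
]
result = consensus_query(sources, "block_hash(123)", quorum=2)
```

The latency and cost trade-off is visible in the loop: every accepted result costs N upstream queries instead of one.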
5. Schema harmonization across chains
Different chains have different data models. An ERC-20 token transfer on Ethereum is an event log on a contract address. The "same" transfer on Solana is an instruction in a transaction targeting the SPL Token program with an associated token account. The "same" transfer on Bitcoin is a UTXO consumption with an output to a new address. The underlying action is similar; the representation is not.
Two harmonization strategies are common.
Lowest-common-denominator schemas. Pick a shared shape (e.g. transfer(chain_id, block_time, from, to, token, amount)) that covers what most chains can express, and lose the chain-specific details that don't fit. Queries are simple; chain-specific power is unavailable.
Layered schemas. Expose the harmonized shape for queries that need cross-chain joins, AND expose the raw per-chain primitives for queries that need chain-specific detail. The consumer chooses which layer to use. This costs more storage (raw + harmonized) but preserves the option for power users.
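The normalization step both strategies depend on can be sketched like this. The per-chain field names below are illustrative stand-ins, not real RPC shapes; the point is that each chain needs its own mapper into the shared transfer row, and chain-specific detail (like Solana's associated token accounts) is what gets dropped.

```python
# Map heterogeneous per-chain primitives into one harmonized transfer shape.

def normalize_evm_log(log):
    # ERC-20 Transfer event: topics carry from/to, data carries the amount.
    return {
        "chain_id": log["chain_id"],
        "from": log["topics"][1],
        "to": log["topics"][2],
        "token": log["address"],
        "amount": log["data"],
    }

def normalize_spl_instruction(ix):
    # SPL Token transfer: the raw instruction references token accounts;
    # that chain-specific detail is lost in the shared shape.
    return {
        "chain_id": "solana",
        "from": ix["source_owner"],
        "to": ix["dest_owner"],
        "token": ix["mint"],
        "amount": ix["amount"],
    }

raw_log = {
    "chain_id": "ethereum",
    "address": "0xUSDC",
    "topics": ["Transfer", "0xalice", "0xbob"],
    "data": 500,
}
raw_ix = {
    "source_owner": "AliceSol",
    "dest_owner": "BobSol",
    "mint": "USDCmint",
    "amount": 250,
}

harmonized = [normalize_evm_log(raw_log), normalize_spl_instruction(raw_ix)]
```

In the layered strategy, `raw_log` and `raw_ix` would also be stored as-is alongside `harmonized`, so power users can still reach the chain-native representation.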
The right choice depends on the consumers. A retail wallet wants the lowest-common-denominator transfer view because it never needs to know about Solana's associated token accounts. A compliance product needs the layered view because tracing a transaction back through Solana program-derived addresses requires the raw primitives. Most production multi-chain indexers end up offering both.
6. Operational considerations
The ops surface of a multi-chain indexer scales non-linearly with chain count. A few specific things tend to bite teams operating their own multi-chain ingest:
Archive node operation. Each chain has its own archive node software (Geth, Erigon, Reth, Nethermind for EVM; Solana validators; substrate-node for Polkadot; Bitcoin Core). Each has its own minimum hardware spec, sync time, and operational quirks. Running ten archive nodes is ten different on-call playbooks.
Chain upgrades and forks. When a chain hard-forks, the indexer's decoder may need updating to handle new opcodes, new transaction types, or new precompiles. Across many chains, hard forks land regularly. A pipeline that breaks silently on a fork is a frequent failure mode.
Per-chain backfill speed. Backfilling a new chain from genesis takes time proportional to the chain's history. Ethereum mainnet from genesis is years of state; an L2 from genesis might be months. Teams that add new chains often need a strategy for "fast catch-up" using snapshots or parallel range ingestion.
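One common "fast catch-up" strategy is parallel range ingestion: split the chain's history into fixed-size block ranges that independent workers backfill concurrently. A minimal planner might look like this (numbers are arbitrary):

```python
def plan_ranges(start_block, head_block, chunk_size):
    """Split [start_block, head_block] into inclusive, non-overlapping
    ranges of at most chunk_size blocks, one per backfill worker."""
    ranges = []
    lo = start_block
    while lo <= head_block:
        hi = min(lo + chunk_size - 1, head_block)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# A hypothetical L2 with 1,000,000 blocks split into 250k-block chunks,
# giving four workers independent ranges to ingest in parallel.
ranges = plan_ranges(0, 999_999, 250_000)
```

The same planner works per chain, which matters because each chain's history length dictates its chunk count and therefore its backfill wall-clock time.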
Cost monitoring across chains. Indexing some chains is much more expensive than others. Solana produces orders of magnitude more transactions per second than Bitcoin. Without per-chain cost attribution, teams discover budget overruns after the bill arrives.
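Per-chain cost attribution does not need to be elaborate; even a sketch like the following (all figures made up for illustration) surfaces the order-of-magnitude gap between chains before the bill arrives:

```python
def attribute_costs(rows_ingested, unit_cost_per_million):
    """Attribute ingestion cost to each chain from its row count and a
    per-chain unit cost (USD per million rows). Figures are illustrative."""
    return {
        chain: rows / 1_000_000 * unit_cost_per_million[chain]
        for chain, rows in rows_ingested.items()
    }

costs = attribute_costs(
    rows_ingested={"solana": 400_000_000, "bitcoin": 2_000_000},
    unit_cost_per_million={"solana": 0.5, "bitcoin": 0.5},
)
```

Even at identical unit cost, the transaction-volume gap makes one chain two hundred times more expensive to index than the other; without this breakdown, that fact hides inside one aggregate number.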
Buying multi-chain coverage from a vendor (Pattern 3 above) externalizes all of this to the vendor. The trade-off is that the vendor decides which chains are available and how fast they get added. Self-hosting (Pattern 2) keeps that control but adds the operational load.
7. The 2026 multi-chain tooling landscape
Tools positioned for multi-chain workloads in 2026 sit in a few camps. For per-tool head-to-heads against SQD, see the comparison pages.
Decentralized data networks. SQD Network serves range queries across the chains listed at sqd.dev/chains; applications run their own Squid SDK or Pipes SDK indexer logic against it. The Graph hosts subgraphs across its supported networks; each subgraph targets one chain but the marketplace covers many.
Hosted services. Allium delivers decoded multi-chain data into customer warehouses (Snowflake, BigQuery, Databricks). Goldsky hosts subgraphs plus pipelines into warehouses. Bitquery exposes a hosted GraphQL API covering many chains with pre-built schemas for DEX and token activity.
Self-hosted frameworks with multi-chain support. SubQuery is a code-first framework with multi-chain mapping support across EVM, Substrate, Cosmos, and others. Squid SDK and Pipes SDK support the chains listed at sqd.dev/chains when run against the SQD Network as a data source.
The selection criteria (chain coverage, data shape, latency, hosting model, pricing, lock-in) from the indexer evaluation framework apply, with chain coverage typically becoming the dominant axis once "multi-chain" is in the requirements.
Try SQD across many chains in one query
Portal and the open-source Squid and Pipes SDKs cover the chains listed at sqd.dev/chains.