Concepts · 12 min read
What is a blockchain indexer? A complete guide
Blockchain indexers turn raw chain state into queryable database tables. They sit between an archive node and the application that needs the data. This guide covers what indexers do, what RPC endpoints don't do, the types of indexer in use today, how to evaluate them, and how teams decide whether to build or buy.
1. What is a blockchain indexer?
A blockchain indexer is a service that reads raw onchain data (blocks, transactions, event logs, traces) and transforms it into structured, queryable database tables. It sits between the chain's data source (an archive node or a data lake) and the application that needs the data.
An indexer's job has three parts. First, it ingests history: every block and every event since the chain or contract started. Second, it decodes that history using the contract ABIs, turning the raw topics and data fields of a log into typed events like Transfer(address from, address to, uint256 value). Third, it stores the decoded events in a database designed for analytical queries, typically a columnar store such as ClickHouse, an OLAP database, or a relational database with the right indexes.
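As a concrete illustration of the decode step, here is how a raw ERC-20 Transfer log might be turned into a typed event. This sketch uses viem's decodeEventLog (viem is one decoding library among several, not something a given indexer necessarily uses); the log values are illustrative.

```typescript
import { decodeEventLog, parseAbi } from 'viem'

// Minimal ABI fragment for the ERC-20 Transfer event.
const abi = parseAbi([
  'event Transfer(address indexed from, address indexed to, uint256 value)',
])

// A raw log as returned by eth_getLogs: the first topic is the event
// signature hash, indexed parameters follow as topics, and the
// non-indexed parameters are ABI-encoded in `data`.
const decoded = decodeEventLog({
  abi,
  topics: [
    '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef', // keccak256("Transfer(address,address,uint256)")
    '0x000000000000000000000000a0b86991c6218b36c1d19d4a2e9eb0ce3606eb48', // from (left-padded address)
    '0x000000000000000000000000dac17f958d2ee523a2206206994597c13d831ec7', // to (left-padded address)
  ],
  data: '0x00000000000000000000000000000000000000000000000000000000000f4240', // value = 1_000_000
})

// decoded.eventName === 'Transfer'
// decoded.args     === { from: '0xA0b8…', to: '0xdAC1…', value: 1000000n }
```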
The applications that depend on indexed data span the onchain ecosystem. DeFi protocols use it to compute user positions, fees earned, and historical TVL. Wallets use it to render full address history without making thousands of RPC calls per page load. Analytics platforms like Dune are built on top of decoded chain data. Compliance tools use traces and state diffs to support transaction monitoring and Travel Rule submissions. Onchain games and NFT marketplaces use it for ownership and metadata queries.
What an indexer is not: it is not a node, and it does not validate consensus. An indexer reads from a node (or from a network that reads from nodes) and trusts that source to be correct. Validation of the data itself, checking that the stored events match what is actually onchain, is a separate concern that some indexers implement (through cryptographic verification or multi-source consensus) and others do not.
2. What an RPC endpoint doesn't give you
RPC endpoints were designed to serve current chain state and recent history to applications and other nodes. They were not designed to answer questions about long histories or aggregate behavior. Four limitations tend to push teams from RPC-only to indexer-backed access.
State pruning. Default node configurations prune historical state to save disk. Querying anything older than the prune window through methods like eth_getBalance or eth_call against a past block returns an error. Running a Geth or Erigon node in archive mode keeps the full state but requires several terabytes of SSD per chain, and significantly more if the node is also tracing.
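A minimal sketch of the failure mode, again with viem; the address and block number are arbitrary examples.

```typescript
import { createPublicClient, http } from 'viem'
import { mainnet } from 'viem/chains'

const client = createPublicClient({ chain: mainnet, transport: http() })

// Works against any synced node: balance at the current head.
const now = await client.getBalance({
  address: '0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045',
})

// Against a default (pruned) node, asking for state older than the prune
// window fails with a "missing trie node"-style error; only an archive
// node can serve it.
const then = await client.getBalance({
  address: '0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045',
  blockNumber: 1_000_000n,
})
```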
Range limits on eth_getLogs. Providers cap the block range and result count per call (commonly in the thousands; the exact cap varies by provider and tier). Backfilling a contract's history across the chain's life means iterating through many calls, each with its own rate limit. The wall-clock time to ingest several years of events this way is measured in hours to days, not minutes.
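A sketch of what that backfill loop looks like in practice. The 2,000-block window is illustrative; the real cap depends on the provider and tier.

```typescript
import { createPublicClient, http, parseAbiItem } from 'viem'
import { mainnet } from 'viem/chains'

const client = createPublicClient({ chain: mainnet, transport: http() })

const transferEvent = parseAbiItem(
  'event Transfer(address indexed from, address indexed to, uint256 value)',
)

// Backfill a contract's Transfer history in fixed-size windows. Each
// iteration is a separate, rate-limited call; several years of history
// means tens of thousands of iterations.
async function backfill(address: `0x${string}`, fromBlock: bigint, toBlock: bigint) {
  const WINDOW = 2_000n
  for (let start = fromBlock; start <= toBlock; start += WINDOW) {
    const end = start + WINDOW - 1n <= toBlock ? start + WINDOW - 1n : toBlock
    const logs = await client.getLogs({
      address,
      event: transferEvent,
      fromBlock: start,
      toBlock: end,
    })
    // hand `logs` to the decode and store stages
  }
}
```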
Decoding on the client. RPC returns event logs as raw topics arrays and a hex data blob. Every application has to apply each contract's ABI to decode that into typed fields, and every application has to keep its ABIs in sync with contract upgrades.
Multi-chain coordination. If the application needs to answer questions across multiple chains, the team either runs one RPC client per chain (with separate authentication, rate limits, and failover) or pays a provider that does so. Either way, the application has to merge results from heterogeneous data sources.
Indexers exist to solve all four. They run the archive nodes (or read from a data lake), do the decode once, store the result in a single database, and expose it through one query interface that does not change shape when a new chain is added.
3. How a blockchain indexer works
An indexer is a pipeline. The exact technologies vary, but the stages do not.
Extract. The indexer pulls blocks, transactions, logs, and (optionally) traces from a chain data source. The source is either an archive node the indexer runs itself, a node provider's API, or a data lake that aggregates many chains. The extraction is parallel across block ranges to keep up with the chain's tip and to backfill history quickly.
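A sketch of the parallel-extraction pattern, assuming a hypothetical fetchRange helper that pulls one block range from whichever source is configured.

```typescript
// Assumed helper: fetches blocks, logs, and traces for one block range.
declare function fetchRange(fromBlock: number, toBlock: number): Promise<void>

async function extract(fromBlock: number, toBlock: number, concurrency = 8) {
  const CHUNK = 10_000

  // Split the full history into fixed-size ranges.
  const ranges: Array<[number, number]> = []
  for (let start = fromBlock; start <= toBlock; start += CHUNK) {
    ranges.push([start, Math.min(start + CHUNK - 1, toBlock)])
  }

  // Fetch `concurrency` ranges at a time; results would be re-ordered by
  // block number before they reach the store stage.
  while (ranges.length > 0) {
    const batch = ranges.splice(0, concurrency)
    await Promise.all(batch.map(([a, b]) => fetchRange(a, b)))
  }
}
```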
Decode. Raw event logs are emitted as a topics array (the first entry is the event signature hash) plus a data blob (the non-indexed parameters, ABI-encoded). The decode step applies the contract's ABI to turn that into a typed object: for example, Transfer { from: Address, to: Address, value: U256 }. Indexers usually maintain a registry of ABIs keyed by contract address; some support detecting proxy contracts and re-decoding when the implementation address changes.
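A minimal sketch of such a registry, with viem doing the per-log decoding; the proxy-handling comment marks behavior that some indexers implement and others do not.

```typescript
import { decodeEventLog, type Abi } from 'viem'

// ABI registry keyed by lowercased contract address. For a proxy
// contract, the entry would be swapped when the implementation address
// changes, and affected logs re-decoded.
const registry = new Map<string, Abi>()

function decode(log: {
  address: string
  topics: [`0x${string}`, ...`0x${string}`[]]
  data: `0x${string}`
}) {
  const abi = registry.get(log.address.toLowerCase())
  if (!abi) return null // unknown contract: keep the raw log, decode later
  return decodeEventLog({ abi, topics: log.topics, data: log.data })
}
```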
Transform. Most applications need more than raw decoded events. The transform stage can aggregate (count of transfers per day), denormalize (join token metadata onto each transfer), or enrich (resolve internal calls from traces). Indexer frameworks differ most here: some give the developer a code-first API (Squid SDK, Pipes SDK, Ponder), some use declarative mappings (subgraphs in The Graph), and some store the raw decoded events and let the consumer transform downstream (Allium, Dune).
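For the aggregate case, a sketch of the first example above (count of transfers per day) over already-decoded rows:

```typescript
interface TransferRow {
  timestamp: number // block timestamp, in seconds
  from: string
  to: string
  value: bigint
}

// Bucket decoded transfers by UTC calendar day.
function transfersPerDay(rows: TransferRow[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const row of rows) {
    const day = new Date(row.timestamp * 1000).toISOString().slice(0, 10) // "YYYY-MM-DD"
    counts.set(day, (counts.get(day) ?? 0) + 1)
  }
  return counts
}
```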
Store. The output of transform lands in a database. The choice of store reflects the query pattern. Wallet history and DEX analytics often use columnar stores (ClickHouse, Parquet on object storage). Subgraph-style indexers tend to use PostgreSQL with GraphQL on top. Streaming use cases land in event buses or warehouses (Kafka, BigQuery, Snowflake).
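A sketch of the store step against ClickHouse using the @clickhouse/client package, assuming a transfers table with matching columns already exists; the URL and row values are placeholders.

```typescript
import { createClient } from '@clickhouse/client'

const clickhouse = createClient({ url: 'http://localhost:8123' })

// Batch-insert decoded rows; columnar stores like ClickHouse favor large
// batches over row-at-a-time writes.
await clickhouse.insert({
  table: 'transfers',
  values: [
    { block_number: 19_000_000, from: '0x…', to: '0x…', value: '1000000' },
  ],
  format: 'JSONEachRow',
})
```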
Serve. The final stage exposes the stored data to the application. Common interfaces are GraphQL (the subgraph model), REST endpoints, direct SQL access, and language-native SDKs.
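From the consumer's side, the serve stage might look like the GraphQL request below; the endpoint URL and the transfers schema are hypothetical.

```typescript
// Query a subgraph-style GraphQL endpoint for the ten most recent transfers.
const query = `{
  transfers(first: 10, orderBy: blockNumber, orderDirection: desc) {
    from
    to
    value
  }
}`

const res = await fetch('https://example.com/subgraph', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ query }),
})
const { data } = await res.json()
```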
Reorgs and finality. Two stages have to handle reorgs: extract, which discards orphaned blocks when the chain reorganizes, and store, which deletes rows associated with those blocks before applying the new ones. Most indexers maintain a small hot buffer of unfinalized blocks separate from the finalized history so that reorgs only affect the buffer.
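A sketch of the store-side rollback, assuming a SQL-like database handle and a lastCommonBlock already computed by the extract stage:

```typescript
interface Block {
  number: number
  hash: string
  logs: unknown[]
}

// Assumed helper: writes one canonical block's rows to the store.
declare function insertBlock(db: Db, block: Block): Promise<void>

interface Db {
  exec(sql: string, params: unknown[]): Promise<void>
}

async function handleReorg(db: Db, lastCommonBlock: number, newBlocks: Block[]) {
  // Everything above the last common ancestor belonged to the orphaned fork.
  await db.exec('DELETE FROM transfers WHERE block_number > ?', [lastCommonBlock])

  // Re-apply the canonical fork.
  for (const block of newBlocks) {
    await insertBlock(db, block)
  }
}
```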
4. Types of blockchain indexers
Indexers fall into four broad categories. Most teams end up using more than one.
Self-hosted frameworks. The developer writes indexer logic in code and runs the resulting process on their own infrastructure. Examples include Squid SDK, Pipes SDK, Ponder, and Envio. The team controls the database, schema, and query interface, and pays only for infrastructure. The trade-off is operational: someone has to run the archive node or data source, monitor lag, and handle reorgs.
Hosted services. The provider runs the indexer on their infrastructure and exposes the result through an API. Allium, Goldsky, and Bitquery sit in this category. The developer pays per usage and gives up direct database access; the schema is the provider's, with whatever extension points they expose. The trade-off is flexibility: a query the provider did not anticipate may not be possible.
Decentralized networks. Multiple independent operators run indexer or data-serving infrastructure and the network coordinates discovery, validation, and payment. The Graph hosts subgraphs across an indexer marketplace; SQD Network operates a decentralized data lake with worker nodes serving range queries. Relative to a single hosted service, no single operator can take the service down, but operation is more complex and economics are tied to network mechanics.
Node-side streaming. Rather than running a separate indexer that reads from an archive node, this category runs inside or alongside the node and streams decoded data out as a side effect of execution. Firehose and Substreams (both StreamingFast projects) are the canonical examples. The advantage is throughput: the data never passes through the JSON-RPC layer. The cost is coupling to specific node implementations.
Combinations are common. One pattern is an open-source SDK (Squid SDK, Pipes SDK, Ponder) running against a hosted or networked data source (SQD Network, a node provider): the developer keeps schema control without operating the archive node themselves.
5. How to evaluate a blockchain indexer
There is no universal "best" indexer; the right choice depends on what the application needs. A practical evaluation considers six axes.
Chain coverage. If the application uses one chain today and plans to stay there, any indexer that supports it works. If it spans multiple chains, or anticipates new chains as they launch, coverage becomes the dominant constraint. Check the provider's published chain list against the chains you will need over the next 12 to 18 months, including testnets if testnet support matters. SQD publishes its list at sqd.dev/chains; The Graph publishes supported networks in its docs.
Data shape. Different applications need different shapes of the same chain data. A DEX analytics product wants pre-aggregated swap volume; a wallet wants address-indexed transfer history; a compliance tool wants traces and state diffs. Indexers vary in what they expose: some serve raw decoded events only, some provide pre-built decoders for common protocols (DEXes, tokens, NFTs), some let the developer define arbitrary mappings. The evaluation question is whether the indexer's default output matches the application's query pattern, or whether the application will spend significant work transforming.
Latency budget. Real-time applications (trading, MEV, monitoring) need sub-second freshness from event emission to query result. Analytical applications can tolerate minute-scale lag. Latency targets vary by provider, chain, and product tier; each provider publishes its own figures.
Hosting model. Self-hosted gives full control and bounded costs but adds operational burden. Hosted services move that burden to a vendor but tie pricing and feature set to that vendor. Decentralized networks split the difference, with the trade-off that operation is governed by network mechanics rather than a single SLA. The right choice depends on the team's preference for control versus convenience; there is no absolute answer.
Pricing model. The two common patterns are usage-based (requests, queries, or compute units) and tier-based (fixed monthly fees per package). Usage-based pricing is easier to start with but can scale unpredictably; tier-based pricing is more forecastable but requires more upfront commitment.
Lock-in. Two specific dimensions matter: schema portability (can the application's existing SQL or GraphQL queries run against another indexer?) and query-language portability (can the team migrate without rewriting most of its data layer?). Open-source frameworks keep the schema in the team's own code, which makes it portable. Closed schemas, even when the data is similar, tend to require a rewrite to switch.
For head-to-head comparisons against specific providers, see the per-tool comparison pages.
6. The 2026 indexer landscape
The indexer ecosystem in 2026 contains roughly a dozen actively developed tools spread across the four categories above.
- Apps & productsWallets Tax Payments KYC RWA
- IntelligenceComparison coming soon Comparison coming soon Comparison coming soon Comparison coming soon Comparison coming soon
- Protocol analyticsComparison coming soon Comparison coming soon
- Indexed data
- Our focus Read-side infrastructureSQD decentralized, validated, multi-chain at source
- Node providers
Self-hosted frameworks. Envio is a code-first indexer framework written in TypeScript and ReScript; it pairs with HyperSync as its data source layer and offers a managed deployment option alongside self-hosting. Ponder is a TypeScript-first indexer that targets single-application use on EVM chains; it ships with Postgres as the default store. SQD ships two TypeScript indexer frameworks: Squid SDK and Pipes SDK, targeting different access patterns; both run against the chains listed at sqd.dev/chains.
Hosted services. Allium is a managed data platform that delivers decoded chain data into the customer's warehouse (Snowflake, BigQuery, Databricks). Bitquery is a hosted GraphQL API covering many chains, with pre-built schemas for DEX trades and token transfers. Goldsky hosts subgraphs and pipelines into warehouses, focused on managing the subgraph lifecycle without operating the infrastructure. Helius is Solana-focused; it provides node RPC plus indexed APIs for transactions, balances, and NFT metadata.
Decentralized networks. SQD Network is a network of worker nodes serving range queries across the chains listed at sqd.dev/chains; the Portal is the queryable interface to the network, and the self-hosted SDKs above can run against the Portal as a data source. The Graph is a protocol for hosting subgraphs across an open marketplace of indexers, with curation and delegation as separate network roles.
Node-side streaming. Firehose is a high-throughput streaming protocol that emits decoded chain data directly from instrumented node implementations. Substreams sits on top of Firehose as a developer framework for composing transformations on the resulting stream. Both are StreamingFast projects.
On naming: Subsquid is the open-source project family within which Squid SDK, Pipes SDK, and SQD Network are built; the GitHub organizations /subsquid and /subsquid-labs hold the protocol and developer-tools repositories, respectively.
Each tool answers a slightly different question: which indexer for a single-chain dApp, which for a multi-chain analytics product, which for an enterprise data pipeline. The comparison pages walk through SQD's positioning against each named tool across chain coverage, hosting model, pricing, and openness.
7. Build vs buy
There is no permanent answer to "build or buy" for indexer infrastructure. The right answer at one company size or feature scope is the wrong answer at another. Four factors typically drive the decision.
Team capacity. Indexer operation requires someone who understands the source chain's data model, the indexer framework, and the database. If that person does not exist on the team and the company does not plan to hire them, buying a hosted service is the lower-risk option. If the team is data-engineering-heavy, building is feasible and gives more control.
Schema control. If the application's schema is core IP (a unique trading model, a proprietary lending protocol, a compliance system), the schema cannot live in a vendor's shape. Self-hosting on an open-source framework keeps the schema portable; hosted services constrain it to the vendor's offerings.
Cost model. Hosted services are operating expense, predictable per usage tier. Self-hosting is capital and operating expense (archive nodes, databases, engineering time) but caps at infrastructure cost as usage grows. The crossover point depends on usage; published rate cards from the candidate hosted providers, compared against expected infrastructure costs for self-hosting, give the team a concrete answer.
Dependency risk. A hosted indexer that goes away (acquired, deprecated, repriced) leaves the application with a migration project. Open-source frameworks with multiple operators (such as those running against SQD Network or The Graph network) reduce this risk because the framework outlives any single operator.
The hybrid pattern. The most common middle path is to write the indexer in an open-source SDK (Squid SDK, Pipes SDK, Ponder) and run it against a managed or networked data source (SQD Network, a node provider). The team keeps schema portability while the vendor handles archive nodes and data ingestion.
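As a sketch of what the hybrid pattern looks like in code, here is a Squid SDK processor configured against a hosted gateway. The gateway and RPC URLs are examples, and the method names are those of recent Squid SDK releases, which may differ by version.

```typescript
import { EvmBatchProcessor } from '@subsquid/evm-processor'
import { TypeormDatabase } from '@subsquid/typeorm-store'

// Example target: USDT Transfer logs on Ethereum mainnet.
const processor = new EvmBatchProcessor()
  // Hosted data source: an SQD Network gateway serves the bulk of history…
  .setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
  // …while an RPC endpoint covers the unfinalized chain tip.
  .setRpcEndpoint('https://eth.llamarpc.com')
  .setFinalityConfirmation(75)
  .addLog({
    address: ['0xdac17f958d2ee523a2206206994597c13d831ec7'],
    topic0: ['0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'],
  })

processor.run(new TypeormDatabase(), async (ctx) => {
  for (const block of ctx.blocks) {
    for (const log of block.logs) {
      // Decode and persist. The schema lives in the team's own code,
      // which is what keeps it portable across data sources.
    }
  }
})
```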
Frequently asked questions
What is a blockchain indexer?
A service that reads raw onchain data (blocks, transactions, event logs, traces) and transforms it into structured, queryable database tables, sitting between a chain data source and the application that needs the data.

Is a blockchain indexer secure?
An indexer does not validate consensus; it trusts its data source. Some indexers additionally verify that stored events match what is onchain, through cryptographic verification or multi-source consensus; others do not, so check what a given indexer validates.

How much does a blockchain indexer cost?
Hosted services price per usage (requests, queries, or compute units) or per fixed monthly tier. Self-hosting costs infrastructure (archive nodes or a data source, databases) plus engineering time, but caps at infrastructure cost as usage grows. The crossover point depends on usage.

What's the difference between an indexer and an RPC node?
An RPC node serves current state and recent history. An indexer ingests and decodes full history into a database built for analytical queries, sidestepping state pruning, eth_getLogs range limits, client-side decoding, and multi-chain coordination.

Can I run my own blockchain indexer?
Yes. Self-hosted frameworks such as Squid SDK, Pipes SDK, Ponder, and Envio let a team write indexer logic in code and run it on their own infrastructure, keeping full control of the schema, database, and query interface.

Which blockchain indexers are widely used in 2026?
Roughly a dozen actively developed tools span four categories: self-hosted frameworks (Squid SDK, Pipes SDK, Ponder, Envio), hosted services (Allium, Goldsky, Bitquery, Helius), decentralized networks (SQD Network, The Graph), and node-side streaming (Firehose, Substreams).

What data can a blockchain indexer extract?
Blocks, transactions, event logs, and (optionally) traces, which can then be decoded into typed events and transformed through aggregation, denormalization, or enrichment.
Try SQD as your blockchain indexer
Portal, the open-source Squid and Pipes SDKs, and 225+ chains.