What is a Solana indexer?
A Solana indexer turns Solana onchain data (slots, transactions, instructions, account state changes, program logs) into queryable database tables. The pipeline shape mirrors any other blockchain indexer, but Solana's data model and throughput change what the decoder, the storage layer, and the real-time path have to do.
1. What is a Solana indexer?
A Solana indexer is a service that ingests Solana data and stores it in tables the application can query analytically. SQD's Solana data lake exposes six tables (transactions, instructions, balances, token_balances, rewards, logs), each carrying one primitive of Solana's runtime; SQD's published Solana mainnet schema lists the exact columns. Other Solana indexers expose different shapes; this guide uses SQD's schema as the concrete reference.
One detail worth surfacing up front, because it changes what's possible: SQD's Solana data lake covers Solana mainnet from slot 0 (genesis). Many Solana providers expose only a recent window, because backfilling Solana's full history is expensive; products that need years-back analytics (token issuance history, validator economics over time, protocol launches predating the provider) can read that history directly rather than stitching snapshots together. For reference, the previous v2 archive (now deprecated) started at slot 269,828,500.
What separates a Solana indexer from an EVM indexer is not the role (read chain, decode, store, serve) but the work inside each step. Solana has no event logs in the EVM sense, so decoding can't just match an event signature. It has cross-program invocations and an account-write firehose, so the storage layer carries more volume per slot. And it has separate confirmation levels for the same slot, so the pipeline has to handle two streams instead of one.
Most of this article is about those differences in concrete terms.
2. Solana's data model: instructions, accounts, logs
A Solana slot (the closest analogue to an EVM block) is produced about every 400ms by whichever validator currently has leader rights. Each slot contains a list of transactions. Each transaction is a list of instructions. That's the chain of containment that drives the data model.
Transactions carry the signatures array, the account_keys array (every account the transaction touches), the recent_blockhash, the fee in lamports, and a success boolean. If the transaction failed, an error object is also present. Fee payer is just account_keys[0] by convention.
Instructions have three fields that matter to the indexer: program_id (the program being called), accounts (an unlabeled array of references into the parent transaction's account_keys), and data (a base58-encoded byte string the program parses according to its own ABI). The data field is opaque; only the program knows how to interpret it.
Inner instructions (the array under each top-level instruction) are cross-program invocations: when a program calls another program inside the same transaction, the called instruction appears here. A Jupiter swap, for example, is one top-level Jupiter instruction whose inner_instructions array contains the actual DEX swaps that Jupiter routed through.
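The containment described above can be walked depth-first to turn a transaction's instruction tree into a flat stream of actions, CPIs included. A minimal sketch, using illustrative field names rather than any exact SDK type:

```typescript
// Hypothetical minimal shape for an instruction and its CPIs; the field names
// mirror the tables described in this section, not a specific SDK's types.
interface Instruction {
  programId: string;
  data: string; // base58-encoded bytes, opaque to everyone but the program
  innerInstructions?: Instruction[];
}

// Depth-first flatten: yields each top-level instruction followed by every
// cross-program invocation nested beneath it.
function flatten(instructions: Instruction[]): Instruction[] {
  return instructions.flatMap((ins) => [
    ins,
    ...flatten(ins.innerInstructions ?? []),
  ]);
}

// A Jupiter-style transaction: one router instruction wrapping two DEX swaps.
const tx: Instruction[] = [
  {
    programId: "Jupiter",
    data: "",
    innerInstructions: [
      { programId: "Orca", data: "" },
      { programId: "Raydium", data: "" },
    ],
  },
];
```

An indexer that only reads top-level instructions would see one action here; flattening surfaces all three.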
Balance changes. The indexer captures account state movement via two tables. balances carries per-account SOL deltas (pre / post lamports). token_balances carries SPL token deltas with the mint address, the owner, the decimals, and pre/post amounts. Most production indexers reconstruct "this address sent X amount of token Y" by joining the two.
Program logs. Many Solana programs use logs as a poor-man's event mechanism. They show up in the logs table with the emitting program_id, an instruction_address path (because logs can come from inner instructions), a kind classification, and the message text. Anchor programs emit structured event logs (emit! macros) that can be deserialised against the program's IDL.
Rewards. Validator staking rewards, voting rewards, and rent collection appear in rewards: pubkey, lamports, reward_type, and commission percentage. Most application-side indexers ignore this table; staking analytics products live on it.
3. How decoders identify instructions: discriminators and ABIs
An instruction's data blob is opaque bytes; the indexer has to know which handler inside the program the bytes correspond to. Solana programs solve this with a discriminator: a fixed-width prefix at the start of the data field that names the handler.
Four sizes are common. SPL Token and most native programs use a 1-byte discriminator (one byte of the data field selects among ~20 handlers like Transfer, MintTo, Approve). Some programs use 2- or 4-byte discriminators. Anchor, the dominant Solana smart-contract framework, uses an 8-byte discriminator: the first 8 bytes of sha256("global:" + instruction_name). Anchor IDLs publish these discriminators alongside the typed argument shapes, which is why most Solana decoders can autogenerate handlers for Anchor programs but need hand-written decoders for non-Anchor ones.
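The Anchor derivation is easy to reproduce: hash the string "global:" plus the instruction name and keep the first 8 bytes. A sketch using Node's built-in crypto module, with hex output for readability:

```typescript
import { createHash } from "node:crypto";

// Anchor's 8-byte instruction discriminator: the first 8 bytes of
// sha256("global:" + instruction_name), shown here as 16 hex characters.
function anchorDiscriminator(instructionName: string): string {
  return createHash("sha256")
    .update(`global:${instructionName}`)
    .digest("hex")
    .slice(0, 16);
}
```

Running this for "swap" yields the d8 value a decoder would match against for any Anchor program whose handler is named swap.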
A real decoder filters by program_id first, then matches the discriminator, then deserialises the remaining bytes against the corresponding ABI struct. SQD's solanaInstructionDecoder exposes this as four filter fields named after the discriminator width: d1, d2, d4, d8. A typical filter looks like { programId: [whirlpool.programId], d8: [whirlpool.swap.d8] }, which is how the SQD Orca Whirlpool example targets just the swap instruction without pulling every other Whirlpool call.
Two real-world wrinkles: programs sometimes reuse the same discriminator after an upgrade (the indexer needs to version the decoder), and CPIs can target unknown programs (the indexer either keeps the raw bytes for later decoding or drops them). Production indexers handle both with versioned ABI registries and an onError hook that lets bad instructions be skipped rather than halting the pipeline.
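A minimal sketch of such a registry with an onError-style skip hook. The shape is hypothetical (the real SDKs use richer types and version the key by slot range), but the dispatch logic is the one described above: match (program_id, discriminator), skip on miss rather than halt.

```typescript
type Handler = (body: Uint8Array) => unknown;

// Registry keyed by programId plus discriminator hex. A production version
// would also key by a version or slot range to survive program upgrades.
class DecoderRegistry {
  private handlers = new Map<string, Handler>();

  register(programId: string, d8Hex: string, handler: Handler): void {
    this.handlers.set(`${programId}:${d8Hex}`, handler);
  }

  decode(
    programId: string,
    data: Uint8Array,
    onError?: (e: Error) => void
  ): unknown {
    // Read the 8-byte discriminator off the front of the data field.
    const d8 = Array.from(data.slice(0, 8), (b) =>
      b.toString(16).padStart(2, "0")
    ).join("");
    const handler = this.handlers.get(`${programId}:${d8}`);
    if (!handler) {
      // Skip the instruction instead of halting the whole pipeline.
      onError?.(new Error(`no decoder for ${programId}:${d8}`));
      return undefined;
    }
    return handler(data.slice(8)); // hand the remaining bytes to the handler
  }
}
```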
4. How a Solana indexer works
The pipeline stages mirror any indexer (see the indexer overview), with Solana-specific details at each step.
Extract. Three sources are practical. A Geyser plugin runs inside a validator and streams transactions, account writes, and slot updates the moment they happen. getBlock, getSignaturesForAddress, and similar RPC methods serve the same data on demand but with much higher latency under load. A managed data lake (such as the SQD Portal at https://portal.sqd.dev/datasets/solana-mainnet) abstracts the choice and serves filtered ranges over HTTP, with full history from slot 0 available the moment a developer asks for it (i.e. without operating an archive validator or paying for snapshot exports).
Decode. For each instruction in each transaction, the indexer dispatches to a program-specific decoder. The decoder maintains a registry of (program_id, discriminator) -> handler. Most production indexers ship pre-built decoders for the standard programs (SPL Token, Token-2022, Metaplex, Compute Budget) and let developers add their own for application contracts.
Transform. Common Solana-specific transformations: resolving mint addresses to symbols and decimals by reading the SPL Token mint account, flattening inner instructions into a single decoded-action stream, and joining instruction-level data with token_balance deltas to produce "address X sent N units of token Y" rows.
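The last transformation, joining decoded instructions with token_balances deltas, can be sketched per-delta like this. Field names are illustrative, not the exact lake schema:

```typescript
// Hypothetical per-account row from the token_balances table.
interface TokenBalanceDelta {
  account: string;
  mint: string;
  decimals: number;
  preAmount: bigint;
  postAmount: bigint;
}

// Turn a raw pre/post delta into a human-readable "sent/received" row.
function toTransferRow(delta: TokenBalanceDelta) {
  const raw = delta.postAmount - delta.preAmount;
  return {
    account: delta.account,
    mint: delta.mint,
    // Scale by the mint's decimals to get a display amount.
    amount: Number(raw) / 10 ** delta.decimals,
    direction: raw < 0n ? "sent" : "received",
  };
}
```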
Store. Solana's volume tends to push teams toward columnar stores. SQD's Pipes SDK ships first-class targets for ClickHouse and for Postgres with Drizzle as the schema layer; raw historical blobs land in Parquet on object storage in many production stacks.
Serve. The output stage is GraphQL, SQL, REST, or a language SDK, as with any indexer. Pipes SDK lets the developer choose by wiring up an HTTP layer over the target store; Squid SDK ships GraphQL as the default with a custom resolver path for SQL.
A minimal Pipes SDK pipeline that reads Orca Whirlpool swaps from the Portal and writes them into Postgres looks like this (imports and the application-level `enrichEvents` and `chunk` helpers are omitted for brevity):

```typescript
const custom = solanaInstructionDecoder({
  range: { from: 'latest' },
  programId: ['whirLbMiicVdio4qvUfM5KAg6Ct8VwpYzGff3uctyCc'],
  instructions: { swap: orcaWhirlpoolInstructions.swap },
}).pipe(enrichEvents) // app-specific transform step

await solanaPortalSource({
  id: 'solana-orca-pipe',
  portal: 'https://portal.sqd.dev/datasets/solana-mainnet',
  outputs: { custom },
}).pipeTo(drizzleTarget({
  db: drizzle(env.DB_CONNECTION_STR),
  tables: [orcaWhirlpoolSwapTable],
  onData: async ({ tx, data }) => {
    // Insert decoded swaps in batches inside the target's transaction.
    for (const values of chunk(data.custom.swap)) {
      await tx.insert(orcaWhirlpoolSwapTable).values(values)
    }
  },
}))
```

5. Real-time, finalized, and fork handling
Solana exposes three commitment levels for the same slot: processed (the validator has executed the block but nothing else has confirmed it), confirmed (a supermajority of stake has voted on the block), and finalized (the block is part of the canonical chain and won't be rolled back). Production indexers usually consume the confirmed stream for low-latency views and the finalized stream for settled data, with the schema exposing the commitment state so consumers know which they are reading.
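One way to sketch the two-stream pattern, using an in-memory map as a stand-in for a table with a commitment column: rows land as confirmed immediately and get promoted when the finalized stream catches up.

```typescript
type Commitment = "confirmed" | "finalized";

// Stand-in for a table keyed by slot with a `commitment` column.
const rows = new Map<number, { slot: number; commitment: Commitment }>();

// Low-latency path: write the row as soon as the confirmed stream delivers it.
function onConfirmed(slot: number): void {
  rows.set(slot, { slot, commitment: "confirmed" });
}

// Settled path: promote the row (or insert it) when the slot finalizes.
function onFinalized(slot: number): void {
  const row = rows.get(slot);
  if (row) row.commitment = "finalized";
  else rows.set(slot, { slot, commitment: "finalized" });
}
```

Consumers that filter on the commitment column choose their own trade-off between latency and settlement.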
Reorgs at the finalized level are essentially never observed. Reorgs at the confirmed level happen occasionally and the indexer has to handle them. SQD's Portal serves two distinct endpoints, /stream and /finalized-stream, with corresponding /head and /finalized-head for current state. The gap between them tracks Solana's finalization window of about 32 slots; the absolute indexed lag varies with chain load and is published by the head endpoint.
Fork detection. When a confirmed-stream consumer detects that the chain has diverged from the local history, the Portal returns an HTTP 409 with a sample of the canonical chain's recent slots and their hashes. The pipeline finds the common ancestor by merge-sorting local history against the sample (matching both slot number and hash), truncates downstream state back to that point, and resumes streaming from the next slot. SQD's Pipes SDK does this automatically with a default 1000-slot rollback window.
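The common-ancestor search can be sketched as follows. The shapes are illustrative (the Pipes SDK does this internally); the key detail is that both slot number and hash must match before the pipeline trusts a rollback point.

```typescript
interface SlotRef {
  slot: number;
  hash: string;
}

// `local` is the pipeline's recent (slot, hash) history, `canonical` the
// sample of the canonical chain returned with the 409; both sorted ascending.
function findCommonAncestor(
  local: SlotRef[],
  canonical: SlotRef[]
): SlotRef | undefined {
  const canonicalBySlot = new Map(canonical.map((r) => [r.slot, r.hash]));
  // Walk local history newest-first until slot AND hash both match.
  for (let i = local.length - 1; i >= 0; i--) {
    if (canonicalBySlot.get(local[i].slot) === local[i].hash) return local[i];
  }
  // Diverged beyond the sample: the rollback window was exceeded.
  return undefined;
}
```

Once the ancestor is found, downstream state is truncated back to that slot and streaming resumes from the next one.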
Sub-second paths. Applications that need lower latency than the indexed stream can offer (sub-second from event emission to query result) usually pair an indexer with a parallel Geyser-fed websocket stream for the hot path, then reconcile against the indexer's view as slots finalize. SQD provides this hot path through a separate streaming client; the trade-off versus indexer queries is the same as everywhere else: faster, less queryable, more brittle.
6. Decoding the common Solana protocols
Most applications need decoded data for a handful of widely used programs. The list below covers what production Solana indexers ship decoders for and the practical wrinkles each one introduces.
SPL Token (TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA). The legacy SPL Token program. Uses a 1-byte discriminator (the first byte of data selects one of ~20 handlers like Transfer, MintTo, Burn, Approve, CloseAccount). A complete token-flow indexer reads SPL Token instructions plus the token_balances table to capture both the action and the resulting state delta.
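A toy decoder for the 1-byte discriminator, assuming the data field has already been base58-decoded into raw bytes. The handler indices follow the SPL Token instruction enum (Transfer is 3, Approve 4, MintTo 7, Burn 8, CloseAccount 9), and the amount-carrying variants put a u64 little-endian amount right after the tag byte:

```typescript
// Subset of the SPL Token 1-byte instruction tags.
const SPL_TOKEN_TAGS: Record<number, string> = {
  3: "Transfer",
  4: "Approve",
  7: "MintTo",
  8: "Burn",
  9: "CloseAccount",
};

function decodeSplToken(data: Uint8Array): { name: string; amount?: bigint } {
  const tag = data[0];
  const name = SPL_TOKEN_TAGS[tag] ?? `Unknown(${tag})`;
  // Transfer, Approve, MintTo, and Burn carry a u64 LE amount after the tag.
  if ([3, 4, 7, 8].includes(tag) && data.length >= 9) {
    const view = new DataView(data.buffer, data.byteOffset + 1, 8);
    return { name, amount: view.getBigUint64(0, true) };
  }
  return { name };
}
```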
SPL Token-2022 (TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb). The newer SPL Token program with extensions: transfer fees, non-transferable tokens, confidential transfers (zero-knowledge balances), permanent delegate, default account state. Each extension can sit on a mint and changes how transfers behave; indexers that miss the extension state report stale or wrong balances. A complete Solana indexer decodes both Token and Token-2022.
Metaplex Token Metadata. NFT mint, update, and transfer activity. The onchain metadata account points at an off-chain JSON URI (usually Arweave or IPFS) for the image and trait attributes. Production NFT pipelines cache the resolved metadata and re-fetch on UpdateMetadata instructions. Compressed NFTs (Bubblegum) are a separate beast: they live as Merkle-tree leaves with an onchain root, and reconstruction requires reading every Bubblegum instruction to keep the tree in sync. Most production cNFT consumers use Digital Asset Standard (DAS) APIs rather than raw decoding for this reason.
DEX programs. Raydium (AMM and CLMM), Orca (Whirlpool), Meteora (DLMM and dynamic pools), Phoenix (orderbook), Lifinity, and Jupiter (the aggregator) are the canonical onchain venues. Each emits swap data in its own instruction shape; decoders are per-program. Jupiter is particularly important because it aggregates: a Jupiter swap's inner_instructions array contains the actual DEX swaps the router executed, and any DEX-volume index that ignores Jupiter undercounts by a large factor.
Lending and perps. Kamino, MarginFi, Drift, and Solend each have program-specific event shapes. Coverage for these is thinner than DEX coverage in most general-purpose decoders; teams building specifically for lending or perps usually write their own decoders or extend an existing one.
Native programs. System Program (11111111111111111111111111111111) for SOL transfers and account creation, Compute Budget for fee/CU configuration, Vote for validator votes. Application indexers usually ignore these; analytics and validator-economics indexers don't.
7. The SVM family: Eclipse, SOON, SVM-BNB
"Solana" in indexing context now means more than just Solana mainnet. The SVM (Solana Virtual Machine) is being used as the execution layer for several adjacent chains, all of which expose the same instruction/account/log data model and decode against the same Anchor IDLs.
Solana mainnet / devnet. The canonical clusters. Devnet is the public test cluster; mainnet is production.
Eclipse. A general-purpose SVM rollup that settles to Ethereum. Mainnet and testnet. The data model is identical to Solana's, which means Solana indexer code runs on Eclipse without changes (assuming the indexer's source supports the network).
SOON. SVM execution layer with separate mainnet, devnet, and testnet clusters. Same handler code, different chain.
SVM-BNB. BNB Chain's SVM-compatible execution. Mainnet and testnet.
Why this matters for tool selection: an indexer that supports "Solana" but not the SVM family forces a multi-stack rebuild every time the application wants to follow the same protocol onto a new SVM chain. SQD's Solana data lake covers all of the above (documented dataset list) with the same schema, so the same handler runs unchanged across the family. Other tools have varying coverage; check the published network list before committing if SVM portability matters to the application.
8. Solana indexing tools in 2026
Tools commonly used for Solana indexing in 2026. For head-to-heads against SQD, see the comparison pages.
Hosted, Solana-focused. Helius provides Solana RPC plus indexed APIs for transactions, balances, NFT metadata, and DAS. Shyft exposes Solana data through GraphQL and REST. Triton provides high-performance Solana RPC and Geyser access.
SVM coverage. SQD indexes the SVM family (Solana mainnet and devnet, Eclipse, SOON, SVM-BNB) through one unified schema, with the same six tables (transactions, instructions, balances, token_balances, rewards, logs) exposed per network; the same handler code runs across every network in the family.
SQD's Solana surface. Three layers, each documented in docs.sqd.dev.
- Portal, the data lake. Six tables served over HTTP, with separate /stream and /finalized-stream endpoints. Language-agnostic. Covers the full SVM family.
- Pipes SDK, a TypeScript framework. Pre-built components for the Portal source, instruction decoder, and Drizzle/Postgres and ClickHouse targets; automatic fork handling with a 1000-slot rollback window. Pipelines are described as composable modules rather than a monolithic indexer process.
- Squid SDK, the older code-first framework with built-in Postgres and GraphQL. Same Portal source under the hood; ships GraphQL out of the box.
Multi-chain platforms with Solana support. Goldsky hosts Solana subgraphs. SubQuery supports Solana mappings as part of its multi-chain offering.
Validator-side streaming. Geyser plugins are the canonical low-latency path. Firehose for Solana is one well-known implementation; many teams run a Geyser plugin paired with a streaming consumer they write themselves.
9. Solana-specific vs multi-chain indexer
A Solana-only product can choose a Solana-focused tool and get tighter coverage for its specific needs: DAS APIs for cNFTs, Solana-native query shapes, ecosystem-specific decoders. The trade-off is portability: if the product later adds a second chain, a Solana-only stack does not go there.
A multi-chain product benefits from a multi-chain indexer that includes Solana and ideally the SVM family. The schema is unified across chains, the application has one query layer, and adding the next SVM chain (or, separately, the next EVM chain) is a configuration change rather than a parallel build.
The multi-chain indexing guide covers the architectural trade-offs in more depth. The evaluation framework applies axis-by-axis: chain coverage (including SVM-family completeness), data shape, latency budget, hosting model, pricing, and lock-in.
Related guides
Index Solana with SQD
The Portal serves Solana, Eclipse, SOON, and SVM-BNB. The Squid and Pipes SDKs handle Solana decoding with first-class Anchor and discriminator support.