An AI-assisted onchain investigation: tracing the Bybit hack in the data
On 21 February 2025, roughly 401,000 ETH was stolen from Bybit, the largest theft in the history of crypto. The funds are onchain and they are traceable, but the single most important movement, the drain itself, does not appear in a transaction list or a token-transfer feed. It is an internal call, and you need trace data to see it. This guide follows the real money one query at a time: the drain, the fan-out across dozens of wallets, and the evidence at every hop. Every figure is pulled live from SQD's Portal and pinned to an exact block.
1. The largest theft in crypto history, in the data
Start with the address Bybit and onchain investigators publicly identified as the exploiter, 0x4766…86e2. The instinct is to look at its transactions. Do that and you see almost nothing: a scatter of inbound transactions and spam-token transfers, no sign of a billion dollars. The theft is there, but it is in a different data type. One trace query over the blocks around the incident returns it directly. The query filters traces by the call's recipient, in the same declarative shape used for logs and transactions.
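As a rough sketch of that request shape, the Python below builds a trace filter keyed on the call's recipient. Only callTo, callFrom, and "transaction": true come from this guide; the "type", "fromBlock", "toBlock", and "fields" keys, the field names inside "fields", and the helper name build_trace_query are assumptions for illustration, so check them against the Portal reference before sending anything. The block window in the usage lines is illustrative too.

```python
import json


def build_trace_query(address, from_block, to_block, direction="to"):
    """Build a request body that filters traces by recipient or sender.

    Assumed layout: only callTo / callFrom and "transaction": True are
    taken from the guide; every other key is a hypothetical placeholder.
    """
    if direction not in ("to", "from"):
        raise ValueError("direction must be 'to' or 'from'")
    filter_key = "callTo" if direction == "to" else "callFrom"
    return {
        "type": "evm",                      # assumed
        "fromBlock": from_block,            # assumed
        "toBlock": to_block,                # assumed
        "fields": {                         # assumed field selection
            "block": {"number": True, "timestamp": True},
            "trace": {"callFrom": True, "callTo": True, "callValue": True},
            "transaction": {"hash": True},
        },
        # Join the parent transaction so each hop carries its hash.
        "traces": [{filter_key: [address.lower()], "transaction": True}],
    }


# Placeholder address: substitute the full exploiter address.
body = build_trace_query("0x" + "00" * 20, 21_895_000, 21_896_300)
print(json.dumps(body, indent=2))
```

Passing direction="from" produces the outbound version of the same query, which is the only change needed to follow funds leaving an address.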
The Stream API returns raw NDJSON, and only the blocks that matter: a header at each chunk boundary to advance the cursor, plus the blocks that carry a matching trace. The blocks in between never materialize, so across this 1,300-block window the drain is the single line with trace data, inside transaction 0xb614…072c (joined in with "transaction": true so each hop carries the hash you re-pull). The value is wei, hex-encoded, exactly as the chain stores it: 0x54fd0d4baa6732d1f7e6 is 401,346.77 ETH, and the header timestamp 1740147371 is 2025-02-21 14:16:11 UTC. There is no decoding service in the middle, just the record.
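Both conversions in that paragraph can be checked by hand. The sketch below decodes the hex wei value and the header timestamp quoted above; the helper names are illustrative, and Decimal is used so the 24-digit wei amount is not rounded through a float.

```python
from datetime import datetime, timezone
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18


def wei_hex_to_eth(value_hex: str) -> Decimal:
    """Convert a hex-encoded wei amount (with 0x prefix) to ETH."""
    return Decimal(int(value_hex, 16)) / WEI_PER_ETH


def unix_to_utc(timestamp: int) -> str:
    """Format a Unix timestamp in seconds as a UTC date-time string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


drain_eth = wei_hex_to_eth("0x54fd0d4baa6732d1f7e6")
print(drain_eth.quantize(Decimal("0.01")))  # 401346.77
print(unix_to_utc(1740147371))              # 2025-02-21 14:16:11 UTC
```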
2. Why the 401,346 ETH drain is invisible to most tools
This is the part that matters for any investigation tool, and the reason the case is worth showing. The drain executed inside a Safe multisig transaction, so the ETH moved as an internal call, not as a top-level transaction and not as a token transfer. The two data types most tools start with, the transaction list and ERC-20 transfers, do not contain it. (The staked-ETH tokens in the wider theft did move as ordinary ERC-20 transfers that a token feed indexes; it is the ETH leg, the single largest piece, that hides.) A wallet summary of the exploiter over the day of the hack, the fund_flow.summary the Portal returns, confirms it in the bluntest way possible.
Native received: 0 ETH. The only inbound value the native-and-token view can see is dust like the MYSTERY spam token; the biggest theft in crypto history is reduced to noise, because the one record that matters is a trace. Pull the trace dataset and the 401,346 ETH is right there. The lesson generalizes: in any serious investigation the decisive hop is often an internal call, so a tool that cannot query complete traces is blind exactly when it counts. The data type itself, and why it is expensive to produce, is the subject of internal transactions explained.
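To make the gap concrete, here is a minimal sketch that computes "native received" from two views of the same event. The record shapes, key names, and addresses are hypothetical placeholders, and the zero-value top-level transaction into the Safe is an illustrative reconstruction of how a multisig execution looks, not output copied from the Portal. Only the drain value is taken from the trace quoted earlier.

```python
# Placeholder addresses (not the real ones).
SIGNER = "0x" + "1" * 40
SAFE = "0x" + "5" * 40
EXPLOITER = "0x" + "e" * 40

# View 1: top-level transactions. The signer calls the Safe with no ETH attached.
top_level_txs = [{"from": SIGNER, "to": SAFE, "value": "0x0"}]

# View 2: traces. The Safe makes an internal call that carries the ETH.
traces = [
    {"callFrom": SAFE, "callTo": EXPLOITER, "callValue": "0x54fd0d4baa6732d1f7e6"},
]


def native_received(records, to_key, value_key, address):
    """Sum the hex-encoded wei sent to `address` across a list of records."""
    return sum(int(r[value_key], 16) for r in records if r[to_key] == address)


tx_view = native_received(top_level_txs, "to", "value", EXPLOITER)
trace_view = native_received(traces, "callTo", "callValue", EXPLOITER)

print(tx_view)               # 0
print(trace_view // 10**18)  # 401346 (whole ETH)
```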
3. Follow the money: the fan-out
Flip the filter from callTo to callFrom and trace the money out of the exploiter. Within hours it does what stolen funds always do: it splits. In the window right after the drain, 400,001 ETH leaves the exploiter across 41 internal calls: forty transfers of exactly 10,000 ETH to fresh wallets, plus a single 1 ETH test. The round chunks are swept out in this first wave; the odd 1,345.77 ETH is left out of the round-number sweep, which is why the fan-out totals 400,001 and not the full 401,346.77 that came in. The first few hops:
- 0x36ed…e4cb: 10,000 ETH, block 21,895,451
- 0xaf62…6ce9: 10,000 ETH, block 21,895,708
- 0x3a21…9847: 10,000 ETH, block 21,895,708
- 0xfa3f…6c49: 10,000 ETH, block 21,895,709
- 0xfc92…6465: 10,000 ETH, block 21,895,709
- and 36 more internal calls, 400,001 ETH total across 41
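The totals above reconcile with simple arithmetic. This sketch rebuilds the first wave from the stated counts (forty 10,000 ETH transfers plus one 1 ETH test) and compares it with the drain, rounded to two decimals as quoted in this guide.

```python
from decimal import Decimal

drain_eth = Decimal("401346.77")       # inbound drain, as quoted above
round_chunks = [Decimal(10_000)] * 40  # forty transfers of exactly 10,000 ETH
test_transfer = Decimal(1)             # the single 1 ETH test

fan_out_eth = sum(round_chunks) + test_transfer
internal_calls = len(round_chunks) + 1
left_behind = drain_eth - fan_out_eth

print(internal_calls)  # 41
print(fan_out_eth)     # 400001
print(left_behind)     # 1345.77
```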
Every one of those 41 destinations is a fresh lead. The investigation is now a tree, and the data hands you the branches: run the same trace query with each recipient as callFrom and you follow the next layer, hop after hop, in the same shape.
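Walking that tree is a breadth-first traversal. The sketch below keeps the data source abstract: fetch_outflows is a hypothetical callable that stands in for the callFrom trace query, assumed here to return (recipient, value_wei, block) tuples for one address. The depth cap is a design choice to keep the walk bounded, since each layer can multiply the number of addresses.

```python
from collections import deque


def walk_hops(start, fetch_outflows, max_depth=2):
    """Follow outbound transfers breadth-first from `start`.

    fetch_outflows(address) is a stand-in for the callFrom trace query and
    must return an iterable of (recipient, value_wei, block) tuples.
    Returns every edge found as (sender, recipient, value_wei, block).
    """
    seen = {start}
    queue = deque([(start, 0)])
    edges = []
    while queue:
        address, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the requested number of layers
        for recipient, value_wei, block in fetch_outflows(address):
            edges.append((address, recipient, value_wei, block))
            if recipient not in seen:  # expand each address only once
                seen.add(recipient)
                queue.append((recipient, depth + 1))
    return edges


# Toy graph standing in for real query results.
toy = {"A": [("B", 10, 1), ("C", 5, 2)], "B": [("D", 10, 3)], "D": [("E", 1, 4)]}
for edge in walk_hops("A", lambda addr: toy.get(addr, []), max_depth=2):
    print(edge)
```

With max_depth=2 the toy walk records A to B, A to C, and B to D, and stops before expanding D. Each edge keeps its block number, so every hop stays citable.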
This first wave is the native-ETH leg only. Public forensics put the full dispersal at about fifty wallets of roughly 10,000 ETH once the stolen stETH, cmETH, and mETH were swapped to ETH; those wallets were drained again over the following days, and the funds were ultimately routed into Bitcoin. The same trace query walks every hop of that longer path.
4. The loop, and the pivot envelope
What keeps a branching trail from becoming guesswork is that the Portal returns the next step attached to the data. A wallet_summary on any address comes back with a next_pivots list that names the exact tool and argument to call next. Run it on the exploiter over the day of the hack and the response says, in its own words, where to go.
Alongside it a pivots list tags every extractable value with how to reuse it as a filter:
- sender → from_addresses
- recipient → to_addresses
- tx_hash → transaction_hash
- block_number → from_block / to_block
use_as tags turn a result field into the right argument for the next call, and every response reports the queried_blocks it covered, so each hop in the Bybit trail is anchored to an exact range. That is what makes the path documented and reproducible rather than improvised.
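A minimal sketch of how an agent might apply those tags follows. The envelope's exact JSON layout is not reproduced in this guide, so the shape below (a list of entries with "field", "value", and a "use_as" list) is an assumption, as is the helper name next_call_args. Only the field-to-argument mapping comes from the list above.

```python
def next_call_args(pivots, wanted_fields):
    """Turn tagged result fields into keyword arguments for the next call.

    Assumed entry shape: {"field": str, "value": any, "use_as": [arg, ...]}.
    A field tagged with several arguments (block_number maps to both
    from_block and to_block) fills each of them with the same value.
    """
    args = {}
    for pivot in pivots:
        if pivot["field"] in wanted_fields:
            for arg_name in pivot["use_as"]:
                args[arg_name] = pivot["value"]
    return args


# Hypothetical envelope entries; the hash and address are left elided as in the text.
pivots = [
    {"field": "sender", "value": "0x4766…86e2", "use_as": ["from_addresses"]},
    {"field": "tx_hash", "value": "0xb614…072c", "use_as": ["transaction_hash"]},
    {"field": "block_number", "value": 21895251, "use_as": ["from_block", "to_block"]},
]

print(next_call_args(pivots, {"sender", "block_number"}))
```

Because the argument names are read from the response rather than typed by hand, the caller never has to guess whether a filter is called from_addresses or something else.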
So the loop is: summarize an address, take the pivot, pull the traces, follow the value, cite the block. An agent runs it automatically; the envelope is what lets it chain hops without inventing parameter names. The same loop works on Solana instructions and Bitcoin inputs and outputs, where portal_bitcoin_query_transactions exposes sender and recipient the same way, so a trail can cross virtual machines without changing tools.
5. Evidence you can defend
What makes this trail stand up is that every step is a real record, anchored and reproducible. 401,346.77 ETH moved from Bybit's wallet to the exploiter at block 21,895,251, then 400,001 ETH moved out across 41 internal calls in the first wave (the odd 1,345.77 ETH was left out of that round-number sweep), and each hop is pinned to an exact block and transaction you can re-pull. The wallet summary's figures are net flow over the window you queried, so they mean exactly what they say. That is the strength of the trail: a documented, reproducible path of what moved, when, and where, backed by the precise records rather than inference. For the regulatory framing around this kind of evidence, see compliance data for crypto.
6. Why this is harder elsewhere
Traces exist in plenty of places, but usually behind a paid debug-RPC method that you call one transaction at a time. The Bybit case shows why that is not enough: the decisive movement is an internal call, so you need traces just to see it, and then you need to follow it across 41 branches and the layers below them. Calling a debug method per transaction and stitching the results by hand does not scale to that.
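For comparison, here is roughly what the per-transaction route looks like, assuming a Geth-style node that exposes debug_traceTransaction with the built-in callTracer. The sketch builds the JSON-RPC payload and walks the nested call frames the tracer returns; the sample frame is a hand-written stand-in with placeholder addresses, not real node output, and sending the request is left out.

```python
def trace_request(tx_hash, request_id=1):
    """JSON-RPC payload for one debug_traceTransaction call with callTracer."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_traceTransaction",
        "params": [tx_hash, {"tracer": "callTracer"}],
    }


def value_transfers(frame, depth=0):
    """Yield (depth, from, to, wei) for each call frame that moves ETH.

    DELEGATECALL frames are skipped because they run code in the caller's
    context and do not transfer value themselves.
    """
    value = int(frame.get("value", "0x0"), 16)
    if value > 0 and frame.get("type") != "DELEGATECALL":
        yield depth, frame.get("from"), frame.get("to"), value
    for child in frame.get("calls", []):
        yield from value_transfers(child, depth + 1)


# Hand-written stand-in for a callTracer result (placeholder addresses).
sample = {
    "type": "CALL", "from": "0xsigner", "to": "0xsafe", "value": "0x0",
    "calls": [{
        "type": "DELEGATECALL", "from": "0xsafe", "to": "0ximpl", "value": "0x0",
        "calls": [{"type": "CALL", "from": "0xsafe", "to": "0xexploiter",
                   "value": "0x54fd0d4baa6732d1f7e6"}],
    }],
}
for hop in value_transfers(sample):
    print(hop)
```

This finds the value-bearing frame inside a single transaction, but it has to be repeated for every transaction hash you already know about, which is the limitation described above.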
What separates seeing the theft from missing it is having complete, queryable trace and state-diff datasets that you can filter by callTo or callFrom across a whole block range, with the pivot envelope chaining each hop. The stack is self-hostable under AGPL-3.0 rather than metered per call, and the same model spans EVM, Solana, and Bitcoin.
For the trace data type and its cost, read internal transactions explained; for the compliance framing, compliance data for crypto. For the agent pattern that drives the loop, see AI agents and onchain data.