
An AI-assisted onchain investigation: tracing the Bybit hack in the data

On 21 February 2025, roughly 401,000 ETH was stolen from Bybit, the largest theft in the history of crypto. The funds are onchain and they are traceable, but the single most important movement, the drain itself, does not appear in a transaction list or a token-transfer feed. It is an internal call, and you need trace data to see it. This guide follows the real money one query at a time: the drain, the fan-out across dozens of wallets, and the evidence at every hop. Every figure is pulled live from SQD's Portal and pinned to an exact block.

Updated 2026-06-18 · By the SQD team

1. The largest theft in crypto history, in the data

Start with the address Bybit and onchain investigators publicly identified as the exploiter, 0x4766…86e2. The instinct is to look at its transactions. Do that and you see almost nothing: a scatter of inbound transactions and spam-token transfers, no sign of a billion dollars. The theft is there, but it is in a different data type. One trace query over the blocks around the incident returns it directly:

The drain · block 21,895,251 · 2025-02-21 14:16 UTC
Bybit's wallet → 401,346.77 ETH → The exploiter
One internal call, pulled from the Portal trace dataset. The call carries 401,346.7689 ETH (returned as wei, hex-encoded), about 1.1 billion dollars at the time and the bulk of the roughly 1.5 billion dollar Bybit theft. This single record is the entire ETH drain.

The query that returns it filters traces by the call's recipient, the same declarative shape used for logs and transactions:

your terminal
curl -s -X POST https://portal.sqd.dev/datasets/ethereum-mainnet/stream \
  -H 'content-type: application/json' \
  -d '{
    "type": "evm",
    "fromBlock": 21894000, "toBlock": 21895300,
    "traces": [{ "type": ["call"], "callTo": ["0x47666fab8bd0ac7003bce3f5c3585383f09486e2"], "transaction": true }],
    "fields": {
      "block": { "number": true, "timestamp": true },
      "transaction": { "hash": true },
      "trace": { "type": true, "transactionIndex": true, "callFrom": true, "callTo": true, "callValue": true }
    }
  }'
live response · portal.sqd.dev (NDJSON, one block per line)
{"header":{"number":21894000,"timestamp":1740132323}}
{"header":{"number":21894067,"timestamp":1740133127}}
{"header":{"number":21894068,"timestamp":1740133139}}
... headers appear only at chunk boundaries and for matching blocks; the blocks in between are skipped ...
{"header":{"number":21895251,"timestamp":1740147371},"transactions":[{"hash":"0xb61413c495fdad6114a7aa863a00b2e3c28945979a10885b12b30316ea9f072c"}],"traces":[{"transactionIndex":35,"type":"call","action":{"from":"0x1db92e2eebc8e0c075a02bea49a2935bcd2dfcf4","to":"0x47666fab8bd0ac7003bce3f5c3585383f09486e2","value":"0x54fd0d4baa6732d1f7e6"}}]}

The Stream API returns raw NDJSON, and only the blocks that matter: a header at each chunk boundary to advance the cursor, plus the blocks that carry a matching trace. The blocks in between never materialize, so across this 1,300-block window the drain is the single line with trace data, inside transaction 0xb614…072c (joined in with "transaction": true, so each hop carries the hash you need to re-pull it). The value is wei, hex-encoded, exactly as the chain stores it: 0x54fd0d4baa6732d1f7e6 is 401,346.77 ETH, and the header timestamp 1740147371 is 2025-02-21 14:16:11 UTC. There is no decoding service in the middle, just the record.
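Both conversions in that paragraph can be checked locally. The following is a minimal Python sketch that assumes only the matching response line shown above, trimmed to the fields it uses. The helper name decode_block is introduced here for illustration and is not part of any SQD client.

```python
import json
from datetime import datetime, timezone

WEI_PER_ETH = 10**18

# The matching NDJSON line from the live response, trimmed to the fields used below.
line = (
    '{"header":{"number":21895251,"timestamp":1740147371},'
    '"traces":[{"transactionIndex":35,"type":"call","action":{'
    '"from":"0x1db92e2eebc8e0c075a02bea49a2935bcd2dfcf4",'
    '"to":"0x47666fab8bd0ac7003bce3f5c3585383f09486e2",'
    '"value":"0x54fd0d4baa6732d1f7e6"}}]}'
)

def decode_block(ndjson_line):
    """Return (block number, UTC time, [(sender, recipient, wei), ...]) for one line."""
    block = json.loads(ndjson_line)
    header = block["header"]
    when = datetime.fromtimestamp(header["timestamp"], tz=timezone.utc)
    transfers = []
    # Header-only lines (chunk boundaries) carry no "traces" key.
    for trace in block.get("traces", []):
        action = trace["action"]
        transfers.append((action["from"], action["to"], int(action["value"], 16)))
    return header["number"], when, transfers

number, when, transfers = decode_block(line)
sender, recipient, wei = transfers[0]
# Keep wei as an integer for accounting; convert to a float only for display.
print(number, when.isoformat(), f"{wei / WEI_PER_ETH:,.4f} ETH")
```

Keeping the amount as an integer number of wei avoids floating-point drift when values are summed across many hops.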

2. Why the 401,346 ETH drain is invisible to most tools

This is the part that matters for any investigation tool, and the reason the case is worth showing. The drain executed inside a Safe multisig transaction, so the ETH moved as an internal call, not as a top-level transaction and not as a token transfer. The two data types most tools start with, the transaction list and ERC-20 transfers, do not contain it. (The staked-ETH tokens in the wider theft did move as ordinary ERC-20 transfers, which a token feed indexes; it is the ETH leg, the single largest piece, that hides.) A wallet summary of the exploiter over the day of the hack confirms it bluntly. Here is the fund_flow.summary the Portal returns:

portal wallet_summary · exploiter · 2025-02-21 to 02-22
"inbound_events": 10,
"outbound_events": 0,
"native_received_eth": 0,
"native_sent_eth": 0,
"top_counterparty": "0x0000000000000000000000000000000000000000",
"largest_movement": {
  "asset": "MYSTERY",
  "amount": "18644407607.981136 MYSTERY",
  "direction": "in"
}

Native received: 0 ETH. The only inbound value the native-and-token view can see is dust like the MYSTERY spam token; the biggest theft in crypto history is reduced to noise, because the one record that matters is a trace. Pull the trace dataset and the 401,346 ETH is right there. The lesson generalizes: in any serious investigation the decisive hop is often an internal call, so a tool that cannot query complete traces is blind exactly when it counts. The data type itself, and why it is expensive to produce, is the subject of internal transactions explained.
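To make the blind spot concrete, here is a small Python sketch that sums native ETH received by an address from two sources: top-level transaction rows and trace rows. The trace row is the record from section 1. The top-level row is an illustrative assumption (an outer transaction addressed to the Safe that carries no ETH itself), and native_received_wei is a hypothetical helper, not a Portal function.

```python
EXPLOITER = "0x47666fab8bd0ac7003bce3f5c3585383f09486e2"
BYBIT_WALLET = "0x1db92e2eebc8e0c075a02bea49a2935bcd2dfcf4"

def native_received_wei(address, rows):
    """Sum the wei sent to `address` across rows shaped {"to": ..., "value": "0x..."}."""
    return sum(int(row["value"], 16) for row in rows if row["to"].lower() == address)

# ASSUMPTION (illustrative row): the outer transaction targets the Safe with zero value.
top_level_rows = [{"to": BYBIT_WALLET, "value": "0x0"}]

# The internal call returned by the trace query in section 1.
trace_rows = [{"to": EXPLOITER, "value": "0x54fd0d4baa6732d1f7e6"}]

print("from transaction list:", native_received_wei(EXPLOITER, top_level_rows))
print("from traces:          ", native_received_wei(EXPLOITER, trace_rows))
```

A tool that only ever builds the first list reports zero for the exploiter, which is consistent with the native_received_eth value in the summary above.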

3. Follow the money: the fan-out

Flip the filter from callTo to callFrom and trace the money out of the exploiter. Within hours it does what stolen funds typically do: it splits. In the window right after the drain, 400,001 ETH leaves the exploiter across 41 internal calls: forty transfers of exactly 10,000 ETH to fresh wallets, plus a single 1 ETH test. The round chunks are swept out in this first wave; the odd 1,345.77 ETH is left out of the round-number sweep, which is why the fan-out totals 400,001 and not the full 401,346.77 that came in. The first few hops:

To (fresh wallet) | ETH | block
Real data from the Portal trace dataset, blocks 21,895,319 onward. The 10,000 ETH splitting pattern is the first laundering layer; each recipient becomes the next address to query, and the trail continues from there.
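The flipped filter and the first-wave arithmetic can be sketched in Python as follows. The request body mirrors the drain query from section 1 with callTo swapped for callFrom. The to_block value is a placeholder assumption, since the text only states that the first wave starts at block 21,895,319, and outflow_query is a name invented for this sketch.

```python
import json
from decimal import Decimal

EXPLOITER = "0x47666fab8bd0ac7003bce3f5c3585383f09486e2"

def outflow_query(address, from_block, to_block):
    """Build the section 1 request shape, filtering on the caller instead of the recipient."""
    return {
        "type": "evm",
        "fromBlock": from_block,
        "toBlock": to_block,
        "traces": [{"type": ["call"], "callFrom": [address], "transaction": True}],
        "fields": {
            "block": {"number": True, "timestamp": True},
            "transaction": {"hash": True},
            "trace": {"type": True, "transactionIndex": True, "callFrom": True,
                      "callTo": True, "callValue": True},
        },
    }

# ASSUMPTION: 21_896_000 is a placeholder end block, not a figure from the article.
body = outflow_query(EXPLOITER, from_block=21_895_319, to_block=21_896_000)
print(json.dumps(body, indent=2))

# Forty transfers of 10,000 ETH plus the 1 ETH test, against the 401,346.77 ETH that came in.
first_wave = 40 * Decimal("10000") + Decimal("1")
left_behind = Decimal("401346.77") - first_wave
print(first_wave, left_behind)
```

The printed JSON can be passed to the same curl command as the -d payload.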

Every one of those 41 destinations is a fresh lead. The investigation is now a tree, and the data hands you the branches: run the same trace query with each recipient as callFrom and you follow the next layer, hop after hop, in the same shape.
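Walking that tree is a breadth-first traversal. The sketch below assumes a caller-supplied fetch_outflows(address) function that wraps the callFrom trace query and returns (recipient, wei, block) tuples. Both that function and walk_outflows are hypothetical names, and the sample graph uses made-up labels and amounts purely to exercise the traversal.

```python
from collections import deque

def walk_outflows(root, fetch_outflows, max_depth=2):
    """Breadth-first walk of outgoing value transfers, up to max_depth layers."""
    seen = {root}
    queue = deque([(root, 0)])
    hops = []
    while queue:
        address, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested layer
        for recipient, wei, block in fetch_outflows(address):
            hops.append({"from": address, "to": recipient, "wei": wei,
                         "block": block, "depth": depth + 1})
            if recipient not in seen:  # avoid re-querying wallets and looping on cycles
                seen.add(recipient)
                queue.append((recipient, depth + 1))
    return hops

# Stand-in data: labels, amounts, and block numbers here are placeholders, not chain data.
fake_graph = {
    "exploiter": [("wallet-1", 10_000, 1), ("wallet-2", 10_000, 1)],
    "wallet-1": [("wallet-3", 10_000, 2)],
}
hops = walk_outflows("exploiter", lambda address: fake_graph.get(address, []))
for hop in hops:
    print(hop)
```

The seen set matters in practice: laundering paths often revisit addresses, and without it the walk could loop or issue duplicate queries.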

This first wave is the native-ETH leg only. Public forensics put the full dispersal at about fifty wallets of roughly 10,000 ETH once the stolen stETH, cmETH, and mETH were swapped to ETH; those wallets were drained again over the following days, and the funds were ultimately routed into Bitcoin. The same trace query walks every hop of that longer path.

4. The loop, and the pivot envelope

What keeps a branching trail from becoming guesswork is that the Portal returns the next step attached to the data. A wallet_summary on any address comes back with a next_pivots list that names the exact tool and argument to call next. Run it on the exploiter over the day of the hack and the response says, in its own words, where to go:

live response · wallet_summary (next_pivots)
"next_pivots": [
{ "goal": "Investigate the top counterparty",
"tool": "portal_get_wallet_summary",
"address": "0xa7f5d0c2b6dad000abc3a5f0db7386041ef27bb8" },
{ "goal": "Inspect the largest movement transaction",
"tool": "portal_evm_query_transactions",
"transaction_hash": "0xbc876cfac4bd8e918614d3d97b4aa3bce1d3fc253de1b71fe049156025391699" }
]

Alongside it a pivots list tags every extractable value with how to reuse it as a filter:

Field found | Reuse as filter
sender | from_addresses
recipient | to_addresses
tx_hash | transaction_hash
block_number | from_block / to_block
The use_as tags turn a result field into the right argument for the next call, and every response reports the queried_blocks it covered, so each hop in the Bybit trail is anchored to an exact range. That is what makes the path documented and reproducible rather than improvised.
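The mapping in that table can be expressed as a small lookup. This Python sketch uses the four field-to-filter pairs listed above. Treating the two address filters as lists is an assumption drawn from their plural names, and next_call_args is a helper invented here, not a Portal API.

```python
# Field found -> filter argument(s), copied from the table above.
USE_AS = {
    "sender": ["from_addresses"],
    "recipient": ["to_addresses"],
    "tx_hash": ["transaction_hash"],
    "block_number": ["from_block", "to_block"],
}

# ASSUMPTION: the plural address filters take a list of addresses.
LIST_ARGS = {"from_addresses", "to_addresses"}

def next_call_args(found):
    """Turn extracted result fields into arguments for the next query."""
    args = {}
    for field, value in found.items():
        for arg in USE_AS.get(field, []):
            args[arg] = [value] if arg in LIST_ARGS else value
    return args

# Values taken from the drain record in section 1.
found = {
    "sender": "0x1db92e2eebc8e0c075a02bea49a2935bcd2dfcf4",
    "block_number": 21895251,
}
print(next_call_args(found))
```

A single block number fills both from_block and to_block, which pins the follow-up query to exactly that block.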

So the loop is: summarize an address, take the pivot, pull the traces, follow the value, cite the block. An agent runs it automatically; the envelope is what lets it chain hops without inventing parameter names. The same loop works on Solana instructions and Bitcoin inputs and outputs, where portal_bitcoin_query_transactions exposes sender and recipient the same way, so a trail can cross virtual machines without changing tools.
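One step of that loop can be sketched by turning the next_pivots list shown earlier into concrete (tool, arguments) pairs. The Python below assumes only the response fragment from this section. How each tool is actually invoked depends on the client or agent framework in use, so plan_calls stops at producing the call plan and is a hypothetical helper.

```python
import json

# The next_pivots fragment from the live response above.
response = json.loads("""
{
  "next_pivots": [
    { "goal": "Investigate the top counterparty",
      "tool": "portal_get_wallet_summary",
      "address": "0xa7f5d0c2b6dad000abc3a5f0db7386041ef27bb8" },
    { "goal": "Inspect the largest movement transaction",
      "tool": "portal_evm_query_transactions",
      "transaction_hash": "0xbc876cfac4bd8e918614d3d97b4aa3bce1d3fc253de1b71fe049156025391699" }
  ]
}
""")

def plan_calls(response):
    """Return [(tool_name, kwargs), ...] with every pivot argument pre-filled."""
    calls = []
    for pivot in response.get("next_pivots", []):
        # Everything except the description and the tool name is an argument.
        kwargs = {key: value for key, value in pivot.items() if key not in ("goal", "tool")}
        calls.append((pivot["tool"], kwargs))
    return calls

for tool, kwargs in plan_calls(response):
    print(tool, kwargs)
```

Because the argument names come from the response itself, the caller never has to guess a parameter name.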

5. Evidence you can defend

What makes this trail stand up is that every step is a real record, anchored and reproducible. 401,346.77 ETH moved from Bybit's wallet to the exploiter at block 21,895,251, then 400,001 ETH moved out across 41 internal calls in the first wave (the odd 1,345.77 ETH was left out of that round-number sweep), and each hop is pinned to an exact block and transaction you can re-pull. The wallet summary's figures are net flow over the window you queried, so they mean exactly what they say. That is the strength of the trail: a documented, reproducible path of what moved, when, and where, backed by the precise records rather than inference. For the regulatory framing around this kind of evidence, see compliance data for crypto.
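One way to keep each hop citable is to store it as an immutable record that renders its own citation. This is a sketch of one possible structure, not a Portal type: the Hop class and its cite method are invented here, and the sample values are the drain record from section 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hop:
    block: int
    tx_hash: str
    sender: str
    recipient: str
    wei: int  # exact integer amount as returned by the trace

    def cite(self):
        """Render a one-line citation anchored to block and transaction."""
        eth = self.wei / 10**18  # float is for display only
        return (f"block {self.block:,} tx {self.tx_hash}: "
                f"{self.sender} -> {self.recipient}, {eth:,.2f} ETH")

drain = Hop(
    block=21895251,
    tx_hash="0xb61413c495fdad6114a7aa863a00b2e3c28945979a10885b12b30316ea9f072c",
    sender="0x1db92e2eebc8e0c075a02bea49a2935bcd2dfcf4",
    recipient="0x47666fab8bd0ac7003bce3f5c3585383f09486e2",
    wei=int("0x54fd0d4baa6732d1f7e6", 16),
)
print(drain.cite())
```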

6. Why this is harder elsewhere

Traces exist in plenty of places, but typically behind a paid debug-RPC method that you call one transaction at a time. The Bybit case shows why that is not enough: the decisive movement is an internal call, so you need traces just to see it, and then you need to follow it across 41 branches and the layers below them. Calling a debug method per transaction and stitching the results by hand does not scale to that.
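For contrast, here is roughly what the per-transaction route involves, sketched in Python against Geth's debug_traceTransaction method with its built-in callTracer. The request builder produces one JSON-RPC payload per transaction hash, and the walker flattens the nested call frames the tracer returns. The sample frame is a toy structure with made-up addresses, not the real call tree of the drain transaction.

```python
def trace_request(tx_hash, request_id=1):
    """JSON-RPC payload for one transaction; a node must expose the debug namespace."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_traceTransaction",
        "params": [tx_hash, {"tracer": "callTracer"}],
    }

def value_calls_to(frame, target):
    """Recursively collect (sender, wei) for value-bearing calls into `target`."""
    found = []
    # Some frame types (for example DELEGATECALL) may omit "value".
    value = int(frame.get("value", "0x0"), 16)
    if frame.get("to", "").lower() == target and value > 0:
        found.append((frame["from"], value))
    for child in frame.get("calls", []):
        found.extend(value_calls_to(child, target))
    return found

TARGET = "0x47666fab8bd0ac7003bce3f5c3585383f09486e2"

# Toy frame with placeholder addresses: the value transfer sits two levels down.
frame = {
    "type": "CALL", "from": "0xaaa", "to": "0xbbb", "value": "0x0",
    "calls": [{
        "type": "DELEGATECALL", "from": "0xbbb", "to": "0xccc",
        "calls": [{"type": "CALL", "from": "0xbbb", "to": TARGET,
                   "value": "0xde0b6b3a7640000"}],
    }],
}
print(trace_request("0xb61413c495fdad6114a7aa863a00b2e3c28945979a10885b12b30316ea9f072c"))
print(value_calls_to(frame, TARGET))
```

This approach also requires already knowing which transaction hashes to trace, which is exactly what a range filter on callTo or callFrom supplies.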

The difference between seeing the theft and missing it is having complete, queryable trace and state-diff datasets that you can filter by callTo or callFrom across a whole block range, with the pivot envelope chaining each hop. The Portal is also self-hostable under AGPL-3.0 rather than metered per call, and the trail spans EVM, Solana, and Bitcoin in one model.

For the trace data type and its cost, read internal transactions explained; for the compliance framing, compliance data for crypto. For the agent pattern that drives the loop, see AI agents and onchain data.

Frequently asked questions

Can SQD trace a real onchain theft?
Yes, including the largest one. On 21 February 2025, 401,346.77 ETH was drained from a Bybit wallet. One Portal trace query returns the movement directly: at block 21,895,251 an internal call sent that ETH from Bybit's wallet (0x1db92e2e...) to the address Bybit publicly identified as the exploiter. Within hours the funds fan out across dozens of fresh wallets, and the trace data follows every hop with an exact block and transaction.
Why did the 401,346 ETH drain not show up in the wallet's transaction list?
Because it moved as an internal call, not a top-level transaction and not a token transfer. The drain executed inside a Safe multisig transaction, so the 401,346 ETH transfer is a trace, not a row in the transaction list. A wallet summary over that window reports native received as 0 ETH for exactly this reason. Only complete trace data surfaces the movement, which is the whole point: the most important hop in the investigation is invisible to the two data types most tools start with.
What is the investigation pivot envelope?
Every Portal investigation response carries a next_pivots list and a pivots list. next_pivots names a concrete follow-up with the tool and argument pre-filled, for example the wallet summary for the top counterparty. The pivots list tags each extractable value with how to reuse it: a sender tagged as from_addresses, a transaction hash as transaction_hash, a block number as from_block and to_block. A finding becomes a ready-made next query, so an agent chains hops without inventing parameters.
What do the wallet summary figures represent?
Net movement over the window you queried: inbound, outbound, and the net for each asset. Read them as flow over that range, so the numbers mean exactly what they say. For the Bybit trail that precision is the point: a cited, reproducible record of what moved between which addresses and when, every figure anchored to an exact block and transaction.
Can I run the trace layer myself?
The Portal is open source under AGPL-3.0, so the read layer that serves these traces and state diffs can be self-hosted rather than rented per call. Combined with complete, queryable trace and state-diff datasets and the pivot envelope, that is what lets an investigation chain across many hops, and across EVM, Solana, and Bitcoin, without metering each trace request.
