Part III briefly covered the difficulty of obtaining the unique identifier (PoolKey) for a particular Uniswap V4 pair.
In that lesson, I ran an ad-hoc event fetch through Ape to identify the parameters necessary to reconstruct the ETH-USDC pair PoolKey. But as the chain extends further from the PoolManager contract’s deployment block, it will become more painful to rely on these sorts of fetches.
Most pool data is immutable — we can extract, process, and save it to local storage for efficient access later. My favorite tool for extracting chain data is Cryo, and I’ve written a guide and shared Cryo extraction scripts for subsequent project bots.
Polars
Previous versions of my data extraction scripts used PyArrow to interact directly with the Parquet files stored by Cryo. I’ve since switched to Polars, a high-level DataFrame library similar to Pandas. Polars abstracts away much of the complexity of reading and writing Parquet files, letting me focus on filtering and manipulating the data instead of loading files and converting between data formats.
Polars is a “Rust under the hood” project, exposing a clean Python interface and deferring expensive computation to optimized Rust modules that can sometimes be parallelized.
The first section below will run through a complete extraction of one event using web3py, repeat it with Cryo, then inspect the resulting data with Polars to ensure it matches. Subsequent sections will perform Cryo extraction and Polars manipulation directly, skipping the web3py step.
New Pools
When a new V4 pool is created, an Initialize event is emitted. This event is defined in IPoolManager.sol:
/// @notice Emitted when a new pool is initialized
/// @param id The abi encoded hash of the pool key struct for the new pool
/// @param currency0 The first currency of the pool by address sort order
/// @param currency1 The second currency of the pool by address sort order
/// @param fee The fee collected upon every swap in the pool, denominated in hundredths of a bip
/// @param tickSpacing The minimum number of ticks between initialized ticks
/// @param hooks The hooks contract address for the pool, or address(0) if none
/// @param sqrtPriceX96 The price of the pool on initialization
/// @param tick The initial tick of the pool corresponding to the initialized price
event Initialize(
    PoolId indexed id,
    Currency indexed currency0,
    Currency indexed currency1,
    uint24 fee,
    int24 tickSpacing,
    IHooks hooks,
    uint160 sqrtPriceX96,
    int24 tick
);
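When an event is hashed into its topic, each user-defined type is replaced by its underlying ABI type: in the V4 sources PoolId is declared as bytes32, Currency as address, and IHooks is an interface, which the ABI encodes as an address. Here is a minimal standard-library sketch that builds the canonical signature from the parameter list and separates the indexed parameters (which land in the topics) from the rest (which are ABI-encoded into the data field). The type mapping is written by hand for this example rather than parsed from the contracts, and all names are hypothetical.

```python
# Hand-written mapping from Uniswap V4 user-defined types to ABI types.
# This table is an assumption for the sketch, not parsed from the contracts.
UNDERLYING_ABI_TYPE = {
    "PoolId": "bytes32",    # type PoolId is bytes32
    "Currency": "address",  # type Currency is address
    "IHooks": "address",    # interface types are ABI-encoded as address
}

# (declared type, parameter name, indexed?) in declaration order
INITIALIZE_PARAMS = [
    ("PoolId", "id", True),
    ("Currency", "currency0", True),
    ("Currency", "currency1", True),
    ("uint24", "fee", False),
    ("int24", "tickSpacing", False),
    ("IHooks", "hooks", False),
    ("uint160", "sqrtPriceX96", False),
    ("int24", "tick", False),
]


def canonical_signature(event_name, params):
    """Build the string whose keccak256 hash becomes topic0.

    Parameter names and the `indexed` keyword are not part of the signature,
    only the ABI types in declaration order, separated by commas, no spaces.
    """
    abi_types = [UNDERLYING_ABI_TYPE.get(declared, declared) for declared, _, _ in params]
    return f"{event_name}({','.join(abi_types)})"


def topic_and_data_params(params):
    """Split parameter names into (topics after topic0, ABI-encoded data)."""
    topics = [name for _, name, indexed in params if indexed]
    data = [name for _, name, indexed in params if not indexed]
    return topics, data


print(canonical_signature("Initialize", INITIALIZE_PARAMS))
# Initialize(bytes32,address,address,uint24,int24,address,uint160,int24)
```

Hashing that string with keccak256 (for example via web3.Web3.keccak(text=...)) gives the value that appears as the first topic of every Initialize log.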
When a contract emits an event, the resulting log is included in the transaction receipt, but it is not stored in contract state. You can inspect these values by retrieving the receipt and examining the entries at the “logs” key.
Using the ETH-USDC pool we have been studying, we can retrieve the transaction receipt for transaction 0x5205439b7e71dfe27d0911a0b05c0380e481ae83bed1ec7025513be0e3eaecb7 using web3py:
>>> import web3
>>> w3 = web3.Web3(web3.HTTPProvider('http://localhost:8545'))
>>> pool_tx = w3.eth.get_transaction_receipt('0x5205439b7e71dfe27d0911a0b05c0380e481ae83bed1ec7025513be0e3eaecb7')
Inspect the logs:
>>> pool_tx['logs']
[AttributeDict({'address': '0x000000000004444c5dc75cB358380D2e3dE08A90', 'topics': [HexBytes('0xdd466e674ea557f56295e2d0218a125ea4b4f0f6f3307b95f85e6110838d6438'), HexBytes('0x21c67e77068de97969ba93d4aab21826d33ca12bb9f565d8496e8fda8a82ca27'), HexBytes('0x0000000000000000000000000000000000000000000000000000000000000000'), HexBytes('0x000000000000000000000000a0b86991c6218b36c1d19d4a2e9eb0ce3606eb48')], 'data': HexBytes('0x00000000000000000000000000000000000000000000000000000000000001f4000000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004449d40969b3d337975ed199f72c000000000000000000000000000000000000000000000000000000000002fb3c'), 'blockHash': HexBytes('0x237163794152eba792f0dee73d29d9f1f7c49ea9fe562001a14d9c5ad533af5d'), 'blockNumber': 21688545, 'blockTimestamp': '0x6792759b', 'transactionHash': HexBytes('0x5205439b7e71dfe27d0911a0b05c0380e481ae83bed1ec7025513be0e3eaecb7'), 'transactionIndex': 124, 'logIndex': 329, 'removed': False})]
The first value in the “topics” list is the keccak hash of the event signature. We can confirm this by hashing the signature with web3py and comparing the result to the first topic of the first log in the transaction:
>>> event_signature = (
...     'Initialize(bytes32,address,address,uint24,int24,address,uint160,int24)'
... )
>>> web3.Web3.keccak(text=event_signature) == (
...     pool_tx['logs'][0]['topics'][0]
... )
True
Up to three event parameters can be marked indexed. Their values appear in the “topics” list after the signature hash, and the remaining values are ABI-encoded together in the “data” field. You can decode the data values using eth_abi:
>>> import eth_abi
>>> eth_abi.decode(
...     types=['uint24', 'int24', 'address', 'uint160', 'int24'],
...     data=pool_tx['logs'][0]['data'],
... )
(
500,
10,
'0x0000000000000000000000000000000000000000',
1385053131113435054849380000069420,
195388
)
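eth_abi.decode only sees the “data” field, so the indexed values in “topics” need separate handling. Below is a minimal standard-library sketch that decodes one Initialize log by hand and reassembles the five PoolKey fields. It assumes the layout shown above (four topics and five 32-byte data words), and the function and class names are hypothetical.

```python
from dataclasses import dataclass

WORD = 32  # every topic and every static ABI slot is 32 bytes


@dataclass(frozen=True)
class PoolKey:
    currency0: str
    currency1: str
    fee: int
    tick_spacing: int
    hooks: str


def _address(word) -> str:
    # Addresses are left-padded to 32 bytes; the last 20 bytes hold the value
    return "0x" + bytes(word)[-20:].hex()


def _uint(word) -> int:
    return int.from_bytes(bytes(word), "big")


def _int(word) -> int:
    # Signed ABI integers (int24 here) are sign-extended to 256 bits
    return int.from_bytes(bytes(word), "big", signed=True)


def decode_initialize_log(topics, data):
    """Return (pool_id, PoolKey, sqrt_price_x96, tick) for one Initialize log.

    topics: [signature hash, id, currency0, currency1]
    data:   fee, tickSpacing, hooks, sqrtPriceX96, tick (one word each)
    """
    data = bytes(data)
    if len(topics) != 4 or len(data) != 5 * WORD:
        raise ValueError("unexpected layout for an Initialize log")
    words = [data[i : i + WORD] for i in range(0, len(data), WORD)]
    pool_id = "0x" + bytes(topics[1]).hex()
    key = PoolKey(
        currency0=_address(topics[2]),
        currency1=_address(topics[3]),
        fee=_uint(words[0]),
        tick_spacing=_int(words[1]),
        hooks=_address(words[2]),
    )
    return pool_id, key, _uint(words[3]), _int(words[4])
```

With log = pool_tx['logs'][0], calling decode_initialize_log(log['topics'], log['data']) should return the same fee, tick spacing, hooks, price, and tick as eth_abi, plus both currencies from the topics. The bytes() conversions let the helpers accept either plain bytes (as Cryo stores them) or web3py's HexBytes.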
Now let’s extract the same data using Cryo into a temporary directory:
btd@dev:~$ mkdir /tmp/cryo; cd /tmp/cryo
btd@dev:/tmp/cryo$ cryo logs --rpc http://localhost:8545 --blocks 21688545 --contract 0x000000000004444c5dc75cB358380D2e3dE08A90 --event 0xdd466e674ea557f56295e2d0218a125ea4b4f0f6f3307b95f85e6110838d6438
btd@dev:/tmp/cryo$ ls
ethereum__logs__21688545_to_21688545.parquet
Now I’ll create a virtual environment using uv, install Polars, and launch a REPL to inspect the data:
btd@dev:/tmp/cryo$ uv venv
Using CPython 3.13.1
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate
btd@dev:/tmp/cryo$ uv pip install polars
Resolved 1 package in 53ms
Installed 1 package in 223ms
+ polars==1.22.0
btd@dev:/tmp/cryo$ uv run python3
Python 3.13.1 (main, Dec 6 2024, 18:40:43) [Clang 18.1.8 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars
Load the Parquet data into a Polars DataFrame:
>>> data = polars.read_parquet("*.parquet")
>>> data
shape: (1, 12)
┌────────┬───────┬───────┬───────┬───┬───────┬───────┬───────┬───────┐
│ block_ ┆ trans ┆ log_i ┆ trans ┆ … ┆ topic ┆ data ┆ n_dat ┆ chain │
│ number ┆ actio ┆ ndex ┆ actio ┆ ┆ 3 ┆ --- ┆ a_byt ┆ _id │
│ --- ┆ n_ind ┆ --- ┆ n_has ┆ ┆ --- ┆ binar ┆ es ┆ --- │
│ u32 ┆ ex ┆ u32 ┆ h ┆ ┆ binar ┆ y ┆ --- ┆ u64 │
│ ┆ --- ┆ ┆ --- ┆ ┆ y ┆ ┆ u32 ┆ │
│ ┆ u32 ┆ ┆ binar ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ ┆ y ┆ ┆ ┆ ┆ ┆ │
╞════════╪═══════╪═══════╪═══════╪═══╪═══════╪═══════╪═══════╪═══════╡
│ 216885 ┆ 124 ┆ 329 ┆ b"R\x ┆ … ┆ b"\x0 ┆ b"\x0 ┆ 160 ┆ 1 │
│ 45 ┆ ┆ ┆ 05C\x ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ 9b~q\ ┆ ┆ \x00\ ┆ \x00\ ┆ ┆ │
│ ┆ ┆ ┆ xdf\x ┆ ┆ x00\x ┆ x00\x ┆ ┆ │
│ ┆ ┆ ┆ e2}\x ┆ ┆ 00\x0 ┆ 00\x0 ┆ ┆ │
│ ┆ ┆ ┆ 09\x1 ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ … ┆ ┆ … ┆ … ┆ ┆ │
└────────┴───────┴───────┴───────┴───┴───────┴───────┴───────┴───────┘
The console view of the data is cramped, but the column names show what Cryo extracted:
>>> data.columns
['block_number', 'transaction_index', 'log_index', 'transaction_hash', 'address', 'topic0', 'topic1', 'topic2', 'topic3', 'data', 'n_data_bytes', 'chain_id']
You can pull individual values from the DataFrame much like a dictionary: index by row number to get a one-row DataFrame, then use the column name as a key:
>>> data[0]['transaction_hash']
shape: (1,)
Series: 'transaction_hash' [binary]
[
b"R\x05C\x9b~q\xdf\xe2}\x09\x11\xa0\xb0\\x03\x80\xe4\x81\xae\x83\xbe\xd1\xecp%Q;\xe0\xe3\xea\xec\xb7"
]
A Series is a Polars data structure holding a single column of values, similar to a list. You can pull the value out of a single-item Series with the item method:
>>> data[0]['transaction_hash'].item()
b'R\x05C\x9b~q\xdf\xe2}\t\x11\xa0\xb0\\\x03\x80\xe4\x81\xae\x83\xbe\xd1\xecp%Q;\xe0\xe3\xea\xec\xb7'
Convert to a hex string as needed to confirm that the transaction matches:
>>> data[0]['transaction_hash'].item().hex()
'5205439b7e71dfe27d0911a0b05c0380e481ae83bed1ec7025513be0e3eaecb7'
Conversion is similar for the event topics and data:
>>> data[0]['topic0'].item().hex()
'dd466e674ea557f56295e2d0218a125ea4b4f0f6f3307b95f85e6110838d6438'
>>> data[0]['data'].item().hex()
'00000000000000000000000000000000000000000000000000000000000001f4000000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004449d40969b3d337975ed199f72c000000000000000000000000000000000000000000000000000000000002fb3c'
Now use Cryo to pull all Initialize events from the PoolManager contract’s deployment block to the latest block:
btd@dev:/tmp/cryo$ rm *.parquet
btd@dev:/tmp/cryo$ cryo logs --rpc http://localhost:8545 --blocks 21688329:latest --contract 0x000000000004444c5dc75cB358380D2e3dE08A90 --event 0xdd466e674ea557f56295e2d0218a125ea4b4f0f6f3307b95f85e6110838d6438
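Re-running the full range every time gets slower as the chain grows. One option is to start the next extraction just past the highest block already on disk. This standard-library sketch reads that block from Cryo’s output file names. The file-name pattern is inferred from the single example shown earlier, it assumes each name reports the last block its file actually covers (worth confirming before relying on it), and the helper is hypothetical.

```python
import re
from pathlib import Path

# Pattern inferred from a file name seen earlier:
#   ethereum__logs__21688545_to_21688545.parquet
# This is an assumption about Cryo's naming, not documented behavior.
_RANGE = re.compile(r"__(\d+)_to_(\d+)\.parquet$")


def next_start_block(filenames, default_start):
    """Return the block after the highest end block found in Cryo file names.

    Files that do not match the pattern are ignored. Falls back to
    `default_start` (e.g. the PoolManager deployment block) when none match.
    """
    ends = []
    for name in filenames:
        match = _RANGE.search(Path(name).name)
        if match:
            ends.append(int(match.group(2)))
    return max(ends) + 1 if ends else default_start
```

For example, next_start_block(Path('/tmp/cryo').glob('*.parquet'), 21688329) gives a starting block to pass to the next run as --blocks START:latest.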
Load and inspect the events with Polars:
>>> import polars
>>> data = polars.read_parquet("*.parquet")
>>> data
shape: (1_159, 12)
┌────────┬───────┬───────┬───────┬───┬───────┬───────┬───────┬───────┐
│ block_ ┆ trans ┆ log_i ┆ trans ┆ … ┆ topic ┆ data ┆ n_dat ┆ chain │
│ number ┆ actio ┆ ndex ┆ actio ┆ ┆ 3 ┆ --- ┆ a_byt ┆ _id │
│ --- ┆ n_ind ┆ --- ┆ n_has ┆ ┆ --- ┆ binar ┆ es ┆ --- │
│ u32 ┆ ex ┆ u32 ┆ h ┆ ┆ binar ┆ y ┆ --- ┆ u64 │
│ ┆ --- ┆ ┆ --- ┆ ┆ y ┆ ┆ u32 ┆ │
│ ┆ u32 ┆ ┆ binar ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ ┆ y ┆ ┆ ┆ ┆ ┆ │
╞════════╪═══════╪═══════╪═══════╪═══╪═══════╪═══════╪═══════╪═══════╡
│ 216885 ┆ 124 ┆ 329 ┆ b"R\x ┆ … ┆ b"\x0 ┆ b"\x0 ┆ 160 ┆ 1 │
│ 45 ┆ ┆ ┆ 05C\x ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ 9b~q\ ┆ ┆ \x00\ ┆ \x00\ ┆ ┆ │
│ ┆ ┆ ┆ xdf\x ┆ ┆ x00\x ┆ x00\x ┆ ┆ │
│ ┆ ┆ ┆ e2}\x ┆ ┆ 00\x0 ┆ 00\x0 ┆ ┆ │
│ ┆ ┆ ┆ 09\x1 ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ … ┆ ┆ … ┆ … ┆ ┆ │
│ 216892 ┆ 210 ┆ 455 ┆ b"f$\ ┆ … ┆ b"\x0 ┆ b"\x0 ┆ 160 ┆ 1 │
│ 54 ┆ ┆ ┆ xe6\x ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ e68\x ┆ ┆ \x00\ ┆ \x00\ ┆ ┆ │
│ ┆ ┆ ┆ f2\xa ┆ ┆ x00\x ┆ x00\x ┆ ┆ │
│ ┆ ┆ ┆ 1za\x ┆ ┆ 00\x0 ┆ 00\x0 ┆ ┆ │
│ ┆ ┆ ┆ ce\xc ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ … ┆ ┆ … ┆ … ┆ ┆ │
│ 216959 ┆ 4 ┆ 64 ┆ b"Nc\ ┆ … ┆ b"\x0 ┆ b"\x0 ┆ 160 ┆ 1 │
│ 56 ┆ ┆ ┆ xfc\x ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ c0\xd ┆ ┆ \x00\ ┆ \x00\ ┆ ┆ │
│ ┆ ┆ ┆ dB\xa ┆ ┆ x00\x ┆ x00\x ┆ ┆ │
│ ┆ ┆ ┆ 2\xb3 ┆ ┆ 00\x0 ┆ 00\x0 ┆ ┆ │
│ ┆ ┆ ┆ \x17\ ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ … ┆ ┆ … ┆ … ┆ ┆ │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 218680 ┆ 17 ┆ 64 ┆ b"Fw\ ┆ … ┆ b"\x0 ┆ b"\x0 ┆ 160 ┆ 1 │
│ 45 ┆ ┆ ┆ x8e1\ ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ xfcl{ ┆ ┆ \x00\ ┆ \x00\ ┆ ┆ │
│ ┆ ┆ ┆ \xd6\ ┆ ┆ x00\x ┆ x00\x ┆ ┆ │
│ ┆ ┆ ┆ xe65\ ┆ ┆ 00\x0 ┆ 00\x0 ┆ ┆ │
│ ┆ ┆ ┆ xdf\x ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ … ┆ ┆ … ┆ … ┆ ┆ │
│ 218681 ┆ 166 ┆ 297 ┆ b"\xe ┆ … ┆ b"\x0 ┆ b"\x0 ┆ 160 ┆ 1 │
│ 10 ┆ ┆ ┆ b\x82 ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ \xf0\ ┆ ┆ \x00\ ┆ \x00\ ┆ ┆ │
│ ┆ ┆ ┆ x11\x ┆ ┆ x00\x ┆ x00\x ┆ ┆ │
│ ┆ ┆ ┆ fa\xd ┆ ┆ 00\x0 ┆ 00\x0 ┆ ┆ │
│ ┆ ┆ ┆ 7\x0b ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ … ┆ ┆ … ┆ … ┆ ┆ │
│ 218684 ┆ 69 ┆ 231 ┆ b"\x1 ┆ … ┆ b"\x0 ┆ b"\x0 ┆ 160 ┆ 1 │
│ 96 ┆ ┆ ┆ 8M\x1 ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ 5\x83 ┆ ┆ \x00\ ┆ \x00\ ┆ ┆ │
│ ┆ ┆ ┆ \x9di ┆ ┆ x00\x ┆ x00\x ┆ ┆ │
│ ┆ ┆ ┆ i\xa3 ┆ ┆ 00\x0 ┆ 00\x0 ┆ ┆ │
│ ┆ ┆ ┆ \x1f\ ┆ ┆ 0\x00 ┆ 0\x00 ┆ ┆ │
│ ┆ ┆ ┆ … ┆ ┆ … ┆ … ┆ ┆ │
└────────┴───────┴───────┴───────┴───┴───────┴───────┴───────┴───────┘
Based on the shape of the DataFrame, Cryo extracted 1,159 Initialize events (one per new pool), each with 12 columns.
Cryo includes a nice option to decode each log into named columns if you provide an event signature. However, the feature is broken for fixed-length byte values (like the PoolId), so signature decoding doesn’t work for Uniswap V4 pools. Hopefully a future release will fix this bug.
For now, we can decode the values with eth_abi.decode as shown above and continue as normal. Since the Initialize event is emitted only once per pool, we can process each event once, save the result, and move on.
I’ve packaged up the Cryo extraction and pool event processing into a pair of scripts:
Subscribe to Degen Code to keep reading this post and get 7 days of free access to the full post archives.