This post covers a critical requirement for identifying and securing mempool arbitrage opportunities: predicting the effect of a pending transaction.
The idea is simple but the practice is complex:
Monitor and update the state of the blockchain
Observe pending transactions that will affect the state
Predict the future blockchain state resulting from the pending transaction
Calculate the opportunity at the future state
If you’re going to do mempool arbitrage, you absolutely need an accurate model of what a transaction will do.
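To make the "predict the future state" step concrete, here is a minimal sketch of the constant-product math for a single UniswapV2-style pool. It is not the helper code used in this series. The function names `get_amount_out` and `predict_reserves` are hypothetical, and the 0.3% swap fee (997/1000) is an assumption that should be checked against the DEX you are modeling.

```python
def get_amount_out(amount_in, reserve_in, reserve_out,
                   fee_numerator=997, fee_denominator=1000):
    """UniswapV2-style output amount for an exact-input swap.

    The default fee of 0.3% is an assumption; forks may differ.
    """
    amount_in_with_fee = amount_in * fee_numerator
    numerator = amount_in_with_fee * reserve_out
    denominator = reserve_in * fee_denominator + amount_in_with_fee
    # Integer division mirrors the rounding used on-chain
    return numerator // denominator


def predict_reserves(amount_in, reserve_in, reserve_out):
    """Return the (reserve_in, reserve_out) pair expected after the swap."""
    amount_out = get_amount_out(amount_in, reserve_in, reserve_out)
    # The full input (including the fee) stays in the pool
    return reserve_in + amount_in, reserve_out - amount_out


# Made-up reserves, purely for illustration
future_in, future_out = predict_reserves(
    amount_in=1_000, reserve_in=100_000, reserve_out=100_000
)
print(future_in, future_out)
```

The arbitrage calculation is then run against the predicted reserves instead of the current ones.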
In the mempool backrunning bot project, I built a simple mempool prediction code block that worked with the CRA-WAVAX pair.
It was a useful example to illustrate the process, but it has too many hard-coded assumptions. This post will work outward from that example and make the function as generic as possible (functional for any two tokens using UniswapV2-style functions).
Each section will address a specific aspect of the prediction code.
I have kept this post focused on two-token swaps. In the future I will present changes to model swaps using 3 (or more) tokens.
Improve The TX Parameter Grouping
Originally I was storing transaction parameters in distinct variables. This was OK at first, but if I’m going to be storing all of these items anyway, it makes sense to group them inside a dictionary instead.
So this code block:
tx_timestamp = time.monotonic()
tx_hash = tx_message.get("hash")
tx_to = web3.Web3().toChecksumAddress(tx_message.get("to"))
tx_from = web3.Web3().toChecksumAddress(tx_message.get("from"))
tx_value = int(tx_message.get("value"), 16)
tx_input = tx_message.get("input")
tx_type = tx_message.get("txType")
tx_gas_fee = int(tx_message.get("gasPrice"), 16)
tx_max_fee = int(tx_message.get("maxFeePerGas"), 16)
tx_priority_fee = int(tx_message.get("maxPriorityFeePerGas"), 16)
Transforms to:
tx_dict = {
    "timestamp": start_time,
    "block_number": int(tx_message.get("blockNumber"), 16),
    "hash": tx_message.get("hash"),
    "nonce": int(tx_message.get("nonce"), 16),
    "to": w3.toChecksumAddress(tx_message.get("to")),
    "from": w3.toChecksumAddress(tx_message.get("from")),
    "value": int(tx_message.get("value"), 16),
    "input": tx_message.get("input"),
    "type": tx_message.get("txType"),
    "gas_price": int(tx_message.get("gasPrice"), 16),
    "max_fee": int(tx_message.get("maxFeePerGas"), 16),
    "priority_fee": int(tx_message.get("maxPriorityFeePerGas"), 16),
}
Not a big difference in practice, but this improves the ability to pass transaction parameters around later, instead of wrangling individual variables.
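One thing to watch when building the dictionary: `int(None, 16)` raises a `TypeError`, so a message that lacks a field (for example, a transaction without `maxFeePerGas`) would crash the conversion. Whether your feed ever omits fields is an assumption you should verify. The sketch below uses two hypothetical helpers, `hex_to_int` and `build_tx_dict`, and a made-up sample message to show one way to guard against it. Only a subset of the keys is shown.

```python
def hex_to_int(value):
    """Convert a hex string such as '0x1a' to an int; pass None through."""
    return int(value, 16) if value is not None else None


def build_tx_dict(tx_message, timestamp):
    """Group transaction parameters, tolerating missing numeric fields."""
    return {
        "timestamp": timestamp,
        "hash": tx_message.get("hash"),
        "nonce": hex_to_int(tx_message.get("nonce")),
        "value": hex_to_int(tx_message.get("value")),
        "gas_price": hex_to_int(tx_message.get("gasPrice")),
        "max_fee": hex_to_int(tx_message.get("maxFeePerGas")),
        "priority_fee": hex_to_int(tx_message.get("maxPriorityFeePerGas")),
    }


# Made-up message with no maxFeePerGas / maxPriorityFeePerGas fields
sample_message = {
    "hash": "0xabc",
    "nonce": "0x5",
    "value": "0x0",
    "gasPrice": "0x3b9aca00",
}
tx = build_tx_dict(sample_message, timestamp=0.0)
print(tx["gas_price"], tx["max_fee"])
```

Code that reads `max_fee` or `priority_fee` later then needs to handle `None`, which is usually easier than catching an exception in the middle of the message loop.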
Identify Transactions We Can Predict
In the previous lesson I covered building arbitrage paths in an automated way. If you’ve implemented that section in your bots, you will have many more token, liquidity pool, and arbitrage helper objects. With more tokens come more opportunities and more mempool transactions to model.
First, let’s transform this simple block:
if tx_path in [
    [cra.address, wavax.address],
    [wavax.address, cra.address],
]:
    print()
    print("*** Pending CRA-WAVAX swap! ***")
    print(f"DEX: {ROUTERS[tx_to]['name']}")
    print(f"TX: {tx_hash}")
    print(func.fn_name)

    if ROUTERS[tx_to]["name"] == "TraderJoe":
        lp = traderjoe_lp_cra_wavax
        future_lp = traderjoe_lp_cra_wavax_future
    elif ROUTERS[tx_to]["name"] == "Pangolin":
        lp = pangolin_lp_cra_wavax
        future_lp = pangolin_lp_cra_wavax_future
Begin by changing the evaluation of tx_path to compare against the set of tokens that we know about, instead of just CRA-WAVAX and WAVAX-CRA (the original check included both the forward and reverse paths since swaps can occur in either direction).
Python Sets
Whenever you’re working in Python and want to analyze big groups of “stuff” relative to other groups of “stuff”, I recommend using the set data type. A set is much like a list, but it is unordered, unindexed, and de-duplicated by default. The syntax for a set is {item1, item2, item3}, and a list can be converted to a set using the set() function. A set offers many useful features, including the ability to discover items two groups have in common (intersection), items found in either group (union), and items exclusive to one group or the other (difference). Read through the Python set documentation to learn more.
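The set operations described above can be sketched in a few lines. The token symbols here are placeholders standing in for addresses, chosen only to keep the example readable.

```python
watched = {"CRA", "WAVAX", "USDC"}   # tokens we monitor (placeholder names)
path = ["WAVAX", "CRA", "WAVAX"]     # a list may repeat items

path_set = set(path)                 # duplicates collapse: two items remain
common = path_set & watched          # same as path_set.intersection(watched)
unknown = path_set - watched         # same as path_set.difference(watched)
either = path_set | {"JOE"}          # same as path_set.union({"JOE"})
only_one = path_set ^ watched        # items in exactly one of the two sets

# sorted() gives a stable display order, since sets are unordered
print(sorted(common), sorted(unknown), sorted(only_one))
```

An empty `unknown` set means every token in the path is on the watchlist, which is the condition the mempool filter needs.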
Here I’m going to use set() with the intersection() method to compare the token addresses in the transaction (via the 'path' dictionary key of the params variable, obtained from the decode_function_input() method within web3py) against the tokens we are monitoring.
The goal is to take a set with all token addresses in the swap path and intersect it with a set of all tokens that we know about (via degenbot_tokens.keys()). If the result matches the swap path exactly, we are monitoring every token in the path (both tokens, for a two-token swap), and this is a transaction that can be predicted.
# ignore the TX unless it was sent to an address
# on our watchlist
if tx_dict.get("to") not in ROUTERS.keys():
    continue
else:
    try:
        func, params = w3.eth.contract(
            address=w3.toChecksumAddress(tx_dict.get("to")),
            abi=ROUTERS.get(tx_dict.get("to")).get("abi"),
        ).decode_function_input(tx_dict.get("input"))
    except Exception as e:
        print(f"error decoding function: {e}")
        print(f"tx: {tx_dict.get('hash')}")
        continue

    # params.get('path') returns None if not found so check it
    # first, then compare all tokens in the path to the list of
    # known tokens that we are monitoring
    if params.get("path") and set(params.get("path")) == set(
        [
            w3.toChecksumAddress(token_address)
            for token_address in params.get("path")
        ]
    ).intersection(degenbot_tokens.keys()):
        print()
        print("*** PENDING MEMPOOL TX ***")
        print(func.fn_name)
If the set stuff seems like magic, run the following simple examples in a console:
>>> set1 = {1,2}
>>> set2 = {1,2,3,4,5,6,7,8}
>>> set1.intersection(set2)
{1, 2}
>>> set1 == set1.intersection(set2)
True
>>> set1 == set2.intersection(set1)
True
>>> {1,2} == {2,1}
True
For sets, since order does not matter, you can always perform an equality comparison of the 'path' against its intersection with your known tokens. If True, proceed!
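Comparing a set against its intersection with another set is the same test as asking whether the first is a subset of the second. As an alternative sketch, the check can be wrapped in a small function using `issubset()`. The name `path_is_known` and the sample dictionary are hypothetical, and the short strings stand in for checksummed addresses.

```python
def path_is_known(path, known_tokens):
    """Return True if the path is non-empty and every token in it is known."""
    if not path:
        # Covers both None (no 'path' parameter) and an empty list
        return False
    return set(path).issubset(known_tokens)


# Stand-in for a dictionary keyed by token address
known = {
    "0xAAA": "token A helper",
    "0xBBB": "token B helper",
    "0xCCC": "token C helper",
}

print(path_is_known(["0xAAA", "0xBBB"], known.keys()))
print(path_is_known(["0xAAA", "0xDDD"], known.keys()))
print(path_is_known(None, known.keys()))
```

Both forms give the same answer. The subset version states the intent a little more directly and skips building the intermediate intersection.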
Prepare Helper Objects
This is a new section. The previous example only used CRA-WAVAX, so we just used those helper objects wherever needed. Since this code will be generic, we need to gather the required helper objects on demand.