This is a short post detailing three bug fixes for the Uniswap V3 bot.
I will release periodic posts like this as I identify and fix bugs. Thank you for reporting issues!
Late Events Fix
The “soft timeout” feature of watch_events is generally robust, but I’ve caught a handful of cases where the websocket is delayed and delivers events more slowly than usual. I programmed the watcher function to send arbs for processing once per block, which can skip the check on some arbs if their events arrive late. The events themselves are still processed correctly by the pool helpers, but the associated arb helpers are ignored for that block.
The bug was ironically introduced in my first bugfix post. The fix is to update this code block from this:
last_processed_block = newest_block

while True:
    try:
        message = json.loads(
            await asyncio.wait_for(
                websocket.recv(),
                timeout=_TIMEOUT,
            )
        )
    # if no event has been received in _TIMEOUT seconds, assume all
    # events have been received, reduce the list of arbs to check with
    # set(), repackage and send for processing, mark the current block
    # as processed, then clear the working queue
    except asyncio.exceptions.TimeoutError as e:
        if last_processed_block < newest_block:
            asyncio.create_task(
                process_onchain_arbs(
                    deque(set(arbs_to_check)),
                )
            )
            last_processed_block = newest_block
            arbs_to_check.clear()
        continue
to this:
# last_processed_block = newest_block

while True:
    try:
        message = json.loads(
            await asyncio.wait_for(
                websocket.recv(),
                timeout=_TIMEOUT,
            )
        )
    # if no event has been received in _TIMEOUT seconds, assume all
    # events have been received, reduce the list of arbs to check with
    # set(), repackage and send for processing, then clear the
    # working queue
    except asyncio.exceptions.TimeoutError as e:
        if arbs_to_check:
            if ARB_ONCHAIN_ENABLE:
                asyncio.create_task(
                    process_onchain_arbs(
                        deque(set(arbs_to_check)),
                    )
                )
            # last_processed_block = newest_block
            arbs_to_check.clear()
        continue
    except Exception as e:
        print(f"(watch_events) websocket.recv(): {e}")
        print(type(e))
        break
This no longer tracks the last processed block and will send all arbs for processing even if their events were received late.
Brownie Zombie Middleware Fix
I’ve run into several instances where a bot running watch_pending_transactions() will throw a strange error:
RuntimeError: cannot call recv while another coroutine is already waiting for the next message
Thanks to some helpful reports on Discord, I’ve discovered the cause.
When Brownie connects to a network, it creates a dedicated web3 object to interact with the RPC. It injects several middlewares into this object, including a special caching middleware that will retrieve and report some values without re-sending calls to the RPC.
This is fine by itself, but Brownie also starts a long-running thread called block_filter_loop that keeps the web3 object “fresh”. That works well for a local fork, but it is a big problem for a high-performance bot executing concurrent RPC calls over a websocket.
The issue typically occurs when Brownie executes a view call while connected to a websocket RPC. Brownie uses websockets under the hood, so its call to recv() sometimes fires at the same time as the thread’s, which crashes block_filter_loop and throws an uncaught RuntimeError exception.
After this middleware throws the exception, the middleware goes into a zombie state that returns stale values. Bad!
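To illustrate the failure mode, here is a minimal standalone sketch (not the bot’s code, and the endpoint URL is a placeholder): the websockets library raises the RuntimeError reported above whenever two coroutines wait on recv() for the same connection at the same time.

import asyncio
import websockets

async def main():
    # placeholder endpoint; substitute any websocket RPC you have available
    async with websockets.connect("ws://localhost:8546") as ws:
        # two coroutines waiting on the same connection triggers:
        # "RuntimeError: cannot call recv while another coroutine is
        #  already waiting for the next message"
        await asyncio.gather(ws.recv(), ws.recv())

asyncio.run(main())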
The process cannot easily be killed, so I’ve worked around it by simply replacing the internal brownie.web3 reference with the one we create for the flashbots middleware. Everything works correctly after that point. The old brownie.web3 object is garbage-collected and the thread is killed.
Please note that this only occurs when Brownie is connected to a websocket endpoint. If you are using an http endpoint, you should disregard this.
# Create a reusable web3 object to communicate with the node
# (no arguments to provider will default to localhost on the
# default port)
w3 = web3.Web3(web3.WebsocketProvider())

try:
    brownie.network.connect(BROWNIE_NETWORK)
except:
    sys.exit(
        "Could not connect! Verify your Brownie network settings using 'brownie networks list'"
    )
else:
    # swap out the brownie web3 object - workaround for the
    # `block_filter_loop` thread that Brownie starts.
    # It sometimes crashes on concurrent calls to websockets.recv()
    # and creates a zombie middleware that returns stale state data
    brownie.web3 = w3
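If you run the same script against both endpoint types, one option (my suggestion, not part of the posted fix) is to guard the swap so it only happens when Brownie is actually connected over a websocket:

# only swap the object when Brownie's provider is a websocket provider,
# since the zombie middleware problem does not affect HTTP endpoints
if isinstance(brownie.web3.provider, web3.WebsocketProvider):
    brownie.web3 = w3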
Thread Lock Fix
I’ve been working on scaling improvements behind the scenes. One option to improve performance is implementing threading, which has been fairly successful but has exposed some bugs.
This bug should not have affected anyone since the last published version of the bot did not perform any threaded updates, but it’s worth mentioning.
Here is the failure mechanism:
An arb helper starts a long-running arbitrage calculation in a thread, which runs in parallel with the main sync routines of the bot. As part of the arbitrage calculation, the helper needs to fetch a number of new tick words.
During this long-running calculation / fetch, a liquidity change event is emitted by the pool and caught by the websocket watcher. The watcher sends an external update command to the pool helper, which dutifully updates the tick bitmap and liquidity dictionary.
The tick bitmap and liquidity dictionary are now inconsistent, since liquidity deltas were applied to data structures that were in the middle of being read and modified.
I’ve solved the issue by adding an internal lock within each V3LiquidityPool helper. It is accessible as self.lock, and the state-modifying methods hold that lock for the duration of their access.
The syntax is quite simple:
with self.lock:
    [state-modifying code here]
If two threads attempt to run this code section, the second thread will block until the lock is released by the first thread.
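Here is a simplified sketch of the idea (the class and method names are illustrative, not the actual degenbot implementation): the update path and the read path share the same lock, so a liquidity delta can never be applied while a calculation is walking the tick data.

import threading

class LockedPool:
    # toy stand-in for a V3 pool helper; real helpers track far more state
    def __init__(self):
        self.lock = threading.Lock()
        self.tick_data = {}

    def external_update(self, tick: int, liquidity_delta: int):
        # writer: applies a liquidity delta from a liquidity change event
        with self.lock:
            self.tick_data[tick] = self.tick_data.get(tick, 0) + liquidity_delta

    def total_liquidity(self) -> int:
        # reader: a long-running calculation sees a consistent snapshot,
        # because the writer blocks until the lock is released here
        with self.lock:
            return sum(self.tick_data.values())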
I’ve left some debugging code on GitHub, so you’ll see the helpers are a bit more “chatty” than usual. If you find that the following message is printed, please find me on Discord and report it!
"(V3LiquidityPool) {word_position=} inside known range"
Run git pull on the degenbot code base again. The change should be transparent, other than your bot throwing fewer exceptions.