There comes a time in every Python developer’s life when they must accept responsibility for their actions. That time usually coincides with the first attempt at making some code fast (or at least not dog-shit slow).
It’s fairly simple to speed up Python code that is I/O-bound, that is, code that spends a long time waiting for some external call to complete. Common examples are reading/writing a file, sending an HTTP request, or broadcasting a blockchain call via web3py.
The first step for many is the asyncio module, which I use heavily to execute sections of code concurrently. Much of the work of a bot is sending and receiving small packets of information from an RPC, then making choices about how to use it. For this relatively light work, asyncio works just fine. HTTP requests can be managed using aiohttp, websocket requests can be managed using websockets, and even web3 calls can be managed using the web3py AsyncHTTPProvider.
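To illustrate the pattern, here’s a minimal sketch of concurrent I/O-bound work with asyncio. The `fetch` coroutine and its arguments are hypothetical stand-ins; `asyncio.sleep` plays the role of awaiting a real network response:

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for an I/O-bound call (HTTP request, websocket read, RPC call).
    # Awaiting asyncio.sleep yields control, just like awaiting real network I/O.
    await asyncio.sleep(delay)
    return name

async def main() -> list[str]:
    # Run three simulated requests concurrently; total wall time is ~0.1s,
    # not ~0.3s, because the waits overlap while the event loop switches tasks.
    return await asyncio.gather(
        fetch("block", 0.1),
        fetch("gas", 0.1),
        fetch("nonce", 0.1),
    )

results = asyncio.run(main())
```

`asyncio.gather` preserves argument order in its results, which makes it easy to match responses back to requests.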
For random tasks that you need to do, but don’t have an asyncio-native module, you can generally rely on the ThreadPoolExecutor from concurrent.futures, which allows you to run normally blocking code in a separate thread. Python threads are light and easy to manage, and you can kludge together a lot of stuff this way as long as you properly guard against race conditions and shared access violations.
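The thread-pool escape hatch looks like this in miniature. `blocking_lookup` is a hypothetical stand-in for a blocking call from a library with no asyncio support:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_lookup(x: int) -> int:
    # Stand-in for a blocking library call; the sleep represents time
    # spent waiting on an external system, during which the GIL is released.
    time.sleep(0.05)
    return x * 2

# Four blocking calls run in separate threads; their waits overlap,
# so this finishes in roughly 0.05s instead of 0.2s.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(blocking_lookup, range(4)))
```

From asyncio code, the same executor can be handed to `loop.run_in_executor` so the blocking call becomes awaitable.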
But sooner or later you’ll have to deal with some code that does some CPU-heavy work. You might think “oh no big deal I’ll just throw this into a thread and be on my way”. But then you discover that the code is still slow… what gives?!
The Good, The Bad, and the GIL
In Python circles you see a lot of talk about the GIL. What is it? I won’t go on at length about this, but here’s what you need to know: the Global Interpreter Lock (GIL) is a synchronization mechanism that restricts the Python interpreter from performing work in multiple threads at once. That’s fine for I/O-bound work because there’s nothing productive that you can do while waiting for an HTTP request to complete. If a thread is paused, then resumed later, the HTTP data can be retrieved from the OS layer to be processed. No big deal, almost transparent. If you have ten threads sending and receiving data in small chunks, you can run them in near-parallel and never notice. The GIL is active the whole time, so you’ll find that you cannot simultaneously send and receive between two separate tasks, but it’s close enough that it hardly matters.
Where this starts to hurt is when you perform CPU-bound work that “blocks”. Blocking means simply that when a particular code section begins to execute, it will continue executing to the exclusion of any other work controlled by the GIL until it finishes.
This is why sending heavy CPU-bound tasks into a thread pool does nothing for you: the threads are still restricted to one-at-a-time execution by the GIL.
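You can see this for yourself with a quick experiment. `busy` is a hypothetical pure-Python hot loop; on a standard (GIL-enabled) CPython build, the threaded version takes about as long as the serial one:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n: int) -> int:
    # Pure-Python CPU-bound loop; it holds the GIL the entire time it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 500_000

start = time.perf_counter()
serial = [busy(N) for _ in range(4)]
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(busy, [N] * 4))
threaded_time = time.perf_counter() - start

# On CPython with the GIL, threaded_time comes out roughly equal to
# serial_time: the four threads take turns instead of running in parallel.
```

The results are identical either way; only the (lack of) speedup differs.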
There are some exceptions to this, particularly for modules that offload processing to pre-compiled C code. A common example is NumPy, which releases the GIL while executing many of its compiled routines, then re-acquires it when that work is done. If you can find modules that release the GIL in this way, you can use threads to get performance and near-concurrent execution.
SciPy — My Favorite Source of CPU-Bound Blocking Work
You should be familiar with SciPy from my early exploration of arbitrage amount optimization. I use the minimize_scalar method as a way to identify the optimal swap amounts for various arbitrage paths.
The downside to minimize_scalar is that the CPU-bound nature of the work within the arbitrage helper class does not release the GIL, and therefore does not benefit from being offloaded to a thread pool.
This is keenly felt if you’re trying to analyze a large number of arbitrage paths. Without the ability to parallelize the optimization calculation, your bot will fall behind. The examples I’ve shared so far largely concentrate on 2-pool arbitrage for performance reasons. Once you introduce 3-pool arbitrage, the number of paths increases exponentially. My Arbitrum bot tracks roughly 40,000 paths, and my Ethereum bot tracks roughly 200,000 paths.
Things can very quickly bog down, leaving you with an unresponsive bot that cannot keep pace and identify opportunities quickly enough to execute on them.
What Now?
There is another type of executor provided by concurrent.futures, the ProcessPoolExecutor. You can interact with it just like ThreadPoolExecutor. The key difference is that ProcessPoolExecutor will start new processes for its workers instead of new threads.
Why bother? Because the GIL is per-process, not per-thread. If you start two Python programs in separate terminals, neither will conflict because each process has a dedicated GIL. You can perform demanding work across processes, but the downside is that the work cannot easily be coordinated. You could use a pub/sub mechanism, send data across a pipe, through a file, or a socket, etc., but wouldn’t it be easier if related stuff could run from a single Python app and spin up processes as needed? Well, yes! And that’s exactly what ProcessPoolExecutor does.
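A minimal sketch of the process-pool pattern, using a hypothetical `count_primes` function as the CPU-bound workload. I explicitly request the "fork" start method (POSIX-only) so the workers inherit this module directly; with the default "spawn" method on some platforms you’d wrap the pool usage in the usual `if __name__ == "__main__":` guard:

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit: int) -> int:
    # CPU-bound work that would monopolize the GIL if run in a thread.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n**0.5) + 1)):
            count += 1
    return count

# "fork" is POSIX-only; workers inherit the parent's memory, so the
# function defined above is available to them without a re-import.
ctx = multiprocessing.get_context("fork")

# Each worker is a separate process with its own GIL, so the four
# calculations can execute in true parallel on a multi-core machine.
with ProcessPoolExecutor(max_workers=4, mp_context=ctx) as pool:
    results = list(pool.map(count_primes, [1_000] * 4))
```

The calling code is nearly identical to the `ThreadPoolExecutor` version, which is the whole appeal: you swap executors, not architectures.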
Process Pool Executor — Under The Hood
How does it work anyway? If you spend time reading through the documentation, you’ll discover that the executor works by pickling data and sending it across a pipe to a new process, which unpickles the data, executes the work, and sends the result back through that same pipe.
Processes cannot easily access shared data, and unpickled objects will have different memory addresses when they are created on the other side of the pipe.
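The "different memory addresses" point is easy to demonstrate with a pickle round-trip in a single process, using a throwaway dictionary:

```python
import pickle

original = {"swap_amount": 0, "profit_token": "WETH"}

# A pickle round-trip reproduces the value, but builds a brand-new
# object: equal by ==, yet a different object at a different address.
clone = pickle.loads(pickle.dumps(original))
```

This is why identity-based tricks (caches keyed by `id()`, `is` comparisons against known objects) break when objects cross a process boundary.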
There is a lot to manage here, but since we care about performance we will do the work.
The first requirement is the ability to pickle the data required to do the work. I first explored pickling as a means to improve bot startup speed. Even though I went with tried-and-true JSON in the end, those pickling lessons will be valuable.
Pickleable Or Nah?
Before any process pool work can be done, we need to answer a key question: can an arbitrage helper be pickled?
Let’s find out by firing up a Brownie console, creating a simple arbitrage helper, and trying to pickle it:
(.venv) [devil@dev tmp]$ brownie console --network ankr-mainnet
Brownie v1.19.3 - Python development framework for Ethereum
No project was loaded.
Brownie environment is ready.
>>> import degenbot as bot
>>> import pickle
>>> arb_helper = bot.UniswapLpCycle.from_addresses(
input_token_address='0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2',
swap_pool_addresses=[
("0xCEfF51756c56CeFFCA006cD410B03FFC46dd3a58","V2"),
("0xCBCdF9626bC03E24f779434178A73a0B4bad62eD","V3"),
]
)
• WETH (Wrapped Ether)
• WBTC (Wrapped BTC)
• WETH (Wrapped Ether)
WBTC-WETH (V2, 0.30%)
• Token 0: WBTC - Reserves: 39046575994
• Token 1: WETH - Reserves: 5827611710482111395559
WBTC-WETH (V3, 0.30%)
• Token 0: WBTC
• Token 1: WETH
• Liquidity: 2115274884382260275
• SqrtPrice: 30567826468293261990949966114102410
• Tick: 257275
/home/devil/tmp/degenbot/arbitrage/uniswap_lp_cycle.py:37: UserWarning: No maximum input provided, setting to 100 WETH
warn("No maximum input provided, setting to 100 WETH")
So does it pickle?
>>> pickle.dumps(arb_helper)
File "<console>", line 1, in <module>
PicklingError: Can't pickle <class 'web3._utils.datatypes.Approval'>: attribute lookup Approval on web3._utils.datatypes failed
Of course not! The error isn’t particularly helpful either.
We won’t go down the rabbit hole completely, since I’ve already done that and have returned with answers. When an object is pickled, its contents are serialized into a byte stream. In the case of an arbitrage helper, that means all of its attributes. If any of the attributes are references to other objects, those objects get pickled too. An arbitrage helper holds references to the input token object and the pool objects along the swap path.
The pickling process must be complete, and any object that cannot be pickled will throw that exception.
What kinds of objects cannot be pickled? These are mostly objects that maintain some state that references an external system. Examples are an HTTP session, file handle, web3 connection to an RPC, etc.
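An open file handle is the simplest example of this: it references OS-level state that cannot travel through a pipe, so `pickle` refuses it with a `TypeError`:

```python
import pickle
import tempfile

# An open file handle wraps an OS file descriptor -- external state
# that has no meaning in another process, so pickling it must fail.
with tempfile.TemporaryFile() as handle:
    try:
        pickle.dumps(handle)
        picklable = True
    except TypeError:
        picklable = False
```

HTTP sessions and web3 providers fail for the same underlying reason: they hold sockets and other descriptors the target process can’t use.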
In the case of degenbot helpers, the offenders are typically the Brownie contract objects, which maintain a connection to the Brownie chain object used to reach the RPC via the web3py library.
So once again, what do we do?
The key lies in how the object will be used after it is unpickled on the other side of the process pool. In our case, we intend to call the calculate_arbitrage method (which uses SciPy) and inspect the results of the optimization.
Since the arbitrage helper object itself only needs to reference its internal state and the internal state of the pools, and does not need to perform on-chain lookups, we can simply discard the non-pickleable objects.
We can implement custom pickle/un-pickle operations inside a class via two special methods:
• __getstate__, which will return some state value
• __setstate__, which will accept that value and use it to recreate the internal state of the new object
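Here’s a toy sketch of the technique. `Helper` and `FakeConnection` are hypothetical stand-ins, not degenbot classes; `FakeConnection` plays the role of a live resource (like an RPC connection) that refuses to pickle:

```python
import pickle

class FakeConnection:
    """Stand-in for a live external resource that cannot be pickled."""
    def __reduce__(self):
        raise TypeError("cannot pickle a live connection")

class Helper:
    def __init__(self):
        self.name = "WBTC-WETH"
        self.max_input = 100
        self._connection = FakeConnection()  # unpicklable attribute

    def __getstate__(self):
        # Copy __dict__ and drop the attribute that cannot cross the pipe.
        state = self.__dict__.copy()
        del state["_connection"]
        return state

    def __setstate__(self, state):
        # Rebuild the object on the far side without a live connection;
        # the calculation only needs the plain data attributes.
        self.__dict__.update(state)
        self._connection = None

restored = pickle.loads(pickle.dumps(Helper()))
```

Without the two special methods, `pickle.dumps(Helper())` would raise the same kind of `PicklingError` we saw in the console; with them, the data attributes survive the round-trip and the connection is simply left behind.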
Where should the state value come from? The object itself!
Every object in Python, unless it has been constructed in a special way, stores its internal state in a dictionary called __dict__. Let’s look at one now:
>>> arb_helper.__dict__
{
'best': {
'input_token': WETH,
'last_swap_amount': 0,
'profit_amount': 0,
'profit_token': WETH,
'strategy': "cycle",
'swap_amount': 0,
'swap_pool_addresses': ["0xCEfF51756c56CeFFCA006cD410B03FFC46dd3a58", "0xCBCdF9626bC03E24f779434178A73a0B4bad62eD"],
'swap_pool_amounts': [],
'swap_pool_tokens': [[<degenbot.token.Erc20Token object at 0x7fc9e6b8f070>, <degenbot.token.Erc20Token object at 0x7fc9ec7a8a30>], [<degenbot.token.Erc20Token object at 0x7fc9e6b8f070>, <degenbot.token.Erc20Token object at 0x7fc9ec7a8a30>]],
'swap_pools': [WBTC-WETH (V2, 0.30%), WBTC-WETH (V3, 0.30%)]
},
'gas_estimate': 0,
'id': None,
'input_token': WETH,
'max_input': 100000000000000000000,
'name': "WBTC-WETH (V2, 0.30%) -> WBTC-WETH (V3, 0.30%)",
'pool_states': {
'0xCBCdF9626bC03E24f779434178A73a0B4bad62eD': None,
'0xCEfF51756c56CeFFCA006cD410B03FFC46dd3a58': None
},
'swap_pool_addresses': ["0xCEfF51756c56CeFFCA006cD410B03FFC46dd3a58", "0xCBCdF9626bC03E24f779434178A73a0B4bad62eD"],
'swap_pool_tokens': [[<degenbot.token.Erc20Token object at 0x7fc9e6b8f070>, <degenbot.token.Erc20Token object at 0x7fc9ec7a8a30>], [<degenbot.token.Erc20Token object at 0x7fc9e6b8f070>, <degenbot.token.Erc20Token object at 0x7fc9ec7a8a30>]],
'swap_pools': [WBTC-WETH (V2, 0.30%), WBTC-WETH (V3, 0.30%)],
'swap_vectors': [
{
'token_in': WETH,
'token_out': WBTC,
'zeroForOne': False
},
{
'token_in': WBTC,
'token_out': WETH,
'zeroForOne': True
}
]
}
If you review the source for the UniswapLpCycle helper on github, you’ll see familiar attribute names. You’ll find that inside the dictionary, keys correspond to attribute names. Here’s an example showing that the keys and values are an exact match for the attribute names and the values stored there:
>>> arb_helper.__dict__['best'] is arb_helper.best
True
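A small illustration with a throwaway class (hypothetical, not from degenbot) shows that __dict__ is the live attribute storage, not a snapshot:

```python
class Point:
    def __init__(self):
        self.x = 1
        self.y = 2

p = Point()

# __dict__ keys match the attribute names exactly, and because it is
# the object's live storage, a write through __dict__ shows up as an
# ordinary attribute read.
p.__dict__["x"] = 99
```

This is why copying and pruning `__dict__` in `__getstate__` is such a convenient way to describe an object’s state: the dictionary already *is* the state.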
If we removed the dict entirely, would the object pickle?
>>> del arb_helper.__dict__
>>> pickle.dumps(arb_helper)
b'\x80\x04\x95=\x00\x00\x00\x00\x00\x00\x00\x8c#degenbot.arbitrage.uniswap_lp_cycle\x94\x8c\x0eUniswapLpCycle\x94\x93\x94)\x81\x94.'
It does! But that’s not a very useful object. So our task is to determine which attributes are necessary to perform the calculation after unpickling.