The “worst” part about making efficiency improvements to the bot is that I’m able to monitor and chase more opportunities. That seems all fun and lighthearted, but it becomes a real problem at the data entry level.

Take the Snowsight backrun bot as an example, where to accomplish a very simple two/three-pool arbitrage with pending transactions, I needed to define:

Three token addresses and helper objects

Seven different LP addresses and helper objects

Two future LP helpers

Six arbitrage helpers

First world problems I know, but all that repetitive data entry gets very tedious. I have several `assert`

statements built into my helpers to detect errors, but this is not a recipe for long-term success. It also creates a rate-limited bottleneck where all tokens and arbitrage pathways need to be manually entered by hand. Relying on my dexterity, free time, and creativity is a losing plan. The smart contract and bot are built to work on arbitrary tokens, so why am I using this powerful tool in such a limited way?

The next series of posts will cover my ongoing optimization efforts at the “opportunity searching” level. Identifying arbitrage pathways by searching manually on the TraderJoe/Pangolin/Sushiswap analytics webpages is a terrible way to go. Much better to apply some heuristics and data-sorting tools using Python.

Before proceeding, I recommend reviewing the lesson on Multicall Requests. The aim of that post was to reduce the number of RPC calls it took to retrieve a huge chunk of data from various factory contracts. At the end of the lesson, you will be the proud owner of three comma-separated-value files with all addresses for each LP contract and associated tokens for every liquidity pool on TraderJoe, Pangolin, and Sushiswap.

## Too Much Data, Too Little Brain

**Problem Statement:** I have address data for over 10,000 DEX LPs and need to process them and identify potential arbitrage pathways.

**Solution**: Use an algorithm!

**New Problem**: … OK which one?

The odds of me implementing an efficient algorithm to sort through data like this is virtually zero, so I spent several weeks reading about pathfinding algorithms. It turns out that **Graph Theory** is a perfect field of study for this tricky problem.

You can read all about Graph Theory on Wikipedia, but here is the short version:

A graph is a data structure consisting of objects and the relationship between those objects. A node (also called a vertex or point) is the atomic “piece” of the graph. All nodes that have a relationship with another node are connected by means of an edge (also called a link) that may contain an arbitrary amount of extra information.

Using the above example, imagine that each node represents a particular ERC-20 token. ERC-20 tokens are unique, like each node in a graph. For every two tokens that share a liquidity pool, draw an edge. From there, move around the graph (called “traversing”) to discover possible arbitrage paths between tokens by keeping track of which edges you have visited.

This will be a really clumsy example just to fit the graph above, but let’s imagine the following tokens represented by those 6 nodes:

MIM

WAVAX

USDC

SPELL

USDT

sSPELL

WAVAX is a very common token, so it has three neighbors (nodes reachable by a shared edge). There are liquidity pools for WAVAX-MIM, WAVAX-USDC, and WAVAX-USDT.

MIM, USDC, USDT, and SPELL are common as well, but less so than WAVAX. Each has two neighbors: MIM-WAVAX & MIM-USDT; USDC-WAVAX & USDC-SPELL; USDT-MIM & USDT-SPELL.

Reader favorite SPELL is also very popular with three neighbors: SPELL-sSPELL, SPELL-USDT, and SPELL-USDC.

If you were going to dream of arbitrage pathways, you would look for “paths” within the graph that started on a token that you held (starting node) and ended on a token you wanted (ending node) or returned to the same node.

Let’s visualize a simple triangular arbitrage: start with some quantity of WAVAX (node #2), swap WAVAX for MIM (node #1), swap MIM for USDT (node #5), then swap MIM back to WAVAX (returning to node #2).

Ideally I’ve made some money doing this, but that’s not why I’m building the graph. I want to use it to identify all possible pathways, then let my bot determine whether that pathway is profitable. Recall that I’ve already solved that problem using SciPy.

## NetworkX

Enter the humble NetworkX Python module, a set of functions and algorithms designed to make analyzing graph simple.

I am going to present examples of how to use NetworkX to build and pull data from token/LP graph, then design a way to use that CSV data to identify real arbitrage pathways on Avalanche.