Numerical Optimization — Part III: Derivative-Free Methods (Multivariate)
More Pools, (Maybe) More Profits
Part II covered derivative-free methods for the simple arbitrage problem: two pools holding the same token pair, optimizing for maximum profit from a single trade.
Now we’ll turn our attention to a complex arbitrage problem: multiple pools holding an arbitrary set of token pairs, optimizing for a set of trades to maximize profit.
Scalar vs. Vector
In any of these optimization problems, we are concerned with determining a value for the decision variable that maximizes or minimizes some objective, possibly subject to constraints.
For the simple arbitrage problem, the decision variable is the starting trade amount (which is then chained through the fixed pool path), chosen to maximize net profit. This one-dimensional value is also called a scalar. The SciPy function is named minimize_scalar, which gives us a strong hint about what kinds of problems it can solve.
For the complex arbitrage problem, the decision variable is a multi-dimensional vector that represents inputs to different pool paths.
These problems cannot be solved by the scalar methods, so we must use an appropriate multivariate method.
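The split is visible directly in the SciPy API. A minimal sketch with hypothetical quadratic objectives (standing in for real profit functions) shows the two entry points:

```python
import numpy as np
from scipy.optimize import minimize_scalar, minimize

# Scalar problem: one decision variable (e.g., the input to a single path).
res_scalar = minimize_scalar(lambda x: (x - 3.0) ** 2,
                             bounds=(0.0, 10.0), method='bounded')

# Vector problem: one amount per path, packed into an array.
target = np.array([1.0, 2.0])
res_vector = minimize(lambda x: np.sum((x - target) ** 2),
                      x0=np.zeros(2), method='Nelder-Mead')
```

minimize_scalar returns a single optimal value in res_scalar.x; minimize returns a whole array in res_vector.x, one entry per decision variable.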
The Parallel Arbitrage Problem
Consider a searcher who has a set of n independent arbitrage paths and wants to maximize the total profit from trading through them.
The decision variable x is a vector of amounts allocated to each path: x = (x1, x2, …, xn), where xi is the input amount sent through path i.
The net profit is the sum, across all paths, of each path's output minus its input: P(x) = ∑(fi(xi) − xi), where fi(xi) is the amount returned by path i.
We can impose additional constraints to clamp the solution to feasible values:
The net profit must be positive
The maximum total trade is bounded by some value
The paths are independent (each pool is used only once)
Each trade satisfies the pool/protocol-specific constraints
But since we’re using the “black box” method of calling calculation functions on our offline simulator, we can defer enforcement of the pool/protocol-specific constraints.
So the major constraints are setting the maximum size and constructing a set of independent paths.
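With those decisions made, the objective reduces to a black-box function of the allocation vector. A hypothetical sketch, where path_output stands in for the offline pool simulator and the curve coefficients are purely illustrative:

```python
import numpy as np

def path_output(i, amount_in):
    # Hypothetical stand-in for the offline pool simulator: each path has
    # a concave output curve (diminishing returns from slippage).
    a = [1.04, 1.08, 1.02][i]   # gross edge per path (illustrative)
    k = [1e-4, 5e-4, 1e-4][i]   # slippage curvature (illustrative)
    return a * amount_in / (1.0 + k * amount_in)

def neg_profit(x):
    # Net profit = sum over paths of (output - input); negated because
    # SciPy minimizes.
    return -sum(path_output(i, xi) - xi for i, xi in enumerate(x))

MAX_TOTAL = 1_000.0  # capital budget: sum(x) <= MAX_TOTAL
```

Each method below then differs mainly in how it handles the budget constraint around this black box.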
Methods
For simplicity, we’ll evaluate a subset of methods built into SciPy version 1.17.1.
Using Kimi K2.6 as the AI model, I started with a request to prepare a comparison of the available methods and their suitability for our complex arbitrage problem:
Nelder-Mead
Mechanism: Maintains a simplex of n+1 points in the search space. Iteratively reflects, expands, contracts, and shrinks the simplex based on relative function values at the vertices.
Properties:
Purely derivative-free; no gradient estimation.
Supports box bounds.
Does not support general inequality or equality constraints.
Deterministic given a starting point.
Well-suited to low-dimensional problems (n ≤ 20) and noisy objectives.
Known to stall on problems with narrow curving valleys.
Constraint handling for capital budget: The sum constraint cannot be enforced natively. Possible workarounds:
Repair method: After each simplex vertex update, project onto the budget hyperplane by proportional scaling.
Penalty method: Add λ⋅max(0,∑xi−C)^2 to the objective.
Expected behavior on arbitrage problems: Good for small n (2–5 paths) where the landscape is relatively smooth. Will struggle if paths have strongly differing scales (e.g., USDC-denominated inputs spanning several orders of magnitude, up to 10^9).
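A sketch of the penalty workaround with Nelder-Mead, using a toy concave profit per path (the coefficients, budget, and penalty weight are illustrative, not from any real pool):

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([0.03, 0.05, 0.02])   # marginal edge per path (illustrative)
b = np.array([1e-5, 4e-5, 1e-5])   # slippage curvature (illustrative)
BUDGET, LAM = 2_000.0, 1e3         # capital cap and penalty weight

def neg_profit_penalized(x):
    profit = np.sum(a * x - b * x**2)            # toy concave profit
    violation = max(0.0, np.sum(x) - BUDGET)     # budget overrun
    return -profit + LAM * violation**2          # quadratic penalty

res = minimize(neg_profit_penalized, x0=np.full(3, 100.0),
               method='Nelder-Mead', bounds=[(0.0, BUDGET)] * 3,
               options={'xatol': 1e-6, 'fatol': 1e-9, 'maxiter': 5000})
```

The penalty weight needs tuning: too small and the solver overspends the budget, too large and the penalty term dominates the simplex geometry.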
Powell
Mechanism: A conjugate-direction method. Performs sequential 1-D minimizations along a set of search directions, then updates the direction set based on the progress made. Avoids explicit gradients.
Properties:
Derivative-free; no gradient or Hessian required.
Supports box bounds; evaluates outside bounds during the first iteration if the initial guess is outside, but respects bounds thereafter when provided.
Does not support general constraints.
Often faster than Nelder-Mead on smooth or approximately quadratic objectives.
Can accelerate along ridge-like structures in the objective surface.
Constraint handling for capital budget: Same as Nelder-Mead — repair or penalty required.
Expected behavior on arbitrage problems: Likely more efficient than Nelder-Mead because the arbitrage profit landscape tends to be smooth (swap output is a continuous function of input). The conjugate-direction updates should exploit the near-quadratic structure around the peak.
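Powell can use the repair workaround instead: project any over-budget point back onto the budget hyperplane by proportional scaling before evaluating. A sketch with the same illustrative toy objective (coefficients are assumptions, not real data):

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([0.03, 0.05, 0.02])   # illustrative profit coefficients
b = np.array([1e-5, 4e-5, 1e-5])
BUDGET = 2_000.0

def repair(x):
    # Proportional scaling onto the budget hyperplane when over budget.
    total = np.sum(x)
    return x * (BUDGET / total) if total > BUDGET else x

def neg_profit_repaired(x):
    x = repair(np.clip(x, 0.0, None))
    return -np.sum(a * x - b * x**2)

res = minimize(neg_profit_repaired, x0=np.full(3, 100.0),
               method='Powell', bounds=[(0.0, BUDGET)] * 3)
best = repair(np.clip(res.x, 0.0, None))   # report the repaired allocation
```

Note that the repair must be applied once more to res.x at the end, since the solver only sees the repaired objective value, not the repaired point.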
COBYQA
Mechanism: Constrained Optimization BY Quadratic Approximations. A derivative-free trust-region SQP method. At each iteration, it builds quadratic models of the objective and each constraint from function evaluations, then solves a trust-region subproblem.
Properties:
Derivative-free; constructs quadratic surrogate models.
Supports box bounds (treated as unrelaxable — always respected).
Supports general nonlinear inequality and equality constraints (also treated as unrelaxable).
Trust-region radius adapts automatically.
Stronger convergence guarantees than Nelder-Mead or Powell for smooth objectives.
Constraint handling for capital budget: The linear budget constraint ∑xi ≤ C can be passed directly as a LinearConstraint (or, more generally, a NonlinearConstraint). This is a significant advantage: no repair or penalty tuning needed.
Expected behavior on arbitrage problems: Best suited when the objective is smooth and constraints must be respected exactly. The overhead of quadratic model building is justified if evaluations are cheap (which they are for offline pool simulation). Should outperform Nelder-Mead and Powell on constrained problems with n ≥ 3.
COBYLA
Mechanism: Constrained Optimization BY Linear Approximation. Uses linear (affine) models of the objective and constraints, updated iteratively.
Properties:
Derivative-free; linear models are cheaper to build than quadratic.
Supports general nonlinear inequality constraints (equality constraints are internally converted to pairs of inequalities).
Supports box bounds via inequality constraints.
Simpler than COBYQA but less accurate for highly curved objectives.
Constraint handling for capital budget: Pass as a linear inequality constraint directly.
Expected behavior on arbitrage problems: A lighter-weight alternative to COBYQA. May be faster per iteration but require more iterations if the profit landscape has significant curvature (which it does, due to AMM slippage).
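COBYLA accepts the same problem via SciPy's classic dict-style inequality constraints, which also cover the non-negativity bounds. Again the coefficients are illustrative only:

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([0.03, 0.05, 0.02])   # illustrative profit coefficients
b = np.array([1e-5, 4e-5, 1e-5])
BUDGET = 2_000.0

cons = [
    {'type': 'ineq', 'fun': lambda x: BUDGET - np.sum(x)},  # sum(x) <= BUDGET
    {'type': 'ineq', 'fun': lambda x: x},                   # x >= 0 (box via inequalities)
]

res = minimize(lambda x: -np.sum(a * x - b * x**2),
               x0=np.full(3, 100.0), method='COBYLA',
               constraints=cons,
               options={'maxiter': 2000, 'rhobeg': 100.0})
```

Setting rhobeg near the natural scale of the inputs matters here; COBYLA's default initial trust radius is far too small for amounts in the thousands.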
Differential Evolution
Mechanism: A stochastic, population-based evolutionary algorithm. Maintains a population of candidate vectors. Each generation, every candidate is mutated (by adding scaled differences of randomly selected population members) and recombined with the original to form a trial vector. The better of the two survives.
Properties:
Global optimizer; designed to escape local minima.
Derivative-free; uses only function value rankings.
Requires bounds for all variables (mandatory).
Supports linear and nonlinear constraints (inequality and equality) via the Lampinen approach.
Supports parallel evaluation via workers=-1 or a custom map-like callable.
Supports vectorized evaluation via vectorized=True (passes all trial vectors at once).
Stochastic; requires a random seed for reproducibility.
Key parameters:
popsize: Multiplier for population size (pop=popsize×n).
mutation (F): Differential weight, typically in [0.5, 1].
recombination (CR): Crossover probability, typically 0.7.
strategy: 'best1bin' is the default; 'randtobest1bin' often explores more aggressively.
polish: If True, runs L-BFGS-B from the best DE solution at the end. Can be disabled for pure derivative-free results.
Constraint handling for capital budget: Pass as a LinearConstraint or NonlinearConstraint. The Lampinen constraint handling compares constraint violation along with objective value during selection.
Expected behavior on arbitrage problems: Because evaluations are cheap, DE's large population size is not a bottleneck. The vectorized=True option is particularly attractive: we can evaluate all trial vectors in a single batch call to our pool simulation infrastructure. DE should find the global optimum reliably but may converge more slowly than local methods once near the peak. Best used when n is moderate (n ≤ 20) and there is concern about local optima (e.g., noisy or multi-modal profit landscapes from interacting pools).
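A sketch of DE on the same illustrative toy objective, using vectorized=True (SciPy passes the trial vectors as an array of shape (n_params, S)) together with a native budget constraint:

```python
import numpy as np
from scipy.optimize import differential_evolution, LinearConstraint

a = np.array([0.03, 0.05, 0.02])   # illustrative profit coefficients
b = np.array([1e-5, 4e-5, 1e-5])
BUDGET = 2_000.0

def neg_profit(x):
    # With vectorized=True, x arrives with shape (3, S); handle 1-D too.
    x = np.atleast_2d(x.T).T
    profit = (a[:, None] * x - b[:, None] * x**2).sum(axis=0)
    return -profit  # shape (S,): one value per trial vector

budget = LinearConstraint(np.ones(3), -np.inf, BUDGET)  # sum(x) <= BUDGET

res = differential_evolution(
    neg_profit, bounds=[(0.0, BUDGET)] * 3,
    constraints=(budget,), seed=42,
    vectorized=True, polish=False,   # pure derivative-free, batched evaluation
)
```

In a real searcher, neg_profit's batched branch would be a single call into the pool simulation layer, amortizing any per-call overhead across the whole population.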
Test & Benchmark
Then I asked the model to create a test harness that solves each scenario with every method and compares their accuracy.
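The shape of such a harness is straightforward: loop over methods, record evaluations, wall time, and achieved profit. A minimal sketch over two of the penalty-based methods, with the same illustrative toy objective as above:

```python
import time
import numpy as np
from scipy.optimize import minimize

a = np.array([0.03, 0.05, 0.02])   # illustrative scenario coefficients
b = np.array([1e-5, 4e-5, 1e-5])
BUDGET = 2_000.0

def neg_profit_penalized(x):
    profit = np.sum(a * x - b * x**2)
    return -profit + 1e3 * max(0.0, np.sum(x) - BUDGET) ** 2

results = {}
for method in ('Nelder-Mead', 'Powell'):
    t0 = time.perf_counter()
    res = minimize(neg_profit_penalized, x0=np.full(3, 100.0),
                   method=method, bounds=[(0.0, BUDGET)] * 3)
    # Record: function evaluations, wall time, achieved (penalized) profit.
    results[method] = (res.nfev, time.perf_counter() - t0, float(-res.fun))
```

Extending this to the constrained methods (COBYQA, COBYLA, DE) just means swapping the penalty for native constraint objects inside the loop.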
It created four scenarios and described them.