The Ethereum Virtual Machine (EVM) is poorly understood. It powers everything we do here, but few know how it actually works.
The Low Level EVM series will explore the structure and operation of the EVM. We need these skills to investigate rival smart contracts, understand the tradeoffs imposed by using various smart contract languages, and make better choices when we write our own smart contracts.
What Is A Virtual Machine?
A virtual machine (VM) is an interface and instruction set for performing operations in software. The VM may be implemented in any language and its instructions may be executed on any hardware — subject to the limits of that hardware with respect to power and storage.
If you’ve programmed in a high level language like Python or Java, you’ve already experienced the benefit of a VM. When the interpreter or compiler processes code written in that language, it compiles the code into a set of instructions for a VM. That VM is more closely aligned with the hardware running the code than your source code is.
Consider this simple Python code:
def greeting_world(greeting: str = 'Hello'):
    return f'{greeting}, world!'
When this code is run, the function body is compiled into the following instructions, which are performed by the Python VM:
0 LOAD_FAST 0 (greeting)
2 FORMAT_VALUE 0
4 LOAD_CONST 1 (', world!')
6 BUILD_STRING 2
8 RETURN_VALUE
These results were taken from Compiler Explorer (aka Godbolt). I recommend spending time on Godbolt exploring languages familiar to you, and inspecting how the syntax is translated to lower level instructions.
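If you would rather stay on your own machine, Python's standard library `dis` module prints the same kind of listing. This is a minimal sketch. Opcode names and offsets vary between Python versions, so your output may not match the listing above exactly.

```python
import dis


def greeting_world(greeting: str = 'Hello'):
    return f'{greeting}, world!'


# Print the bytecode that the Python VM executes for this function.
dis.dis(greeting_world)

# The same information as data: one opcode name per instruction.
opnames = [instruction.opname for instruction in dis.get_instructions(greeting_world)]
print(opnames)
```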
What Is The Ethereum Virtual Machine?
The EVM, like the Python VM, is an interface and a set of instructions defined by a specification. “Ethereum” is an overloaded word that people use for many things without distinction.
I try my best to be specific here, so I refer to the gas token used by the chain as Ether, the computing specification as the Ethereum Virtual Machine, and the combination of the network and the blockchain as Ethereum.
The Ethereum blockchain is an immutable record of state changes since genesis. A node in the network should verify and replay all of the past transactions to arrive at the current global state, and then continue verifying and replaying transactions that are received by the consensus network. In this way, everyone participating in the network has access to the current state.
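The replay idea can be sketched with a toy model. This is an illustration only: the `Transfer` type, the `replay` function, and the balances-only state are hypothetical simplifications. Real nodes also verify signatures, nonces, gas, and contract execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical, simplified transaction: move `amount` between two addresses."""
    sender: str
    recipient: str
    amount: int


def replay(genesis: dict[str, int], transactions: list[Transfer]) -> dict[str, int]:
    """Derive current balances by applying every transaction, in order, to genesis."""
    state = dict(genesis)  # copy, so the genesis record itself is never mutated
    for tx in transactions:
        # Verification step: reject a transfer the sender cannot afford.
        if state.get(tx.sender, 0) < tx.amount:
            raise ValueError(f"invalid transaction: {tx}")
        state[tx.sender] = state.get(tx.sender, 0) - tx.amount
        state[tx.recipient] = state.get(tx.recipient, 0) + tx.amount
    return state


genesis = {"alice": 10, "bob": 0}
history = [Transfer("alice", "bob", 4), Transfer("bob", "carol", 1)]
print(replay(genesis, history))  # {'alice': 6, 'bob': 3, 'carol': 1}
```

Any two nodes that start from the same genesis and apply the same ordered history arrive at the same state, which is the property the paragraph above relies on.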
What Is State, Anyway?
When I refer to state, I mean a complete set of values stored at particular addresses on the distributed storage map.
The state includes, but is not limited to, a mapping of all balances held by all addresses, the instructions recorded at a particular address, and values held in storage at particular locations.
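As a rough mental model, the state can be pictured as a mapping from address to account. The `Account` class and the addresses below are made up for illustration. Real accounts carry additional fields such as a nonce, and clients store the state in a more elaborate structure than a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical, simplified view of what the state holds for one address."""
    balance: int = 0   # Ether balance, in wei
    code: bytes = b""  # instructions recorded at this address (empty for a plain wallet)
    storage: dict[int, int] = field(default_factory=dict)  # storage location -> value

    def load(self, slot: int) -> int:
        # Storage locations that were never written read as zero.
        return self.storage.get(slot, 0)


# The global state maps each address to its account.
state: dict[str, Account] = {
    "0xaaaa": Account(balance=5),  # made-up address holding only a balance
    "0xbbbb": Account(code=bytes.fromhex("00"), storage={0: 42}),  # made-up contract
}

print(state["0xbbbb"].load(0))  # 42
print(state["0xbbbb"].load(1))  # 0
```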
Smart Contracts As Abstractions
High level smart contract languages like Solidity and Vyper abstract away many of the fine details. These languages allow us to build contracts with familiar building blocks: variables, data types, structs, functions, arguments and return values, and control logic statements like if, else, and for.
The Solidity and Vyper compilers translate the specific language syntax into low-level EVM operations, which can then be executed by the EVM and recorded to the blockchain.
But remember that any abstraction is a compromise. With these languages, we give away flexibility for readability and safeguards.
We should all be familiar with reading smart contract source code on GitHub, Etherscan, and similar sites. But it’s critical to understand that high level smart contract code is only a representation of what the EVM does. The aphorism “The Map is Not The Territory” applies here.
The EVM does not speak “Solidity” or “Vyper”. You will not find any of those languages on the blockchain. What you will find are bytes!
And when you read The Ethereum Yellow Paper, you will find highly detailed descriptions of how the EVM sends, receives, and stores these bytes.
Appendix H of that same paper defines a set of operations which the EVM can execute.
Opcodes
The term “opcode” is shorthand for a byte signifying an EVM operation.
Operations are described in two ways: a hex value and a mnemonic. The hex value is the number recorded in state and executed by the EVM, and the mnemonic is for us to remember what it does.
Using the first opcode as an example, a value of 0x00 will perform a STOP operation. When the EVM encounters this instruction, it will halt execution immediately.
Each EVM operation is 1 byte in length, denoted by two hex characters. A byte is 8 bits, so a single byte can express 2^8 different combinations of these bits. Thus, 256 operation codes can be defined.
Opcodes are organized by category, with numbering gaps between the starting values. 256 opcodes are possible, but 151 are defined as of the Shanghai revision, so there is room to define more later. A recent example of a new opcode is PUSH0, introduced in Shanghai.
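To make the hex value and mnemonic pairing concrete, here is a minimal disassembler sketch. The `OPCODES` table is a hand-picked subset, not the full list from the Yellow Paper, and `disassemble` is a hypothetical helper written for this post.

```python
# A small subset of the opcode table: hex value -> mnemonic.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x5F: "PUSH0",
    0x60: "PUSH1",
}


def disassemble(bytecode: bytes) -> list[str]:
    """Translate raw bytes into mnemonics, one entry per instruction."""
    result = []
    pc = 0  # position of the next byte to read
    while pc < len(bytecode):
        opcode = bytecode[pc]
        # Bytes missing from the subset above are labelled rather than decoded.
        mnemonic = OPCODES.get(opcode, f"UNLISTED(0x{opcode:02x})")
        pc += 1
        if mnemonic == "PUSH1":
            # PUSH1 is followed by one byte of data, which is not itself an opcode.
            operand = bytecode[pc:pc + 1]
            mnemonic += f" 0x{operand.hex()}"
            pc += 1
        result.append(mnemonic)
    return result


print(disassemble(bytes.fromhex("600260030100")))
# ['PUSH1 0x02', 'PUSH1 0x03', 'ADD', 'STOP']
```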
The Stack
The Yellow Paper describes the EVM at a high level. I have highlighted some items relevant to this section:
9.1. Basics. The EVM is a simple stack-based architecture. The word size of the machine (and thus size of stack items) is 256-bit. This was chosen to facilitate the Keccak-256 hash scheme and elliptic-curve computations. The memory model is a simple word-addressed byte array. The stack has a maximum size of 1024. The machine also has an independent storage model; this is similar in concept to the memory but rather than a byte array, it is a word-addressable word array. Unlike memory, which is volatile, storage is non volatile and is maintained as part of the system state. All locations in both storage and memory are well-defined initially as zero.
The machine does not follow the standard von Neumann architecture. Rather than storing program code in generally-accessible memory or storage, it is stored separately in a virtual ROM interactable only through a specialised instruction.
The machine can have exceptional execution for several reasons, including stack underflows and invalid instructions. Like the out-of-gas exception, they do not leave state changes intact. Rather, the machine halts immediately and reports the issue to the execution agent (either the transaction processor or, recursively, the spawning execution environment) which will deal with it separately.
To truly understand the EVM, we need to understand the stack.
What Is A Stack?
At a high level, a stack is a last-in first-out (LIFO) data structure which operates on the most recently added item.
I will not review the specific implementation of a stack in memory, because it is not particularly useful. Stack implementations vary considerably across different architectures, but the EVM stack is quite simple.
The common analogy used to describe a stack is a flat table where dinner plates can be placed, one above the other. An empty table is akin to an empty stack. Plates can be placed onto the stack one by one, and referenced by their position within the stack.
Stack Operations
The simplest stack implementation defines two operations: push and pop. Pushing places an item on the top of the stack, and popping removes the top item from the stack. The EVM does not implement multiple-item pushes or pops, so we will only consider the single-item case.
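Push and pop can be sketched in a few lines of Python. The `Stack` class is a hypothetical illustration, not how any EVM client implements it. The 1024 item limit comes from the Yellow Paper excerpt quoted earlier.

```python
class Stack:
    """Minimal LIFO stack with the EVM's limit of 1024 items."""

    MAX_ITEMS = 1024

    def __init__(self) -> None:
        self._items: list[int] = []

    def push(self, value: int) -> None:
        """Place one item on top of the stack."""
        if len(self._items) >= self.MAX_ITEMS:
            raise OverflowError("stack overflow")
        self._items.append(value)

    def pop(self) -> int:
        """Remove and return the top item."""
        if not self._items:
            raise IndexError("stack underflow")
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


plates = Stack()
plates.push(1)
plates.push(2)
print(plates.pop())  # 2, the most recently added item comes off first
print(plates.pop())  # 1
```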
EVM Stack
The Yellow Paper includes this note before the set of opcodes:
For each instruction, also specified is α, the additional items placed on the stack and δ, the items removed from stack, as defined in section 9.
This sentence tells us that opcodes can pop items from the stack and push items onto the stack. The ordering is not explicit, but opcodes will perform all pop operations first, then perform the instruction, then perform all push operations.
As an example, consider the 0x01 (ADD) opcode.
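The ADD operation defined in the Yellow Paper is:
μ′s[0] ≡ μs[0] + μs[1]
Where δ = 2 (two items removed from the stack) and α = 1 (one item placed on the stack).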
Wherever you see the mu symbol, that signifies a tuple of values representing the virtual machine state before a given operation. The Yellow Paper defines this:
9.4.1. Machine State. The machine state μ is defined as the tuple (g, pc, m, i, s, o) which are the gas available, the program counter pc ∈ N_256, the memory contents, the active number of words in memory (counting continuously from position 0), the stack contents, and the returndata buffer. The memory contents μm are a series of zeroes of size 2^256.
The subscript s selects the stack contents from that tuple. Operations that access or manipulate memory or accounts use the m and a subscripts, respectively.
So the ADD opcode will pop two items from the stack, add them together, and push the result onto the stack.
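The pop, compute, push sequence can be sketched as a generic helper that takes δ and α as parameters. The `execute` function is a hypothetical illustration, and a Python list stands in for the stack, with the top of the stack at the end of the list.

```python
from typing import Callable


def execute(stack: list[int], delta: int, alpha: int,
            operation: Callable[..., tuple[int, ...]]) -> None:
    """Pop `delta` items, run `operation` on them, then push its `alpha` results."""
    if len(stack) < delta:
        raise IndexError("stack underflow")
    # The first item popped (the old top of the stack) becomes the first argument.
    popped = [stack.pop() for _ in range(delta)]
    results = operation(*popped)
    if len(results) != alpha:
        raise ValueError("operation returned the wrong number of items")
    stack.extend(reversed(results))  # results[0] ends up on top


stack = [7, 3, 2]  # 2 is on top
# ADD removes two items and places one; results stay within one 256-bit word.
execute(stack, delta=2, alpha=1, operation=lambda a, b: ((a + b) % 2**256,))
print(stack)  # [7, 5]
```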
Arithmetic operations are performed modulo 2^256, so a result that does not fit in the native EVM word size of 32 bytes (256 bits) silently wraps around instead of raising an error. This may not be what you want! Solidity and Vyper typically add overflow and underflow checks around these operations, which consume additional gas on top of the underlying EVM operation.
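Here is a small sketch of the difference between raw and checked addition. `evm_add` mirrors the modulo behavior described above. `checked_add` is a hypothetical guard in the spirit of what a high level language adds, and the checks that Solidity or Vyper actually emit may look different.

```python
WORD_MODULUS = 2**256
MAX_WORD = WORD_MODULUS - 1  # largest value that fits in one EVM word


def evm_add(a: int, b: int) -> int:
    """Wrapping addition: the result is reduced modulo 2^256."""
    return (a + b) % WORD_MODULUS


def checked_add(a: int, b: int) -> int:
    """Hypothetical guard: reject a sum that does not fit in one word."""
    result = evm_add(a, b)
    if result < a:  # a wrapped sum is always smaller than either input
        raise OverflowError("addition overflowed 256 bits")
    return result


print(evm_add(2, 3))         # 5
print(evm_add(MAX_WORD, 1))  # 0, the sum wrapped around silently
```

Calling `checked_add(MAX_WORD, 1)` raises `OverflowError` instead of returning 0. The extra comparison is the kind of work that costs additional gas.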
EVM Playground
To visualize this, we can use the evm.codes Playground to input opcodes and step through each operation, visualizing the stack and memory.