The following is a discussion of the factors that affect the TPS (transactions per second) of zkVM projects, along with a comparison of different aspects of zero-knowledge proof systems.
-
From the perspective of EVM compatibility, the deciding factor is how much computation must be proven.
EVM compatibility (high-level language equivalence, zkVM) > EVM equivalence > ETH equivalence (PSE)
Starknet = Sin7Y > zkSync > Polygon zkEVM > Scroll > Polygon Zero > PSE
-
The speed of proof generation can be viewed from the perspective of the polynomial commitment scheme.
Small field + FRI > FRI > FRI & KZG > KZG (the ordering of algorithmic complexity may vary)
Polygon Zero > Starknet > Sin7Y > Polygon zkEVM > zkSync = Scroll = PSE
-
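The "Small field + FRI" advantage largely comes from field elements fitting in a single machine word: Plonky2 (Polygon Zero), for example, uses the 64-bit Goldilocks prime 2^64 - 2^32 + 1, while KZG-based systems work over roughly 256-bit curve fields such as BN254's scalar field. A minimal Python sketch of the contrast (a toy illustration, not a benchmark):

```python
# Toy contrast between a "small field" and a ~256-bit field (illustrative only).
# Plonky2's Goldilocks prime fits in one 64-bit word, so a field multiply is a
# single native multiplication plus a cheap reduction; KZG-based systems use
# elliptic-curve scalar fields of ~256 bits, which need multi-limb arithmetic.

GOLDILOCKS = 2**64 - 2**32 + 1  # 64-bit FRI-friendly prime used by Plonky2
BN254_SCALAR = 21888242871839275222246405745257275088548364400416034343698204186575808495617  # 254-bit scalar field common in KZG systems

def field_mul(a: int, b: int, p: int) -> int:
    """Multiply two field elements modulo p."""
    return a * b % p

# Same abstract operation, very different hardware cost:
small = field_mul(2**63, 3, GOLDILOCKS)   # operands fit in one 64-bit limb
big = field_mul(2**200, 3, BN254_SCALAR)  # operands span four 64-bit limbs
```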
The feasibility of hardware-parallel proving can be viewed from the perspective of the algorithmic direction.
A single algorithm is easier to optimize, and MSM currently parallelizes better than FFT.
Pure MSM > Pure FFT > Hybrid
Sin7Y (Pure MSM) > Polygon Zero (Pure FFT) > Starknet > Polygon zkEVM > zkSync > Scroll > PSE
-
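The "Pure MSM" ranking reflects how naturally an MSM decomposes into independent partial sums. A hedged Python sketch, using integers under addition mod p as a stand-in for elliptic-curve points (the decomposition is identical; note that Python threads give no CPU speedup because of the GIL, so the point here is only that the chunks are independent and could run on separate cores or chips):

```python
# Why "Pure MSM" parallelizes well: an MSM  sum_i s_i * P_i  splits into
# independent partial sums, combined at the end with a few group additions.
# Stand-in group: integers under addition mod P (real systems use curve points).

from concurrent.futures import ThreadPoolExecutor

P = 2**61 - 1  # toy group modulus, illustration only

def partial_msm(scalars, points):
    """Sequential MSM over one chunk."""
    return sum(s * pt for s, pt in zip(scalars, points)) % P

def parallel_msm(scalars, points, workers=4):
    """Split the MSM into independent chunks, evaluate, then merge."""
    n = len(scalars)
    step = -(-n // workers)  # ceiling division
    chunks = [(scalars[i:i + step], points[i:i + step]) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        partials = ex.map(lambda c: partial_msm(*c), chunks)
    return sum(partials) % P

scalars = list(range(1, 1001))
points = list(range(1001, 2001))
assert parallel_msm(scalars, points) == partial_msm(scalars, points)
```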
Recursive proof optimization of zero-knowledge proof systems.
Plonky2 > STARK > Plonk = Halo2
Polygon Zero > Sin7Y > Starknet > Polygon zkEVM > zkSync = Scroll = PSE
-
Composability of zero-knowledge proof systems.
Polygon Zero = Sin7Y > Polygon zkEVM > Starknet > zkSync = Scroll = PSE
-
Parallelism of the VM itself (transaction parallelism).
OlaVM > zkVM > zkEVM
Sin7Y > Starknet > zkSync > Polygon zkEVM > Scroll > Polygon Zero > PSE
The development language used to implement the VM also has some impact on efficiency.
-
The timing of when the mainnet can run a large number of applications (the sooner, the better).
zkSync > Starknet > Polygon zkEVM > Scroll > Sin7Y > Polygon Zero > PSE
-
The timing of when L3 can run (the sooner, the better).
Starknet > zkSync > Polygon zkEVM > Sin7Y > Polygon Zero > Scroll
The proof generation process varies across proof systems, but the bottlenecks are:
- Variable-base and fixed-base multi-scalar multiplication (MSM)
- Fast Fourier Transform (FFT) and inverse FFT
In systems where FFT and MSM coexist, roughly 70% of proof generation time is spent on MSM, with the remainder dominated by FFT.
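To see why MSM dominates, consider the naive cost of a single term s_i * P_i: double-and-add performs roughly one group doubling per scalar bit plus one group addition per set bit, so millions of points with 256-bit scalars mean billions of group operations. A toy Python sketch over integers mod p as a stand-in for a curve group:

```python
# Naive scalar multiplication via double-and-add: ~1 group doubling per scalar
# bit plus 1 group addition per set bit. Summed over millions of points, this
# is the raw work a naive MSM must do.
# Stand-in group: integers under addition mod P (real systems use curve points).

P = 2**61 - 1  # toy modulus for illustration

def double_and_add(scalar: int, point: int):
    """Return (scalar * point mod P, number of group operations performed)."""
    acc, ops = 0, 0
    while scalar:
        if scalar & 1:
            acc = (acc + point) % P  # group addition
            ops += 1
        point = (point + point) % P  # group doubling
        ops += 1
        scalar >>= 1
    return acc, ops

# A 256-bit scalar with two set bits costs 256 doublings + 2 additions:
_, ops = double_and_add((1 << 255) + 1, 12345)
# ops == 258
```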
Both MSM and FFT are slow, but there are methods to improve their performance:
- MSM is embarrassingly parallel and can be accelerated by running it across many threads. However, even on hundreds of cores, the multiplications still take considerable time: the same operations are repeated over and over and consume most of the available memory on the device. In short, MSM requires a large amount of memory and remains slow even when heavily parallelized.
- FFT relies heavily on shuffling data around at runtime, and it demands a large amount of bandwidth when run on hardware. Shuffling means elements must be loaded and unloaded "randomly", for example a >100 GB dataset on a hardware chip with 16 GB of memory or less. While the on-chip operations themselves are very fast, the time spent loading and unloading data over the network ends up slowing the operation down significantly.
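The shuffling in question is visible in the standard iterative radix-2 FFT/NTT: the input is first scrambled by a bit-reversal permutation, and the butterflies then access memory at strides that change on every pass. A small Python sketch over the toy field Z_17, with 2 as a primitive 8th root of unity (parameters chosen only to keep the example tiny):

```python
# Iterative radix-2 NTT (the finite-field FFT) over Z_p. The bit-reversal
# permutation and the changing butterfly strides are exactly the "random"
# memory access pattern that makes FFTs bandwidth-hungry on hardware.
# Toy parameters: p = 17, n = 8, w = 2 (2 has multiplicative order 8 mod 17).

def bit_reverse_permute(a):
    """The shuffle step: element i moves to position bit_reverse(i)."""
    n = len(a)
    bits = n.bit_length() - 1
    b = a[:]
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2)
        if i < j:
            b[i], b[j] = b[j], b[i]
    return b

def ntt(a, w, p):
    """Length-n NTT of a, where w is a primitive n-th root of unity mod p."""
    n = len(a)
    a = bit_reverse_permute(a)
    length = 2
    while length <= n:
        w_len = pow(w, n // length, p)  # twiddle factor for this pass
        for start in range(0, n, length):
            wk = 1
            for j in range(start, start + length // 2):
                u = a[j]
                v = a[j + length // 2] * wk % p
                a[j] = (u + v) % p                # butterfly: strided access,
                a[j + length // 2] = (u - v) % p  # stride halves each pass
                wk = wk * w_len % p
        length *= 2
    return a

# NTT of a constant vector concentrates everything in coefficient 0:
# ntt([1]*8, 2, 17) == [8, 0, 0, 0, 0, 0, 0, 0]
```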
In summary:
- MSM has predictable memory access, which allows a high degree of parallelization, but it is still expensive because of the computation and memory it requires.
- FFT has random memory access, making it unfriendly to hardware and naturally difficult to run on distributed infrastructure.
In addressing the slow performance of large-scale MSM and FFT, the most promising work we have seen is PipeZK. In their paper, the authors describe a method to make MSM cheaper by using the Pippenger algorithm to skip redundant computation.
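A simplified sketch of the bucket (Pippenger) method, again over integers mod p as a stand-in group: scalars are processed in c-bit windows, points sharing the same window digit are accumulated into one bucket, and each window's buckets are combined with a running-sum trick, so each bucket is summed once instead of doing a full scalar multiplication per point. The parameters and structure here are illustrative, not PipeZK's actual pipeline:

```python
# Bucket (Pippenger) MSM sketch. For each c-bit window, points with the same
# digit are added into one bucket; the running-sum pass then computes
# sum_d d * buckets[d] with only ~2 * 2^c group additions.
# Stand-in group: integers under addition mod P (real systems use curve points).

P = 2**61 - 1

def bucket_msm(scalars, points, c=4, bits=64):
    """MSM via windowed buckets; scalars are assumed to be < 2**bits."""
    total = 0
    for w in range(bits // c - 1, -1, -1):  # highest window first
        buckets = [0] * (1 << c)            # one bucket per digit value
        for s, pt in zip(scalars, points):
            d = (s >> (w * c)) & ((1 << c) - 1)
            if d:
                buckets[d] = (buckets[d] + pt) % P
        # Running-sum trick: window_sum == sum_d d * buckets[d]
        running, window_sum = 0, 0
        for d in range(len(buckets) - 1, 0, -1):
            running = (running + buckets[d]) % P
            window_sum = (window_sum + running) % P
        # Shift the accumulator by c doublings (multiplication by 2^c here)
        total = (total * (1 << c) + window_sum) % P
    return total

scalars = [3, 2**40 + 5, 123456789, 2**63 - 1]
points = [11, 22, 33, 44]
expected = sum(s * pt for s, pt in zip(scalars, points)) % P
assert bucket_msm(scalars, points) == expected
```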
They also describe a way to "unroll" the FFT so it can be executed without significant shuffling, improving speed thanks to the now-predictable memory access pattern.
More recently, new schemes in the Plonk family have been proposed that, by optimizing in the MSM direction, are better suited to parallelization.
Parallelization
Rust VM
Cairo 1.0
Regenesis