With the growing prosperity of the Ethereum ecosystem, transaction speed and fees have remained a long-standing problem, which makes scalability the most widely discussed issue around Ethereum. Here is a brief introduction to the history.
The Road to Scalability#
PoS: block proposers and validators are separated. The PoS workflow is as follows:
- Users submit transactions on a shard.
- Validators add the transactions to a shard block.
- The beacon chain selects validators to propose new blocks.
- The remaining validators are randomly assigned to committees that validate the proposals on each shard.
Both proposing a block and attesting to it must be completed within one slot, which is 12 seconds. Every 32 slots form an epoch, and at each epoch boundary the validator order is shuffled and the committees are re-elected.
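As a rough illustration of the timing, here is a minimal sketch assuming the constants above (12-second slots, 32 slots per epoch); the genesis timestamp is included only for illustration:

```python
# Toy slot/epoch arithmetic using the constants described above:
# 12-second slots, 32 slots per epoch. GENESIS_TIME is an assumed value.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
GENESIS_TIME = 1_606_824_023  # beacon-chain genesis, used here only for illustration

def slot_at(timestamp: int) -> int:
    """Slot number containing the given Unix timestamp."""
    return (timestamp - GENESIS_TIME) // SECONDS_PER_SLOT

def epoch_of(slot: int) -> int:
    """Epoch a slot belongs to; committees are reshuffled every epoch."""
    return slot // SLOTS_PER_EPOCH

now = GENESIS_TIME + 1_000_000           # an arbitrary moment after genesis
s = slot_at(now)
print(f"slot {s}, epoch {epoch_of(s)}")  # one epoch lasts 32 * 12 = 384 seconds
```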
After the Merge, Ethereum aims to achieve proposer-builder separation at the consensus layer. Vitalik has argued that the endgame for all blockchains is centralized block production with decentralized block validation. Because sharded Ethereum blocks carry dense data, some centralization of block production is needed to meet the high data availability requirements; at the same time, there must be a way to maintain a decentralized validator set that can validate blocks and perform data availability sampling.
What is sharding? It is the process of horizontally partitioning a database to distribute the workload.#
Sharding is a way of partitioning that distributes computation and storage workloads across a P2P network. With this approach, a node does not have to handle the entire network's transaction load; it only maintains the information related to its own partition (or shard). Each shard has its own validator set or node network. This raises a security issue:
For example, if the network is split into 10 shard chains, disrupting the entire network still requires 51% of the total computing power, but disrupting a single shard only requires 5.1% of it.
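To make the arithmetic of this simplified argument explicit, here is a minimal sketch; it assumes, as the example does, that honest power is split evenly across shards while the attacker concentrates on a single shard:

```python
# Simplified sketch of the naive-sharding security argument above:
# if honest power is spread evenly over the shards, an attacker who can
# concentrate on a single shard needs far less than 51% of the whole network.
def per_shard_attack_share(total_attack_share: float, num_shards: int) -> float:
    """Share of total network power needed to dominate one shard,
    under the naive even-split assumption."""
    return total_attack_share / num_shards

print(per_shard_attack_share(0.51, 10))  # 0.051 -> 5.1% of the whole network
```

This dilution is exactly what random, frequently reshuffled committee assignment is meant to prevent, which is one of the beacon chain's jobs described next.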
The beacon chain is responsible for generating random numbers, assigning nodes to shards, taking snapshots of individual shards, handling handshakes, staking, and other functions, and coordinating communication between shards to keep the network in sync.
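As a conceptual sketch of that committee assignment (the real protocol derives its randomness from RANDAO and uses a swap-or-not shuffle; the seed and validator IDs below are made up):

```python
# Toy committee assignment: derive a deterministic shuffle from an epoch seed
# and deal validators round-robin into one committee per shard.
import hashlib
import random

def assign_committees(validators, num_shards, epoch_seed: bytes):
    rng = random.Random(hashlib.sha256(epoch_seed).digest())
    shuffled = list(validators)
    rng.shuffle(shuffled)
    return {shard: shuffled[shard::num_shards] for shard in range(num_shards)}

committees = assign_committees(range(16), num_shards=4, epoch_seed=b"epoch-42")
print(committees)  # e.g. {0: [four validator IDs], 1: [...], 2: [...], 3: [...]}
```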
A major issue with sharding is cross-shard transactions. Since each node group only processes transactions within its own shard, transactions on different shards are relatively independent. So how is a transfer handled between users A and B who are on different shards?
Blocks can be discarded, so if A and B are accepted for processing but W and X end up being selected in step #2, the transaction as a whole cannot proceed, even though the probability of such a fork is very small.
The earlier approach was to shard the data availability layer, giving each shard independent proposers and committees. Within the validator set, validators take turns verifying the data in each shard, downloading all of it for verification.
The disadvantages are:
- Requires tight synchronization technology to ensure that validators can synchronize within one slot.
- Validators need to collect votes from all committees, which can cause delays.
- Having every validator download all the data puts heavy bandwidth pressure on them.
The second method is to give up complete data verification and instead use data availability sampling. There are two types of random sampling methods:
- Block random sampling: sample a subset of the shards, and if the checks pass, the validators sign off. The problem is that transactions may be missed.
- Reinterpret the data as a polynomial using erasure codes, then use the properties of polynomials to recover the data under certain conditions, guaranteeing full data availability (see the sketch below).
The key property of polynomials: a polynomial of degree d is uniquely determined by any d + 1 of its evaluations; in the example, the data can be recovered from four points.
As long as more than 50% of the encoded (extended) data is available, the entire original data can be recovered.
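A minimal sketch of this idea, using Lagrange interpolation over a small prime field (the field size and data values here are made up for illustration): treat the data chunks as evaluations of a polynomial, extend to twice as many points, and reconstruct the original from any half of the extended points.

```python
# Toy Reed-Solomon-style extension: 4 data chunks define a degree-3 polynomial,
# which is evaluated at 8 points; any 4 of the 8 points recover the data.
P = 257  # small prime field, illustration only

def lagrange_eval(points, x):
    """Evaluate the unique polynomial through `points` at `x`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

data = [5, 17, 42, 99]                                       # original chunks P(0..3)
base = list(enumerate(data))
extended = [(x, lagrange_eval(base, x)) for x in range(8)]   # 2x extension

# Pretend only half of the extended chunks survived:
available = [extended[1], extended[4], extended[6], extended[7]]
recovered = [lagrange_eval(available, x) for x in range(4)]
assert recovered == data
print(recovered)  # [5, 17, 42, 99]
```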
When we perform n independent random samplings, the probability that unavailable data goes undetected (i.e., passes as available) is only 2^-n.
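The bound can be checked directly. This quick sketch assumes an adversary publishes just under half of the extended chunks, so each uniform random sample is answered with probability at most 1/2:

```python
# Probability that withheld (unrecoverable) data survives n successful samples.
def false_availability_bound(n_samples: int) -> float:
    return 0.5 ** n_samples

for n in (10, 20, 30):
    print(n, false_availability_bound(n))  # 2^-10 ~ 1e-3 ... 2^-30 ~ 1e-9
```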
The logic is that we encode the data as an erasure code and extend it; the extended data is what allows the original to be recovered.
The problem then shifts to whether the extension was computed correctly when the polynomial was extended.
If the extension itself is wrong, the data will be reconstructed incorrectly. So how do we ensure that the data has been extended correctly?
- Celestia uses fraud proofs, which come with a synchronization problem.
- Ethereum and Polygon Avail use KZG commitments, which avoid the honest-minority and synchronization assumptions. However, KZG commitments are not resistant to quantum attacks, so in the future Ethereum may switch to quantum-resistant zero-knowledge proof technology (STARK-style proofs are the usual candidate).
In this field, the most popular ones are zksync and StarkWare, which use zero-knowledge proofs. They will be discussed in detail later.
What is resistance to quantum attacks? It means the scheme's security does not rest on mathematical hardness assumptions (such as the discrete logarithm problem) that a quantum computer could break.#
KZG commitment: proves that a committed polynomial's value at a specific position equals a claimed value.
A KZG commitment is just one type of polynomial commitment, which lets you verify claims about a message without revealing the full message. The specific process is as shown in the figure:
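Since the figure cannot carry the algebra by itself, here is a toy, deliberately insecure sketch of the core check over a plain prime field. In real KZG the secret point s stays hidden inside elliptic-curve group elements from a trusted setup and the final multiplication is checked with a pairing, but the underlying identity is the same: P(x) = y exactly when (X - x) divides P(X) - y, so the prover supplies the quotient q and the verifier checks C - y = q(s)·(s - x). All concrete numbers below are made up.

```python
# Toy KZG-style polynomial commitment over a prime field (NOT secure:
# real KZG keeps s secret inside group elements and verifies with a pairing).
P = 4294967291  # a prime modulus, illustration only

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients low-to-high) at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def quotient(coeffs, x0):
    """Coefficients of (P(X) - P(x0)) / (X - x0), via synthetic division."""
    q = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * x0) % P
        q[i - 1] = carry
    return q

s = 123456789                      # "trusted setup" secret point (public only in this toy)
coeffs = [3, 1, 4, 1, 5]           # the committed polynomial's coefficients (made up)
commitment = poly_eval(coeffs, s)  # real KZG: [P(s)] as a group element

# Prover opens the polynomial at x = 7:
x = 7
y = poly_eval(coeffs, x)
proof = poly_eval(quotient(coeffs, x), s)   # real KZG: [q(s)] as a group element

# Verifier checks C - y == q(s) * (s - x); real KZG does this with a pairing.
assert (commitment - y) % P == proof * (s - x) % P
print("opening verified:", y)
```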
Compared with Merkle Trees:#
The whole process is: encode the data with an erasure code and extend it; use KZG commitments to ensure both that the extension is valid and that the original data is valid; then use the extension to reconstruct the data; and finally perform data availability sampling.
Celestia requires validators to download the entire block, while Ethereum's Danksharding uses data availability sampling.
Since a block may be only partially available, synchronization must be guaranteed whenever the block needs to be reconstructed: when a block really is only partially available, nodes communicate with one another to piece it back together.
Comparison between KZG commitments and fraud proofs:#
It can be seen that KZG commitments guarantee that the extension and the data are correct, whereas fraud proofs rely on a third-party observer. The most obvious difference is that fraud proofs need a time window in which observers can react and report fraud, which requires synchronization between nodes so that the whole network receives the fraud proof in time. KZG is clearly faster than fraud proofs: it uses mathematics to guarantee the correctness of the data, with no waiting period.
Celestia's own drawback is its use of large blocks, which requires validators to download all the data; the same is true of Ethereum's proto-danksharding scheme. To address this, Celestia will also adopt data availability sampling, which will require the use of KZG commitments.
Both KZG commitments and fraud proofs still require synchronization, because there is always some probability that a block is only partially available and must be reconstructed.