💀 Example Vulnerability

In the preceding section, we examined vulnerabilities that can emerge from incorrect or suboptimal implementations of Merkle-Patricia Trees (MPTs) in blockchain systems. Although the underlying theory of MPTs is robust, such vulnerabilities have manifested in practice and raised real security concerns. In this section, we look at a real-life case in which a vulnerability arose from a flawed use of MPTs. The case study highlights the importance of secure design principles when employing MPTs and sheds light on common traps that developers should watch for.

Example 1: Polygon Double-Spend Bugfix - $2M Bounty Payout

Approximately $850M was at risk. An attacker starting with just $100k could have caused $22.3M in losses.

Background

As blockchain technology evolves, more use-cases are being developed, with Decentralized Finance (DeFi) leading the way. Various assets, including tokens and Non-Fungible Tokens (NFTs), are created on leading blockchains such as Ethereum, Binance Smart Chain (BSC), Polygon (formerly Matic), and Bitcoin. However, a prominent challenge has been to facilitate the seamless transfer of these assets across different blockchains.

To address this, the concept of a blockchain bridge was introduced. A blockchain bridge acts as a connector between two separate blockchains, allowing assets and data to move between them. This technology is pivotal in promoting interoperability in the blockchain ecosystem, which is often siloed due to different chains' unique protocols.

Polygon, an Ethereum-compatible blockchain network designed to facilitate faster and cheaper transactions, is one such platform that offers a blockchain bridge. It offers a trustless, two-way transaction channel between Polygon and Ethereum and has introduced the cross-chain bridge with Plasma and Proof of Stake (PoS) security mechanisms. It's worth noting that while blockchain bridges are a powerful tool for enabling interoperability, they also introduce potential vulnerabilities that can be exploited if not properly implemented or secured.

High-Level Overview of Asset Movement Using the Plasma Bridge

  1. The process starts with a user depositing tokens into a designated Polygon contract residing on the root chain, which in this case is Ethereum.

  2. Upon successful confirmation of this deposit transaction on the Ethereum network, a corresponding quantity of tokens is minted on the child chain, Polygon. These minted tokens are immediately ready for use within the Polygon network.

  3. At the point when a user decides to withdraw their assets from the Polygon network (child chain), the withdrawal process is initiated from the Polygon side.

    3.1. For the withdrawal to be processed, a checkpoint interval must pass (typically around 30 minutes). This interval ensures that all blocks have been validated since the last recorded checkpoint.

    3.2. Once validated, the checkpoint is submitted to the root chain contract residing on the Ethereum network.

  4. As part of the withdrawal process, an EXIT Non-Fungible Token (NFT) is minted, representing the exact value of assets the user wishes to withdraw.

  5. Following this, a mandatory waiting period begins, during which the user must wait for seven days before they can successfully withdraw their assets.

  6. After the waiting period concludes, the user can claim their funds back to their Ethereum account using the 'process-exit' procedure.

It is important to note that a vulnerability was identified in the withdrawal process, which we will delve into in the subsequent sections.

Asset Withdrawal Process

The asset withdrawal operation commences with the burning of tokens on the child chain, or the Polygon network. The Polygon Plasma client provides a method known as 'startWithdraw' that activates the 'withdraw' function of the 'getERC20TokenContract'. This function is responsible for the token burn.

Once the burn is confirmed, the user can call the 'startExitWithBurntTokens' function of the 'erc20Predicate' contract. This is the stage at which the checkpoint interval of approximately 30 minutes must first elapse. The exit payload, which contains all the data needed to move the funds from Layer-2 (L2) to Layer-1 (L1), must be passed to this function.

To advance with the exit, the burn transaction must be successful and valid. A crucial aspect to note here is that the exit operation can only be initiated after the checkpoint, inclusive of the burn transaction, has been incorporated into the root chain. The user should then activate the 'processExits' function of the 'withdrawManager' contract and submit the burn proof.

The primary vulnerability resides in how Polygon's 'WithdrawManager' validates the inclusion and uniqueness of the burn transaction in preceding blocks.

Vulnerability

WithdrawManager.sol implements the verifyInclusion() function.

The objective of the verifyInclusion function, as its name indicates, is to affirm the inclusion of the burn transaction receipt during a checkpoint. It achieves this by examining the Merkle Proof for the receipt and the transaction itself. All of this information is incorporated into the exit payload. For an in-depth look at the exact contents of the exit payload, one can refer to the provided documentation.

A key parameter in the exit proof is the branchMask, the path used by the receipt's Merkle proof. The branchMask is security-critical because it is one of the inputs from which the Exit ID is derived, so each exiting transaction must have exactly one accepted branchMask encoding. The property that must hold is that one exiting transaction corresponds to exactly one Exit ID. However, as the whitehat discovered, this did not always hold.

The branchMask is hex-prefix (HP) encoded and is later decoded inside the MerklePatriciaProof.verify call made from WithdrawManager.sol.

        require(
            MerklePatriciaProof.verify(
                payload.getReceipt().toBytes(),
                vars.branchMaskBytes,
                payload.getReceiptProof(),
                vars.receiptRoot
            ),
            "INVALID_RECEIPT_MERKLE_PROOF"
        );
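For readers unfamiliar with hex-prefix encoding, the following minimal Python sketch shows the standard scheme from the Ethereum specification, in which a flag nibble records whether the path has an odd number of nibbles and whether it ends at a leaf. The function name hp_encode and the sample paths are illustrative assumptions and are not taken from Polygon's code.

```python
def hp_encode(nibbles: list[int], is_leaf: bool = False) -> bytes:
    """Hex-prefix encode a nibble path (standard Ethereum MPT scheme)."""
    if any(not 0 <= n <= 0xF for n in nibbles):
        raise ValueError("each nibble must be in the range 0..15")
    flag = 2 if is_leaf else 0
    if len(nibbles) % 2 == 1:
        # Odd length: the flag nibble (1 or 3) is followed directly by the path.
        prefixed = [flag + 1] + nibbles
    else:
        # Even length: the flag nibble (0 or 2) is followed by a zero padding nibble.
        prefixed = [flag, 0] + nibbles
    # Pack pairs of nibbles into bytes.
    return bytes((hi << 4) | lo for hi, lo in zip(prefixed[0::2], prefixed[1::2]))


print(hp_encode([8, 0]).hex())     # even-length path: 0080
print(hp_encode([0, 8, 0]).hex())  # odd-length path:  1080
```

Note that an even-length, non-leaf path always starts with the byte 0x00 under this scheme, which becomes relevant when looking at the fix.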

The verify function decodes the encoded path by calling the _getNibbleArray function.

    function _getNibbleArray(bytes memory b)
        private
        pure
        returns (bytes memory)
    {
        bytes memory nibbles;
        if (b.length > 0) {
            uint8 offset;
            uint8 hpNibble = uint8(_getNthNibbleOfBytes(0, b));
            if (hpNibble == 1 || hpNibble == 3) {
                nibbles = new bytes(b.length * 2 - 1);
                bytes1 oddNibble = _getNthNibbleOfBytes(1, b);
                nibbles[0] = oddNibble;
                offset = 1;
            } else {
                nibbles = new bytes(b.length * 2 - 2);
                offset = 0;
            }

            for (uint256 i = offset; i < nibbles.length; i++) {
                nibbles[i] = _getNthNibbleOfBytes(i - offset + 2, b);
            }
        }
        return nibbles;
    }
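To make the decoding easier to experiment with, here is a minimal Python port of _getNibbleArray. The name get_nibble_array and the sample inputs are illustrative assumptions; only the branching logic mirrors the Solidity listing above.

```python
def get_nibble_array(b: bytes) -> list[int]:
    """Python port of the Solidity _getNibbleArray helper."""
    if len(b) == 0:
        return []
    # Split every byte into its high and low nibble.
    flat = [n for byte in b for n in (byte >> 4, byte & 0x0F)]
    hp_nibble = flat[0]
    if hp_nibble in (1, 3):
        # Odd-length path: the second nibble is the first nibble of the path.
        return flat[1:]
    # Any other first nibble: the whole first byte is dropped without inspection.
    return flat[2:]


print(get_nibble_array(bytes.fromhex("0080")))  # [8, 0]
print(get_nibble_array(bytes.fromhex("1080")))  # [0, 8, 0]
print(get_nibble_array(bytes.fromhex("ff80")))  # [8, 0]
```

The last call is the interesting one: a first byte of 0xff is not a meaningful hex-prefix flag, yet it decodes to the same path as 0x00.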

Apart from the decoding in MerklePatriciaProof, the WithdrawManager.verifyInclusion function also decodes the path as a uint256.

        vars.branchMask = payload.getBranchMaskAsUint();
        require(
            vars.branchMask & 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00000000 == 0,
            "Branch mask should be 32 bits"
        );

Because the nibble-array decoding ignores part of the encoded value, and the uint256 decoding does not reject variations in that ignored part, a single path as seen by MerklePatriciaProof can correspond to many distinct uint256 values. Since replay protection relies on the uint256 value, the same proof can be replayed under a different encoding.

To understand why one semantic value may have multiple encodings, we need to look more closely at the decoding process in MerklePatriciaProof. In _getNibbleArray, if the first nibble of the HP-encoded value is 1 or 3, the second nibble is interpreted as the first nibble of the path. If the first nibble is neither 1 nor 3, the entire first byte is discarded.

When the first nibble is neither 1 nor 3, it can take any of 14 values and the second nibble any of 16 values, so there are 14 × 16 = 224 possible first bytes that all decode to the same path. A malicious user could therefore create many different Exit IDs for a single exit transaction.
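The count can be checked by brute force. The sketch below is a Python model under stated assumptions: get_nibble_array mirrors the Solidity decoder, branch_mask_as_uint is a hypothetical stand-in for payload.getBranchMaskAsUint() that reads the bytes as a big-endian integer, and the two-byte mask 0x0080 is an invented example.

```python
def get_nibble_array(b: bytes) -> list[int]:
    """Compact model of _getNibbleArray: drop one nibble or the whole first byte."""
    flat = [n for byte in b for n in (byte >> 4, byte & 0x0F)]
    return flat[1:] if flat and flat[0] in (1, 3) else flat[2:]


def branch_mask_as_uint(b: bytes) -> int:
    """Hypothetical stand-in for getBranchMaskAsUint(): big-endian integer."""
    return int.from_bytes(b, "big")


# Same constant as in the "Branch mask should be 32 bits" check.
UPPER_BITS = (1 << 256) - (1 << 32)

canonical = bytes.fromhex("0080")  # invented example mask
canonical_path = get_nibble_array(canonical)

aliases = []
for first_byte in range(256):
    candidate = bytes([first_byte]) + canonical[1:]
    same_path = get_nibble_array(candidate) == canonical_path
    passes_32_bit_check = branch_mask_as_uint(candidate) & UPPER_BITS == 0
    if same_path and passes_32_bit_check:
        aliases.append(candidate)

distinct_uints = {branch_mask_as_uint(a) for a in aliases}
print(len(aliases), len(distinct_uints))  # 224 224
```

All 224 candidates decode to the same nibble path, yet each one yields a different integer, which is exactly the mismatch between the two decoders.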

What would the sequence of an exploit look like?

  1. Deposit a significant amount of ETH/tokens into Polygon via the Plasma Bridge.

  2. Once the funds are confirmed as available on Polygon, initiate the withdrawal process.

  3. Wait out the seven-day period required for the exit to become valid.

  4. Resubmit the exit payload, altering the first byte of the branch mask.

  5. The same valid transaction can be resubmitted up to 223 times, using different values for the first byte of the HP-encoded path.

  6. Profit
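The steps above can be modelled with a toy simulation. This is a deliberately simplified sketch, not Polygon's contract: ToyWithdrawManager, its fields, and the use of the raw mask integer as the whole Exit ID are assumptions made for illustration (the real Exit ID combines several components), and the $100k burn amount is taken from the figures quoted earlier.

```python
def get_nibble_array(b: bytes) -> list[int]:
    """Compact model of _getNibbleArray."""
    flat = [n for byte in b for n in (byte >> 4, byte & 0x0F)]
    return flat[1:] if flat and flat[0] in (1, 3) else flat[2:]


class ToyWithdrawManager:
    """Hypothetical, heavily simplified model of the vulnerable exit logic."""

    def __init__(self, burn_path: list[int], burn_amount: int) -> None:
        self.burn_path = burn_path      # path of the burn receipt in the trie
        self.burn_amount = burn_amount  # value of the single real burn
        self.exits: set[int] = set()    # Exit IDs already seen
        self.paid_out = 0

    def start_exit(self, branch_mask: bytes) -> bool:
        # Inclusion check: uses the nibble-array decoding of the mask.
        if get_nibble_array(branch_mask) != self.burn_path:
            return False
        # Replay check: uses the integer decoding of the same mask.
        exit_id = int.from_bytes(branch_mask, "big")
        if exit_id in self.exits:
            return False
        self.exits.add(exit_id)
        self.paid_out += self.burn_amount
        return True


manager = ToyWithdrawManager(burn_path=[8, 0], burn_amount=100_000)
accepted = sum(manager.start_exit(bytes([first, 0x80])) for first in range(256))
exact_replay = manager.start_exit(bytes.fromhex("0080"))
loss = manager.paid_out - manager.burn_amount

print(accepted, exact_replay, loss)  # 224 False 22300000
```

In this model an exact replay is rejected, but 223 re-encoded replays are accepted on top of the legitimate exit, which turns a $100k burn into a $22.3M loss.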

As mentioned, the implications of this vulnerability were vast. At the time the bug was submitted, approximately $850M resided within the DepositManagerProxy. So, how did the team and the whitehat address this issue?

Summary

The key issue with this vulnerability is that, ideally, there should be exactly one unique Exit ID for each exit transaction, which in turn should uniquely identify each withdrawal request. This Exit ID is generated using several components, including the branch mask, and is used to keep track of withdrawal requests and prevent any form of replay.

However, due to the discrepancy in how the branch mask is decoded in two different places in the code (as a uint256 in WithdrawManager.verifyInclusion and as a nibble array in MerklePatriciaProof.verify), it's possible for an attacker to create multiple distinct Exit IDs for the same exit transaction.

The attacker can do this by altering the first byte of the HP-encoded branchMask in different exit payloads for the same exit transaction. Because of the discrepancy in decoding methods, these different branchMasks would still be considered valid and would result in multiple valid Exit IDs for the same transaction. This means the system would treat them as separate withdrawal requests, which is where the vulnerability lies.

Vulnerability Fix

As it turns out, the first byte of the encoded branch mask is supposed to always be 0x00. The fix is to check that the first byte of the encoded branch mask is 0x00 and to reject any other value as an incorrect mask, rather than silently disregarding that byte.
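The effect of such a check can be sketched in a few lines of Python. The function name check_branch_mask, the error message, and the sample mask are assumptions for illustration; the authoritative change is the Solidity commit linked below.

```python
def check_branch_mask(branch_mask: bytes) -> None:
    """Reject any encoded branch mask whose first byte is not 0x00."""
    if len(branch_mask) == 0 or branch_mask[0] != 0x00:
        raise ValueError("incorrect mask")


def is_accepted(branch_mask: bytes) -> bool:
    """Convenience wrapper that turns the check into a boolean."""
    try:
        check_branch_mask(branch_mask)
    except ValueError:
        return False
    return True


# Try every possible first byte in front of an invented one-byte path.
accepted_masks = [
    bytes([first, 0x80]) for first in range(256) if is_accepted(bytes([first, 0x80]))
]
print(len(accepted_masks), accepted_masks[0].hex())  # 1 0080
```

With the first byte pinned to 0x00, only one encoding of a given path passes, so the integer used for replay protection is again unique per exit.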

You can find the commit with the fix here: https://github.com/maticnetwork/contracts/commit/283b8d2c1a9ff3dc88538820ffc4ea6a2459c040

New implementation of WithdrawManager: https://etherscan.io/address/0x4ef5123a30e4cfec02b3e2f5ce97f1328b29f7de#code
