What is Sharding? Explained

The dictionary meaning of shard is a small piece of something. Sharding means dividing a big database into smaller portions so that it can be managed and updated more easily. It was first introduced in MMORPGs (Massively Multiplayer Online Role-Playing Games) in 1999. Since these games had massive traffic, the huge database was divided into segments, each presenting different scenes or landscapes to the players. Splitting the players across different servers in this way made it possible to handle the heavy traffic all at once.

The basic purpose behind this is to be able to manage a vast database. As the number of transactions per unit of time grows, the size of a database increases linearly while the response time increases exponentially. A longer response time means slower output, which on a single database can only be dealt with by buying more expensive hardware.

All data-driven applications and websites that grow significantly will need to scale. The huge amount of data has to be stored securely and still be easy to access. This scaling has to be dynamic to keep up with future fluctuations in load.

Therefore, we can say that the cost of maintaining such a large database in one place keeps rising. On the other hand, if sharding is performed, the database can be divided into smaller segments, which can be managed individually from several places with much less expensive hardware. A common business example is creating shards of a customer database per geographic location. The data of all customers in one location is then stored together on a dedicated server. So, instead of going through a huge customer database for certain information, only a smaller database segment has to be processed.
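The geographic example can be sketched in a few lines of Python. This is only a minimal illustration: the region codes, the shard names, and the helper functions are hypothetical, and plain in-memory lists stand in for real database servers.

```python
from collections import defaultdict

# Hypothetical region-to-shard mapping; the shard names are illustrative only.
REGION_TO_SHARD = {
    "EU": "customers_eu",
    "US": "customers_us",
    "ASIA": "customers_asia",
}


def shard_for(customer: dict) -> str:
    """Return the name of the shard that should hold this customer's record."""
    return REGION_TO_SHARD[customer["region"]]


def build_shards(customers: list) -> dict:
    """Group customer records by the shard chosen from their region."""
    shards = defaultdict(list)
    for customer in customers:
        shards[shard_for(customer)].append(customer)
    return dict(shards)


def find_customer(shards: dict, region: str, name: str):
    """Search only the shard for the given region, not the whole data set."""
    shard = shards.get(REGION_TO_SHARD[region], [])
    return next((c for c in shard if c["name"] == name), None)


customers = [
    {"name": "Alice", "region": "EU"},
    {"name": "Bob", "region": "US"},
    {"name": "Chen", "region": "ASIA"},
    {"name": "Dana", "region": "EU"},
]
shards = build_shards(customers)
```

A lookup such as find_customer(shards, "US", "Bob") touches only the US shard, which is the saving the paragraph above describes.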

Sharding in terms of Blockchain

Sharding has been used in databases for a long time. It is relatively simple there, because the developers only need to create a separate database structure that can operate securely for their given use case. In a blockchain, however, sharding becomes more complicated.

In a blockchain network, the data is a block, and each block contains its hash and the previous block's hash. So, each block is related to the previous block. It is also considered a database with nodes representing the data servers. The essence of blockchain lies in the fact that it is decentralized. Taking any measure to compromise decentralization will result in weaker security. Then why perform sharding of blockchains at all?
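To make the hash linking concrete, here is a minimal sketch of a chain in which each block stores its own hash and the hash of the previous block. The block layout, the all-zero genesis placeholder, and the function names are assumptions chosen for illustration; real blockchains use richer block headers.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(items):
    """Create a list of blocks, each linked to the block before it."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(items):
        block = {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def is_valid(chain):
    """Check every stored hash and every link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain(["tx-a", "tx-b", "tx-c"])
```

Changing the data of any block changes its hash, so the link stored in the following block no longer matches and is_valid returns False. This dependency between blocks is what makes splitting a chain harder than splitting an ordinary database.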

Blockchain networks face a serious and recurring issue called bloat. Bloat is the challenge of storing massive amounts of data or blocks permanently on the chain. In other words, a blockchain has to be scalable to accommodate all this data, which is increasing by the second. Now, this causes a problem related to scalability and response time.

The idea of sharding in blockchains is to overcome these issues by separating the network into small, manageable segments. Sharding applied to a blockchain causes the network to split into individual shards, each containing its own set of account balances and smart contracts. Nodes are then assigned to individual shards and verify the operations and transactions there. These nodes are now responsible only for the transactions in their own shard, which is better than each node verifying every transaction on the entire network.

This will divide a larger blockchain into smaller segments and increase the speed because one node won’t have to verify all transactions in the network; there will be designated nodes for each verification.
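One way to picture this division is to map every account to a shard and route each transaction to the shard of its sender. The sketch below is an assumption for illustration only: the shard count, the hash-based mapping, and the transaction format are not taken from any particular blockchain.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for this illustration


def shard_of(account: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an account to a shard deterministically via a hash of its name."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(transactions, num_shards: int = NUM_SHARDS):
    """Route each transaction to the shard that owns the sender's account."""
    buckets = {shard: [] for shard in range(num_shards)}
    for tx in transactions:
        buckets[shard_of(tx["sender"], num_shards)].append(tx)
    return buckets


transactions = [{"sender": f"account-{i}", "amount": i} for i in range(20)]
buckets = partition(transactions)
```

Nodes assigned to a given shard would only need to look at the transactions in that shard's bucket rather than at all 20 transactions.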

For example, the Bitcoin blockchain could initially only perform 3-7 transactions per second (TPS), and the Ethereum blockchain could handle 12-30 TPS. If we compare these with VISA’s speed, which is 24,000 TPS, we realize that there is a massive difference. Ethereum is said to have over 8000 computers in the network, each lending a certain hash power to the network.

We can infer that increasing the number of computers won’t necessarily increase the processing speed. The whole ledger is kept on each computer, which makes the verification process a lot slower. Instead of sequential execution, parallel execution can be far more beneficial in this respect. Multiple computers each perform only their designated computations in parallel, which allows multiple transactions to be processed at the same time.
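The difference between the two execution styles can be sketched as follows. The validity rule and the shard contents are made up for the example, and a thread pool merely stands in for separate groups of nodes working at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def verify(tx) -> bool:
    """Toy validity rule (an assumption): amounts must be positive."""
    return tx["amount"] > 0


def verify_shard(txs):
    """Verify every transaction that belongs to one shard."""
    return [verify(tx) for tx in txs]


def verify_sequentially(shards):
    """One verifier walks through all shards, one after the other."""
    return {shard_id: verify_shard(txs) for shard_id, txs in shards.items()}


def verify_in_parallel(shards):
    """Each shard is handed to its own worker, so shards are checked concurrently."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = {
            shard_id: pool.submit(verify_shard, txs)
            for shard_id, txs in shards.items()
        }
        return {shard_id: future.result() for shard_id, future in futures.items()}


shards = {
    0: [{"amount": 5}, {"amount": -1}],
    1: [{"amount": 3}],
    2: [{"amount": 7}, {"amount": 2}],
}
results = verify_in_parallel(shards)
```

Both functions return the same results; only the scheduling differs. In a real network the workers would be separate machines, so the threads here illustrate the structure rather than the actual speedup.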

Challenges

A few challenges are faced because this technology is still somewhat in the developmental stage. Developers are making changes to get the best of this technology. Some of the major challenges are mentioned below.

First and foremost, sharding a blockchain is very risky. Every block in a blockchain is linked to the block preceding it and includes that block's hash. So how a blockchain should be sharded depends on its underlying consensus mechanism. A Proof-of-Work blockchain, for example, is very hard to shard. Transactions still have to be validated, but a node in one shard no longer has the entire transaction history available. Hence, new transactions have to be validated without full knowledge of that history.

Secondly, shards are often exposed to security threats. If a hacker can take over most of the nodes in a shard, the blocks in that shard can be manipulated in many ways.

Lack of communication across different shards is another major problem faced. There has to be a mechanism to help establish communication, but this adds a separate layer of complication for the developers.

Troubles with Sharding

Communication and security are the two sectors where sharding is at a disadvantage. When a blockchain shard is created, each shard behaves as an individual blockchain network. All these individual networks have no way of communicating with each other. Moreover, the users and applications of one network cannot communicate with the users and applications of another network. This communication can only be achieved using a special inter-shard communication mechanism, which adds an extra layer of complicated code for the developers to create.
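As a rough picture of what such an inter-shard mechanism has to do, the sketch below moves value between two shards using a receipt. This is only one possible pattern and is not described in the text above; the Shard class, the receipt format, and the replay check are all assumptions made for illustration.

```python
class Shard:
    """A toy shard that tracks balances for its own accounts only."""

    def __init__(self, shard_id, balances):
        self.shard_id = shard_id
        self.balances = dict(balances)
        self.next_receipt = 0
        self.processed = set()  # receipt ids already applied on this shard

    def send(self, sender, to_shard, recipient, amount):
        """Debit the sender locally and emit a receipt for the target shard."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        receipt = {
            "id": (self.shard_id, self.next_receipt),
            "to_shard": to_shard,
            "recipient": recipient,
            "amount": amount,
        }
        self.next_receipt += 1
        return receipt

    def receive(self, receipt):
        """Credit the recipient once; ignore misrouted or replayed receipts."""
        if receipt["to_shard"] != self.shard_id or receipt["id"] in self.processed:
            return False
        self.processed.add(receipt["id"])
        recipient = receipt["recipient"]
        self.balances[recipient] = self.balances.get(recipient, 0) + receipt["amount"]
        return True


shard_a = Shard("A", {"alice": 10})
shard_b = Shard("B", {"bob": 0})
receipt = shard_a.send("alice", "B", "bob", 4)
accepted = shard_b.receive(receipt)
```

Even this toy version needs unique receipt ids and a replay check, which hints at the extra layer of code the developers have to build and secure.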

In a sharded blockchain, security is also a major concern. Since a huge blockchain is broken down into smaller subdomains, the hash power securing each shard also decreases. Therefore, it becomes easier for hackers to take over a single shard and manipulate it as they wish. This is known as a single-shard takeover attack or a 1% attack, i.e., in a 100-shard network it takes only about 1% of the network hash rate to dominate an entire shard. Once this attack occurs, the attackers can submit invalid transactions to the main network or cause transactions to be lost permanently.
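The arithmetic behind the 1% figure can be written out directly. The sketch assumes that hash power is spread evenly across shards, which is a simplification, and the function names are hypothetical.

```python
def shard_share(num_shards: int) -> float:
    """Fraction of the total network hash rate that secures a single shard,
    assuming hash power is spread evenly across all shards."""
    return 1.0 / num_shards


def takeover_share(num_shards: int, majority: float = 0.51) -> float:
    """Fraction of the total network hash rate needed to hold a majority
    of the hash power inside one shard."""
    return majority * shard_share(num_shards)


one_shard = shard_share(100)       # share of the network behind one of 100 shards
to_dominate = takeover_share(100)  # share needed for a majority inside that shard
```

With 100 shards, each shard is backed by 1% of the network, and a bare majority inside one shard needs only about half of that. On an unsharded chain the same attacker would need a majority of the whole network.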

Ethereum proposes a potential solution for this problem by using random sampling. In this method, a shard notary is randomly appointed to discrete segments to verify block authentication.
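The idea of random sampling can be illustrated with a small sketch in which validators are shuffled and dealt out to shards. It is not Ethereum's actual algorithm: the function name and the seed-based randomness are assumptions, and a real protocol would derive its randomness from a source that an attacker cannot predict or influence.

```python
import random


def assign_notaries(validators, num_shards, seed):
    """Shuffle the validators with a seeded RNG and deal them out round-robin,
    so nobody can choose which shard they end up verifying."""
    rng = random.Random(seed)  # stand-in for a protocol-level randomness source
    shuffled = list(validators)
    rng.shuffle(shuffled)
    assignment = {shard: [] for shard in range(num_shards)}
    for position, validator in enumerate(shuffled):
        assignment[position % num_shards].append(validator)
    return assignment


validators = [f"validator-{i}" for i in range(10)]
assignment = assign_notaries(validators, num_shards=3, seed=42)
```

Because the assignment is random and can be redrawn regularly with a fresh seed, an attacker cannot easily concentrate all of their nodes in one chosen shard.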

Alternatives to Sharding

Developers recommend two alternatives to improve performance and transaction speed. The first one is to increase the block size to fit more data into one block. This will ensure that less time is required to perform a greater number of transactions, i.e., higher TPS. Although it solves the performance and speed issues, a new one is brought forward. A device with more computation power will be required to verify a bigger block. Only such devices will qualify as nodes.
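The link between block size and TPS can be shown with a rough calculation. The numbers below are illustrative assumptions rather than measurements: a 1 MB block, an average transaction of 250 bytes, and one block every 600 seconds.

```python
def transactions_per_second(block_size_bytes, avg_tx_bytes, block_interval_s):
    """Rough throughput estimate: whole transactions per block divided by block time."""
    tx_per_block = block_size_bytes // avg_tx_bytes
    return tx_per_block / block_interval_s


# Assumed figures for illustration only.
base_tps = transactions_per_second(1_000_000, 250, 600)
doubled_tps = transactions_per_second(2_000_000, 250, 600)
```

Under these assumptions, doubling the block size doubles the estimated throughput, but every node then has to download, store, and verify blocks that are twice as large.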

This will increase the cost of nodes and result in smaller node pools. These node pools will be more centralized, which makes the network more vulnerable to a 51% attack. In this attack, the hacker controls most of the network hash rate and can use it to manipulate the transaction history. Increasing the block size will also split the community. It requires a hard fork, and if not everyone upgrades to the new blockchain, two blockchains will exist, each using its own coin. The new and old versions will both contain a transaction history, much like parallel worlds with no connection to each other.

Even though this alternative does solve the previous issues, it poses some new ones, so increasing the block size won’t qualify as a long-term solution. The second alternative suggested by the developers is an altcoin.

An alternative coin, or altcoin, is a substitute for bitcoin. Altcoins make it possible to run different applications on their own chains and with their own coins. The advantage is that no single blockchain is overloaded, resulting in better performance. However, the hash power is then distributed among different blockchains, which increases each network's vulnerability to security threats. The hash power needed for a 51% attack on any one chain is much lower, so a hacker can attack the network more easily. Although this solves the performance issue, the network's security is at risk. Hence, even this alternative isn’t a suitable option.

Sharding vs Sidechains

Sharding is implemented in the base-level protocol of a blockchain and is therefore called a layer-1 solution. It divides the main blockchain into smaller and individual blockchains called shards. On the other hand, a sidechain is a separate blockchain, but it is connected to the main blockchain using a two-way peg.

Sidechains have been around for a long time; they were introduced in 2014 in a research paper by Dr. Adam Back. According to the paper, sidechains allow digital assets from one blockchain to be used in a different blockchain. Moreover, the assets can also be moved back to the original blockchain. Sidechains are also referred to as child chains by some blockchain projects like Ardor, MOAC, etc.
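The back-and-forth movement of assets through a two-way peg can be sketched with a toy model. The TwoWayPeg class and its method names are hypothetical, and the model ignores the cryptographic proofs and waiting periods that a real peg relies on.

```python
class TwoWayPeg:
    """Toy model: coins locked on the main chain are mirrored on the sidechain."""

    def __init__(self, main_balances):
        self.main = dict(main_balances)  # balances on the main chain
        self.side = {}                   # balances on the sidechain
        self.locked = 0                  # coins held by the peg on the main chain

    def peg_in(self, account, amount):
        """Lock coins on the main chain and release the same amount on the sidechain."""
        if amount <= 0 or self.main.get(account, 0) < amount:
            raise ValueError("insufficient main-chain balance")
        self.main[account] -= amount
        self.locked += amount
        self.side[account] = self.side.get(account, 0) + amount

    def peg_out(self, account, amount):
        """Remove coins from the sidechain and unlock them on the main chain."""
        if amount <= 0 or self.side.get(account, 0) < amount:
            raise ValueError("insufficient sidechain balance")
        self.side[account] -= amount
        self.locked -= amount
        self.main[account] = self.main.get(account, 0) + amount


peg = TwoWayPeg({"alice": 10})
peg.peg_in("alice", 4)   # alice now holds 6 on the main chain and 4 on the sidechain
peg.peg_out("alice", 4)  # everything is moved back to the main chain
```

The total across both chains never changes, because coins on the sidechain exist only while the same amount is locked on the main chain.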

This sounds like a far better technology than sharding. Aelf is a blockchain project which is making use of sidechain technology. It has a multi-chain parallel computing blockchain. Some of its unique features are:

It solves scalability issues. Parallel processing reduces latency.
Resource segregation allows each smart contract to run on its own blockchain.

Advantages of Sharding

The faster transaction rate leads to a better overall experience for the users. Since there has been a rise in the number of applications that require constant communication with the database, it has become essential to maintain both security and speed. Sharding is a promising technology for keeping up with both of these aspects.

Scalability has also become the need of the hour as the amount of data is increasing every second. Some apps might be doing fine right now, but they will have a lot of data to handle over time. Thus, sharding is a viable solution as it creates small data segments that can be easily accessed.

It also eliminates the need for machines with high computation power because the aggregated data is not stored only in one place. Since there are shards of data, a simple device can also work.

A database or blockchain split into smaller shards is also easier to maintain than a single huge one.

Who uses sharding?

Some blockchains have started using sharding, but some are still developing it. There is more than one approach to introducing a sharding mechanism into a blockchain. Many factors determine which one will work well with a certain blockchain. The major reason for using sharding is scaling. Scaling is essential to keep up with the increasing number of nodes and make room for them. Sharding makes massive blockchains sustainable by segmenting them into shards.

Zilliqa, a secure and scalable public blockchain platform, was the first to incorporate sharding. It was launched in 2017, achieved a throughput of 2,828 TPS on its testnet, and is said to sustain about 2,500 TPS. Its striking feature is high scalability: a new shard is created for every 600 nodes that join the network.
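Taking the figure of one shard per 600 nodes at face value, the number of shards grows with the size of the network. The helper below is a hypothetical sketch of that arithmetic and not Zilliqa's actual shard-formation logic.

```python
NODES_PER_SHARD = 600  # figure quoted above


def shard_count(num_nodes: int, nodes_per_shard: int = NODES_PER_SHARD) -> int:
    """Number of complete shards that can be formed from the available nodes."""
    return num_nodes // nodes_per_shard


shards_small = shard_count(1_800)  # three complete shards
shards_large = shard_count(3_600)  # six complete shards
```

Since each shard processes its own transactions, doubling the number of nodes roughly doubles the number of shards working in parallel.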

NEAR is another platform where sharding is used. It is a developer-friendly blockchain to test and deploy decentralized applications easily. It states that its sharding technology allows the nodes to stay small enough to run on simple cloud-hosted instances. These nodes can potentially be mobile devices in the future, making the whole process much easier.

Ethereum also plans to introduce sharding in its new update, Ethereum 2.0, which is expected to launch in January 2020. Some other blockchain projects like Cardano, QuarkChain, and PChain are also moving towards sharding to solve the scalability issue.

The Future of Sharding

Distributed ledger technologies (DLTs) have spread to almost every industry. This also means that a lot of data is being stored and keeps growing. Some of these DLTs are doing fine now, but that might not be the case after a couple of years or even months. Information overload will lead to poor performance before long. If scaling isn’t done, these technologies will become so slow that it will take days or weeks to load a blockchain.

Sharding is, as of now, the most viable option for addressing these problems. It balances a network’s load across all nodes and aims to solve the blockchain trilemma.

The blockchain trilemma is that a blockchain can only possess two of the following three properties at a given time:

Decentralization
Security
Scalability

Sharding gained popularity recently after Facebook released more information on its Libra coin. The coin is intended for Facebook’s financial services, which are expected to launch in 2020. Facebook plans on creating a more stable cryptocurrency so that consumers are encouraged to use it for ordinary online transactions.

However, despite the benefits of sharding, its implementation in blockchains has been somewhat limited.