This post is part three in a miniseries about optimizing a cryptographic protocol for decentralized data storage. It is not necessary to read the first two parts about the SiaMux or Ephemeral Accounts, however they do provide useful context for some of the techniques discussed in this post. We will be discussing the design and implementation of the Merklized Data Machine (MDM), which enables us to perform verifiable updates to data that is being stored on an untrusted remote host.
We are going to start by overviewing how data is stored on hosts and how Sia previously communicated with hosts to upload more data, modify it, or download it in the past.
File contracts are what make Sia a reliable store of data. A file contract is an agreement between a renter and host for a set duration of time. After that period, the host is required to prove that it held onto any data uploaded to it for the duration of the file contract.
The file contracts serve two main purposes. The first one is acting as payment channels which can be used for high-frequency micro transactions to hosts. We estimate that at the time of writing around 10 million of such transactions happen on the network per day.
Secondly, they allow us to apply a financial penalty to hosts which lose stored data. To be able to penalize a host, we need to be able to prove to the network that a host isn’t in possession of some data that it’s supposed to have. To do so, the file contract contains a commitment to the stored data in form of a Merkle Root which refers to the root element of a Merkle Tree.
A Merkle Tree is a form of cryptographic accumulator which allows a host to commit to some data. The commitment itself is the Merkle Root which is a constant size hash. In Sia’s case, a Blake2B hash. A Merkle Tree is created by hashing individual blocks of data into the leaves of the tree. These leaves are then concatenated and hashed pair-wise, resulting in the next layer. This process is repeated until only the top/root hash a.k.a. Merkle Root remains.
Now that we know how Merkle Trees are built, let’s get to the interesting things they let us do. Using so-called Merkle Proofs, given the Merkle Root in the file contract, a host can prove that a sector is part of that file contract and what position it can be found at without revealing any of the other sectors. For example if a host wants to prove that L1 (from the figure above) is part of the data covered by the root hash, it can reveal L1, Hash 0–1 and Hash 1.
The verifier then does the following:
- Hash L1 to get Hash 0–0
- Hash 0–0 and Hash 0–1 to get Hash 0
- Hash Hash 0 and compare the result to Top Hash
If the last step succeeds, the proof is valid. This means L1 is part of the Merkle Tree and that it’s at the first position.
Merkle Trees in Sia
In Sia’s case, the individual leaves of the Merkle Tree which are hashed up to the Merkle Root stored within the file contract are called segments. Each segment has a size of 64 bytes. The size of the segments determines the granularity with which we can request operations on data while still being able to provide a Merkle Proof that the operation was performed faithfully.
To address data on the host, we have the host keep an in-memory mapping from a hash of data, to where the actual data is stored on disk. Since a Blake2B hash has a size of 32 bytes, keeping all segment hashes in memory would involve a 50% overhead. In other words, uploading 1TB of data to a host would result in 500GB of hashes. Since that’s not very economical, hosts don’t keep the segment hashes but the hashes of so-called sectors instead. A sector is a group of 2¹⁶ neighbored segments and therefore consists of 4 MiB of data. A sector’s Merkle Root is the root of the Merkle Sub-Tree that covers that data. This means that using a Merkle Root, a renter can address any 4 MiB of data on the host which is also the reason why 4 MiB is the minimum amount of data that can be uploaded to a host. We do have plans to reduce that number in the future for situations where a smaller sector size is more desirable.
The MDM is a virtual machine that resides on the host. It can be used to execute arbitrary programs which are built from a set of available instructions. Each instruction includes a cryptographic proof that the instruction was executed faithfully according to the data committed in the corresponding file contract.
At the time of writing there are eight instructions available.
- Read Offset: Reads data from an offset within the contract
- Read Registry: Reads an entry from the Skynet registry
- Read Sector: Reads data from a sector specified by Merkle Root
- Append: Appends a sector to the contract
- Revision: Returns the latest revision of a contract
- Has Sector: Returns whether a host knows a sector without downloading it
- Swap Sectors: Swaps the specified sectors within the contract
- Drop Sectors: Drops sectors at the end of the contract
Each instruction fulfills a simple task and multiple instructions can be used to craft complex programs. The big difference to earlier versions of our renter-host protocol is, that executing a program is a single RPC whereas before some of these instructions or combinations of them would be standalone RPCs themselves. Programs offer a lot more flexibility than RPCs since they are not set in stone. Instead a renter can execute any program as long as they are paying for it.
An MDM program can be crafted using any of the aforementioned instructions and they are always executed atomically on the host. So any changes made to a contract by a program will only be committed if it is executed without errors. Instructions are executed individually and their intermediate results are returned to the renter. As soon as an instruction fails, execution of the program will stop and the renter will know not to expect any more results from the MDM.
A full program consists of instructions and a stream of input data called the program data. When transmitting a program to the host for execution, the instructions are sent first, followed by the program data. The instructions themselves only reference their inputs within the program data. As a result, instructions can not only reuse the same inputs by referencing the same part of the program data, but they can also modify the program data at a certain offset to dynamically change the input for a following instruction. While this is not actively used yet, it opens the doors for a plethora of extensions such as conditional instructions which change the flow of a program.
The MDM can start executing a program after receiving the instructions but before receiving the whole program data. This is a latency optimization which allows large programs to respond with individual instructions’ outputs before all of the input is even read. It also allows renters to wait for earlier instruction results before sending all of the program data, essentially piping the output of instructions into later instructions as inputs.
In addition to the intermediary instruction outputs, the renter also receives a Merkle Proof for each instruction. These proofs are possible since every instruction is either related to a contract or uses a Merkle Root as input. These proofs are verified as they arrive and cryptographically prove every state transaction between instructions. After the execution of the last instruction and verifying its proof, renter and host both agree on what the new Merkle Root of the contract should be. Therefore the renter knows that the host manipulated the data honestly without having the data itself.
Another key feature of the MDM is the way the execution cost is computed. Since MDMs can execute arbitrary programs, the renter now specifies an allowance for executing a program. After successful execution, this allowance is usually drained. If a program is only excuted partially, or the renter overestimated the cost, the MDM refunds the renter by depositing the money into the renter’s Ephemeral Account.
The execution cost itself can be split into four parts.
- Instruction Cost
- Bandwidth Cost
- Memory Cost
- Storage Cost
The instruction cost varies between instructions. Usually instructions have a fixed base cost and a variable cost that is added on top of that depending on the inputs to the instruction. For example a Read Sector — instruction has a base cost to cover the disk access and a length cost related to how many bytes are actually read.
During the execution of the program, the host also tracks the incoming and outgoing bandwidth on the connection. The SiaMux adds a small overhead to the bandwidth consumption which is included in the tracking. After the execution of the program, the host charges the renter appropriately for the used bandwidth.
Since some instructions use a non-negligible amount of memory, the MDM also charges for memory over time. For example if the first instruction of the program requires the program to hold on to a 4 MiB sector until the end of the execution, the MDM will charge for these 4 MiB. To allow the renter to estimate this cost beforehand, each instruction has a time value associated with it. The cost per byte per time is then multiplied with that value for each instruction to get the total memory cost. The reason for this involved memory pricing scheme for is the prevention of DoS attacks. We don’t want anyone to craft long-running programs that consume lots of memory to prevent other renters from accessing a host. By putting a price tag on all resources, it becomes costly to perform any DoS attacks.
Last but not least, the MDM charges for storage. The cost of storage is refunded if the program execution fails. The reason for that is the fact that the programs are applied to the contract atomically. So if the program fails, the new storage isn’t added to the contract and therefore the host doesn’t need to store any additional data.
Parallel Program Execution
One of the biggest performance improvements over previous RPCs is the fact that the MDM is paid for using Ephemeral Accounts and does therefore not require a contract to be locked for all instructions. We categorize programs as either “read-only” or “read-write”. A read-only program only contains instructions that don’t update a contract’s underlying data while a read-write program does. A read-write program requires a contract to be locked during execution. This includes instructions such as Append or Swap Sectors and means that those can only be executed in series.
Read-only programs only require a snapshot of the contract’s sector roots which can be parallelized. Multiple read-only programs can also be executed in parallel with a single write program. Most write programs only append to a contract at the end and don’t interfere with read-only programs. Should a read-only program still try to read data which is modified by a read-write program at the same time, the read will fail if the modification changed the sector’s data and therefore its Merkle Root.
One special sub-category of read-only programs are programs consisting of instructions that don’t even require a snapshot of the contract. Instructions that access data by providing an offset within the contract, require the host to translate that offset to a sector’s Merkle Root which is then translated to the data. Therefore the host needs to be aware of the order of the roots within the contract when it starts executing the program.
Instructions such as Has Sector and Read Sector, address data by Merkle Root directly and as a result don’t require a snapshot. In fact, they can be used to download any data from a host regardless of what contract actually pays for it. These instructions make up the bulk of traffic from Skynet portals since skylinks often contain data that was uploaded by someone else to an unknown contract. This category of programs has an even lower latency for retrieving data since they don’t need to read a contract snapshot from disk.
The MDM is a new RPC which utilizes the power of the SiaMux and Ephemeral Accounts. It allows for executing arbitrary programs created from a set of instructions on a host to modify a file contract. It offers hosts more flexibility for pricing their resources and performs much faster than the old protocol due to the ability of running many read-only programs in parallel. Most importantly a renter can verify cryptographically that the program was executed faithfully by verifying Merkle Proofs which are created for every executed instruction.
This concludes our three part series about Skyrocket. We hope that you enjoyed these short dives into the core of Skynet/Sia. Stay tuned for more technical blog posts in the future and make sure to join us on Discord if you’d like to discuss this series, give some feedback, or if you’d just like to see a post explaining some other part of Sia or Skynet.