The Zilliqa network experienced an unplanned outage on the 29th July at 22:59:14 UTC, when a call to a built-in Scilla function halted Mainnet operation.
Early investigation and immediate actions taken
By analyzing the node logs, we identified the root cause as a call to the built-in Scilla function ecdsa_recover_pk with an out-of-range parameter. The out-of-range parameter (rec_id) was not handled correctly by the integrated OCaml secp256k1 library, which caused a segmentation fault. The segmentation fault terminated the Scilla server process on the mining nodes. Because the Zilliqa process depends on the Scilla server process, the Zilliqa binary stopped processing blocks and Mainnet stopped progressing.
Investigating the Root Cause
The root cause: the built-in Scilla function ecdsa_recover_pk caused a crash.
The Scilla function depends on an external cryptographic library, secp256k1. The deployed version of secp256k1 produced a segmentation fault when one of the parameters (rec_id) was out of range; the valid values are 0, 1, 2 and 3. In this case, the caller passed 28 as the rec_id parameter, which led to the outage.
Before version 0.4.4, the secp256k1 OCaml wrapper library did not check that the recid argument (rec_id above) was within bounds before calling the unsafe code, which causes a segmentation fault when given invalid input.
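The kind of check that was missing can be illustrated with a minimal sketch. It is written in Python for readability and is not the actual code of the OCaml wrapper or of Scilla. The names checked_recover_pk, InvalidRecoveryId and fake_unsafe_recover are hypothetical; fake_unsafe_recover merely stands in for the native call, which in the real library would crash rather than return.

```python
# Illustrative sketch only: validate rec_id before handing it to unsafe code.
# All names below are hypothetical and not part of any real library.

VALID_REC_IDS = range(4)  # the valid recovery ids are 0, 1, 2 and 3


class InvalidRecoveryId(ValueError):
    """Raised instead of letting an out-of-range rec_id reach native code."""


def checked_recover_pk(recover_fn, msg, sig, rec_id):
    """Reject an out-of-range rec_id, then delegate to recover_fn."""
    is_plain_int = isinstance(rec_id, int) and not isinstance(rec_id, bool)
    if not is_plain_int or rec_id not in VALID_REC_IDS:
        raise InvalidRecoveryId(f"rec_id must be 0, 1, 2 or 3, got {rec_id!r}")
    return recover_fn(msg, sig, rec_id)


def fake_unsafe_recover(msg, sig, rec_id):
    # Stand-in for the native recovery routine (assumption for this sketch).
    return ("pk", rec_id)


if __name__ == "__main__":
    print(checked_recover_pk(fake_unsafe_recover, b"msg", b"sig", 1))
    try:
        checked_recover_pk(fake_unsafe_recover, b"msg", b"sig", 28)
    except InvalidRecoveryId as err:
        print("rejected:", err)
```

With a guard of this kind, an invalid value such as 28 surfaces as an ordinary error that the caller can handle, rather than as a crash of the whole process.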
Proposed Fix and Next Steps
The proposed fix is to rebuild Scilla so that it picks up the latest version (0.4.4) of the secp256k1 OCaml wrapper library. That version was changed to handle the out-of-range parameter correctly, which should prevent this issue from recurring.
Status of Fix and Next Steps
We rebuilt the Scilla binaries running on the Zilliqa miners with the latest version (0.4.4) of the OCaml wrapper library, and rebuilt the dependent Zilliqa binaries so that they pick up the new Scilla version (v0.11.1). We tested this on our private dev testnet, where we were able to reproduce the issue before applying the fix. We shall now deploy the binaries on Mainnet and aim to converge the network as quickly as possible.
Thank you for your patience and continued support. We sincerely apologise for the inconvenience.
We shall keep you updated.
The Zilliqa Team