Skip to main content

Reproducible builds

Intermediate
Tutorial

Overview

If you are using a canister that you did not develop yourself, you may want to verify that the canister is running the code that you expect it to be before giving it control to make important decisions for you, such as accepting ICP for a payment. Verifying a canister's code requires confirming that the Wasm module is the correct result of compiling the canister source code and that the canister is in fact running that Wasm module and not another Wasm module.

It is therefore recommended that canister authors ensure trusted and reproducible builds of the Wasm module are possible from their source code. Such builds allow anyone to follow the same steps as the canister authors and yield the exact same Wasm module, which can then be compared to the Wasm module executing on ICP.

Considerations for developers

When developing code meant to be reproducible, it is recommended to use Docker or a similar technology to conveniently set up the operating system and the build tools and fix their versions for the user. If the build tools you are using don’t guarantee fully reproducible builds, Docker can also help by minimizing the differences in paths, environment variables, etc. The build tools and the base Docker image should be sourced from somewhere that the user can trust. Ideally, you want to minimize the number of dependencies, as the user may have to rebuild all of your dependencies to properly reproduce your build.

However, build tools aren’t perfect, and may fail to ensure reproducible builds. If reproducibility is critical for your canister (e.g., it holds other users' funds), test it.

Obtaining the Wasm module hash

ICP does not allow you to access the Wasm module of an arbitrary canister. This is a design decision, as developers might want to keep some code private. However, ICP does allow anyone to access the SHA-256 of the Wasm module.

To obtain this hash, you must first note the principal of the canister whose code you want to check. For example, to check the code of the Internet Identity canister, the principal is rdmx6-jaaaa-aaaaa-aaadq-cai. Then, the easiest way to access this data is using dfx from a command line using the following command:

dfx canister --network ic info rdmx6-jaaaa-aaaaa-aaadq-cai

This will return the controller(s) of the canister and the Wasm module hash:

Controllers: r7inp-6aaaa-aaaaa-aaabq-cai
Module hash: 0x86ab08ea53e4da5bba4f27baa931e2edc5ab2a1f228c204a5340992c16389f66

You will need to run this command from a directory that contains a valid dfx.json file. If you don’t have such a directory, you can create one using dfx new.

Here, ICP tells us that the hash of the Wasm module of the rdmx6-jaaaa-aaaaa-aaadq-cai canister is 0x86ab08ea53e4da5bba4f27baa931e2edc5ab2a1f228c204a5340992c16389f66.

The check above provides you the current hash of the canister’s Wasm module, but the controllers of a canister may change the code at any time, such as to upgrade the canister. However, if the list of controllers is empty or the only controller is a blackhole canister, you know that the canister is immutable since no one has the power to change the code.

Verifying the build is reproducible

Next, check whether this hash corresponds to some given source code. This only works if the build process for the code is reproducible.

As a canister author, there are a few things you have to provide to your users to allow them to reproduce your build:

  •   The same source code that you used to create the Wasm module for the canister.

  •   Instructions on how to recreate your build environment.

  •   Instructions on how to repeat the process of building the Wasm from the source code. The process must be deterministic to ensure that it results in the exact same Wasm. It also has to be trusted, such that the user can be convinced that the Wasm is a faithful translation of the source code and not an artifact of a malicious build tool. In particular, look for .dfx, node_modules, and target directories that could contain prebuilt files.

Providing the source code

Typically, it is recommended that developers version your code in git or some other version control system, and versioned code may be available in a public repository such as GitHub. In this case, as the developer, you should note the particular commit that was used when producing the code that has been deployed to ICP and communicate it to the users who may want to verify your canister. Alternatively, you can also provide a package (i.e., a zip file or a tarball) containing the source code you used to build your canister.

Reproducing build environments

Before building your code, you should document the build environment you are using in detail. In particular, for the languages supported by the IC SDK (Motoko, Rust, TypeScript, Python):

  •   Note your local machine's operating system and its version.

  •   If you are using the IC SDK, note the version used. You can install arbitrary versions of the IC SDK using dfxvm.

  •   If you are building Motoko code in a way other than dfx build, note the version of moc you are using.

  •   If you are building Rust, note the version of cargo you are using.

  • If you are building TypeScript, note the version of Azle.

  • If you are building Python, note the version of Python and Kybra.

  •   If you are using a framework for frontend development, such as React, Vite, or webpack, note their versions.

  •   If your build process depends on any environment variables (such as the timezone or locale), note them.

You should communicate all of these to your user in the reproducible build instructions. Ideally, provide an executable recipe or script to recreate the build environment using tools such as Docker or Nix. It is recommended to use Docker, as it allows you to also pinpoint the operating system used for building the software.

Build environments using Docker

Docker containers are a popular solution for providing reproducible build environments.

For developers using macOS, it is recommended to install Docker using lima, as it proves more stable than Docker Desktop or Docker Machine. In particular, it avoids some QEMU bugs on Apple M1 machines.

After setting Docker up, you can use a Dockerfile to provide the user with a particular version of the operating system, as well as other necessary tooling, such as dfx, Node.js, and the Rust toolchain.

It is advised to stick with x86_64 for running the Docker container, as builds are generally not reproducible across architectures. See the docs on setting up cross-platform Docker containers in case your host environment is not x86_64.

Below is an example Dockerfile that creates a standardized Rust build environment.

Dockerfile
FROM ubuntu:22.04

ENV NVM_DIR=/root/.nvm
ENV NVM_VERSION=v0.39.1
ENV NODE_VERSION=18.1.0

ENV RUSTUP_HOME=/opt/rustup
ENV CARGO_HOME=/opt/cargo
ENV RUST_VERSION=1.62.0

ENV DFX_VERSION=0.23.0

# Install the basic environment needed for our build tools.
RUN apt -yq update && \
    apt -yqq install --no-install-recommends curl ca-certificates \
        build-essential pkg-config libssl-dev llvm-dev liblmdb-dev clang cmake rsync

# Install Node.js using nvm
ENV PATH="/root/.nvm/versions/node/v${NODE_VERSION}/bin:${PATH}"
RUN curl --fail -sSf https://raw.githubusercontent.com/creationix/nvm/${NVM_VERSION}/install.sh | bash
RUN . "${NVM_DIR}/nvm.sh" && nvm install ${NODE_VERSION}
RUN . "${NVM_DIR}/nvm.sh" && nvm use v${NODE_VERSION}
RUN . "${NVM_DIR}/nvm.sh" && nvm alias default v${NODE_VERSION}

# Install Rust and Cargo
ENV PATH=/opt/cargo/bin:${PATH}
RUN curl --fail https://sh.rustup.rs -sSf \
        | sh -s -- -y --default-toolchain ${RUST_VERSION}-x86_64-unknown-linux-gnu --no-modify-path && \
    rustup default ${RUST_VERSION}-x86_64-unknown-linux-gnu && \
    rustup target add wasm32-unknown-unknown &&\
    cargo install ic-wasm

# Install dfx
RUN sh -ci "$(curl -fsSL https://internetcomputer.org/install.sh)"

COPY . /canister
WORKDIR /canister

What this does

There are a couple of things worth noting about this Dockerfile:

  •   It starts from an official Docker image, such that all the installed tools are standard and come from standard sources. This provides the user with confidence that the build environment hasn’t been tampered with, and thus that the build process using Docker can be trusted.

  •   To ensure that specific versions of the build tools are installed, it installs them directly rather than through a package manager. Package managers usually don’t provide a way of pinning the build tools to specific versions.

To use this Dockerfile, get Docker up and running, place the Dockerfile in the project directory of your canister, and create the Docker container by running:

docker build -t mycanister .

This creates a Docker container image called mycanister, with Node.js, Rust, and dfx installed in it. Your canister source code will be copied to the directory /canister You should invoke docker build from the canister project directory. You can then enter an interactive shell inside of your Docker container by running:

docker run -it --rm mycanister

From here, you can experiment with the steps needed to build your canister. Once you are confident that the steps are deterministic, you can also put them in the Dockerfile. For example, to run a build script, add the line:

RUN ./build_script.sh

You can see an example in the Dockerfile of the Internet Identity canister.

Example build script

Below is an example script that can be used to programmatically build a Rust project:

This is an example build script for a Rust project and does not include a fully comprehensive example of what a build script may contain, nor does it include build instructions for other languages such as Motoko.

#!/bin/bash
#
# additional setup, e.g., build frontend assets:
# ...
# Rust build:
export RUSTFLAGS="--remap-path-prefix $(readlink -f $(dirname ${0}))=/build --remap-path-prefix ${CARGO_HOME}=/cargo"
cargo build --locked --target wasm32-unknown-unknown --release
ic-wasm target/wasm32-unknown-unknown/release/example_backend.wasm -o example_backend.wasm shrink

A build script such as the one above can be specified as a custom build script in dfx.json:

dfx.json
"canisters": {
  "example_backend": {
    "candid": "src/example_backend/example_backend.did",
    "package": "example_backend",
    "type": "custom",
    "wasm": "./example_backend.wasm",
    "build": "./build_script.sh"
  }
}

Ensuring the determinism of the build process

Next, consider if it is necessary to make the build deterministic. For the build process to be deterministic:

  • Step 1: You will need to ensure that any dependencies of your canister are always resolved in the same way. Most build tools support a way of pinning dependencies to a particular version.

    -   For npm, running npm install will create a package-lock.json file with some fixed versions of all (transitive) dependencies of your project that satisfy the requirements specified in your package.json. However, npm install will overwrite the package-lock.json file every time it is invoked. Thus, once you are ready to create the final version of your canister, run npm install only once. After that, commit package-lock.json to your version control system. Finally, when checking the build for reproducibility, use npm ci instead of npm install.

    -   For Rust code, Cargo will automatically generate a Cargo.lock file with the fixed versions of your (transitive) dependencies. Like with package-lock.json, you should commit this file to your version control system once you are ready to produce the final version of your canister. Furthermore, Cargo by default ignores the locked versions of dependencies. Pass the --locked flag to the cargo command to ensure that the locked dependencies are used.

  • Step 2: Your own build scripts must not introduce non-determinism.

Sources of non-determinism include randomness, timestamps, concurrency, or code obfuscators. Less obvious sources include locales, absolute file paths, the order of files in a directory, and remote URLs whose content can change. Furthermore, relying on third-party build plug-ins exposes you to any non-determinism introduced by these.

  • Step 3. Given the same dependencies and deterministic build scripts, the build tools themselves (moc for Motoko, cargo for Rust, npx for TypeScript, pip for Python, or webpack for frontend development) must also be deterministic.

These tools aim to be deterministic. However, they are complicated pieces of software, and ensuring determinism is non-trivial. Thus, non-determinism bugs can and do occur.

Motoko deterministic considerations

The Motoko compiler aims to be deterministic and reproducible; if you find reproducibility issues, please submit a new issue, and the team will try to address them to the extent possible.

Rust deterministic considerations

For Rust, see the list of current potential non-determinism issues in Rust. If you have observed differences between Rust code compiled to Wasm under Linux and macOS, it is recommended to pin the build platform and its version.

Webpack deterministic considerations

For webpack, deterministic naming of module and chunk IDs that you should use have been introduced since version 5.

Testing reproducibility

If reproducibility is vital for your code, you should test the build to increase your confidence in its reproducibility. The tool reprotest can help you automate reproducibility tests, as it tests your build by running it in two different environments that differ in characteristics such as paths, time, file order, and others, and comparing the results. To check your build with reprotest, add the following line to your Dockerfile:

RUN apt -yqq install --no-install-recommends reprotest disorderfs faketime rsync sudo wabt

When using dfx build --network ic, you need to prebuild your frontend dependencies by running npm ci before dfx build --network ic or by setting the custom build type in dfx.json and running npm ci in your build script. Your project directory should contain a canister_ids.json file containing the IDs of your canisters on the mainnet. Below is an example canister_ids.json file:

canister_ids.json
{
  "example_backend": {
    "ic": "rrkah-fqaaa-aaaaa-aaaaq-cai"
  },
  "example_frontend": {
    "ic": "ryjl3-tyaaa-aaaaa-aaaba-cai"
  }
}

Now, from the root directory of your canister project, you can test the reproducibility of your build using the commands:

docker build -t mycanister .
docker run --rm --privileged -it mycanister
/canister# mkdir artifacts
/canister# reprotest -vv --store-dir=artifacts --variations '+all,-time' 'dfx build --network ic' '.dfx/ic/canisters/*/*.wasm'

What this does

  • The first command builds the Docker container using the Dockerfile provided earlier.
  • The second command opens an interactive shell in the container, indicated by the flag -it. You can run this in privileged mode using the --privileged flag, as reprotest uses kernel modules for some build environment variations. You can also run it in non-privileged mode by excluding some of the variations; see the reprotest manual. The --rm flag will destroy the container after you close its shell.
  • Once inside the container, a directory is created for the build artifacts, and reprotest is launched in verbose mode using the -vv flags. You need to give it the build command you want to run as the first argument; in this example it is dfx build --network ic. It will then run the build in two different environments.
  • Finally, you need to tell reprotest which paths to compare at the end of the two builds. In this example, it compares the Wasm code for all canisters, which is found in the .dfx/ic directory. It is encouraged that you compare the artifacts produced by 'reprotest` while manually changing your system time.

This workflow omits the time variation because the Rust compiler uses jemalloc for dynamic memory allocation, and this library is not compatible with faketime used by reprotest to implement the time variation.

If the comparison doesn’t find any differences, you will see an output similar to this one:

=======================
Reproduction successful
=======================
No differences in ./.dfx/ic/canisters/*/*.wasm
6b2a15a918219138836e88e9c95f9c5d2d7b6d465df83ae05d6fd2b0f14f8a97  ./.dfx/ic/canisters/example_backend/example_backend.wasm
a047686c1d517e21d447bcd42c9394a12cdb240e06425b830c99d3a689b5ee20  ./.dfx/ic/canisters/example_frontend/assetstorage.wasm
a047686c1d517e21d447bcd42c9394a12cdb240e06425b830c99d3a689b5ee20  ./.dfx/ic/canisters/example_frontend/example_frontend.wasm

This is a good indicator that your build is not affected by your environment.

Note that reprotest can’t check that your dependencies are pinned properly. It is recommended that you run the container reprotest builds under several host operating systems and compare the results. If the comparison does find differences between the Wasm code produced in two builds, it will output a diff. You will then likely want to use the --store-dir flag of reprotest to store the outputs and the diff somewhere where you can analyze them. If you are struggling to achieve reproducibility, consider also using DetTrace, which is a container abstraction that tries to make arbitrary builds deterministic.

Finally, if your build is reproducible, you can compare the hash of the resulting Wasm code to the hash of the code that is running in a canister, which you retrieve as follows:

dfx canister --network ic info <canister-id>

Beware that this hash might change if the controllers upgrade the canister code.

Long-term considerations

Even after you achieve reproducibility for your builds, there are still other things to consider for the long term.

Reproducibility can be more demanding if you expect your canister code to stay around for years and remain reproducible. The biggest challenges are to ensure that your:

  •  Build toolchain is available in the future.

  •  Dependencies are available.

  •  Toolchain still runs and correctly builds your dependencies.

Distributions and package archives may drop old versions of packages, including both your toolchain and their dependencies. Web sites may go offline, and URLs might stop working. It’s prudent to back up all of your toolchain and dependencies. You should consider getting involved in projects such as Software Heritage, which do this on a large scale.

You might have to adjust your build process to ensure that your canister still builds. Even if the build changes, if it still yields the same result, your users can be confident that your canister is running the correct code. The trust argument is easier if your dependencies come from a trustworthy source, such as the Software Heritage project.