Reproducible Execution Environment (REE)
Run AI model inference in a machine-agnostic environment where the same model and inputs produce the same outputs across supported hardware.
Overview
REE (Reproducible Execution Environment) is Gensyn's toolchain for executing AI model inference in a machine-agnostic, bitwise-reproducible fashion.
It packages everything needed to run a model into a containerized pipeline with four stages: [1] export, [2] compilation, [3] inference, and [4] output decoding. The pipeline produces bitwise-identical results regardless of which hardware it runs on.
REE is composed of three main components:
Gensyn SDK: The engine that orchestrates the end-to-end pipeline: export, compilation, inference, and output decoding.
Gensyn Compiler: An MLIR-based, multi-stage compiler that converts ONNX models into PyTorch modules, optionally replacing standard kernels with reproducible ones.
RepOp Kernels: Purpose-built CPU kernels and GPU operators that guarantee bitwise-identical outputs across different hardware, parallelism configurations, and run orders.
You interact with all of these through the REE TUI, a terminal interface that lets you configure and run generations without touching the underlying CLI; the CLI remains available for advanced usage.
While the scripts in this repository are open-source, REE as a whole is not. REE includes proprietary components that are downloaded from Gensyn servers, and these components are subject to Gensyn's licensing terms.
By using this software, you agree to comply with those terms. For official terms and conditions, please see the EULA licensing agreement.
Why Reproducibility?
Standard GPU execution is inherently non-deterministic, meaning the same model with the same inputs can produce different outputs each time you run it.
This happens because of how GPUs handle mathematical operations: they split work across many parallel processors to run faster, but this parallelization can happen in slightly different orders between runs. Even tiny differences in the order of operations can accumulate through the many layers of a neural network, eventually leading to noticeably different results.
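The root cause is that floating-point addition is not associative, so the order in which partial results are combined changes the final bits. A minimal stdlib-Python illustration:

```python
# Floating-point addition is not associative, so the order in which a GPU's
# parallel reducers combine partial sums changes the result at the bit level.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one possible reduction order
right_to_left = a + (b + c)   # another possible reduction order

# The two orders disagree in the last bit of the mantissa.
print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)
```

Across the billions of additions in a large model's forward pass, these last-bit disagreements compound layer by layer, which is how identical inputs end up producing visibly different outputs.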
Existing solutions like PyTorch's deterministic mode only solve part of the problem. They can make your results consistent on the same GPU across multiple runs, but they break down when you switch to different hardware. For example, an A100 and an H100 will still produce different outputs. These tools also have limited coverage and can't account for the fact that different GPU architectures implement mathematical functions differently at the hardware level.
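For reference, PyTorch's deterministic mode is enabled roughly like this (a sketch of the standard PyTorch API, independent of REE):

```python
import torch

# Fix the RNG seed and force PyTorch to select deterministic kernel
# implementations. This makes repeated runs on the SAME device agree,
# but does not make, say, an A100 agree with an H100.
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)

x = torch.randn(4, 4)
y1 = x @ x
y2 = x @ x
print(torch.equal(y1, y2))  # identical on the same device and run order
```

Note that `use_deterministic_algorithms` raises an error for operations that have no deterministic implementation, which is part of the coverage limitation described above.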
REE solves this end to end because reproducibility is essential for verifiable AI inference.
When third parties need to independently verify that a computation was performed correctly, such as in decentralized compute networks or prediction markets, they must be able to run the same model on their own hardware and get exactly the same result.
REE achieves this through RepOps, custom operators that use careful mathematical techniques (fixed reduction ordering, correctly rounded functions, and extended precision) to guarantee identical outputs across any hardware, without sacrificing too much performance.
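RepOps itself is proprietary, but the core idea of a fixed reduction ordering can be sketched in a few lines. This is an illustrative stand-in, not the actual RepOps implementation:

```python
def fixed_order_sum(values):
    """Sum in a fixed pairwise (tree) order determined only by the input
    length, never by thread scheduling. Any implementation that follows
    this exact tree produces bit-identical results on any hardware.
    (Illustrative sketch only; not the RepOps implementation.)"""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return fixed_order_sum(values[:mid]) + fixed_order_sum(values[mid:])

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
# The reduction tree is a pure function of len(data), so the result is the
# same no matter which machine, thread count, or run order evaluates it.
print(fixed_order_sum(data))
```

Pinning the reduction tree removes one source of divergence; correctly rounded math functions and extended intermediate precision address the remaining hardware-level differences.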
Operation Modes
REE supports three operation modes, which you can set via the Extra Args field in the TUI:
| Mode | Kernels | Reproducible on same hardware | Reproducible across hardware |
| --- | --- | --- | --- |
| `default` | Standard PyTorch kernels. No determinism guarantees. | ❌ | ❌ |
| `deterministic` | PyTorch deterministic algorithms. Reproducible on the same hardware across runs. | ✅ | ❌ |
| `reproducible` | Gensyn RepOp kernels. Bitwise-identical results across any supported hardware. | ✅ | ✅ |
There are different use cases for the three operation modes:
- Use `reproducible` when results must be independently verifiable by a third party on different hardware.
- Use `deterministic` when you need repeatable results on your own machine.
- Use `default` for development and testing where speed matters more than reproducibility.