# Reproducible Execution Environment (REE)

## Overview

REE (Reproducible Execution Environment) is Gensyn's toolchain for executing AI model inference in a machine-agnostic, bitwise-reproducible fashion.

It packages everything needed to run a model (export, compilation, inference, and output decoding) into a containerized pipeline that produces bitwise-identical results regardless of which hardware it runs on.

REE comprises three main components:

* **Gensyn SDK:** The engine that orchestrates the end-to-end pipeline: export, compilation, inference, and output decoding.
* **Gensyn Compiler:** An MLIR-based, multi-stage compiler that converts ONNX models into PyTorch modules, optionally replacing standard kernels with reproducible ones.
* **RepOp Kernels:** Purpose-built CPU kernels and GPU operators that guarantee bitwise-identical outputs across different hardware, parallelism configurations, and run orders.

You interact with all of these through the [REE TUI](/tech/ree/using-the-tui.md), a terminal interface that lets you configure and run generations *without* touching the underlying CLI directly, unless you're interested in [advanced usage](/tech/ree/advanced-usage.md).

{% hint style="info" %}
While the scripts in this repository are open-source, REE as a whole is not. REE includes proprietary components that are downloaded from Gensyn servers, and these components are subject to Gensyn's licensing terms.

*By using this software, you agree to comply with those terms. For the official terms and conditions, see the* [*EULA licensing agreement*](https://github.com/gensyn-ai/ree/blob/main/REE-Binary-License)*.*
{% endhint %}

### Why Reproducibility?

Standard GPU execution is inherently non-deterministic: the same model with the same inputs can produce different outputs each time you run it.

This happens because of how GPUs handle mathematical operations: they split work across many parallel processors to run faster, but this parallelization can happen in slightly different orders between runs. Even tiny differences in the order of operations can accumulate through the many layers of a neural network, eventually leading to noticeably different results.
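This effect is easy to demonstrate in plain Python, because floating-point addition is not associative: summing the same numbers in a different order can produce a different result.

```python
# Floating-point addition is not associative, so the order of
# operations changes the result at the last bit.
a = (0.1 + 0.2) + 0.3  # sums left to right
b = (0.3 + 0.2) + 0.1  # same numbers, reversed order

print(a)       # 0.6000000000000001
print(b)       # 0.6
print(a == b)  # False
```

A GPU kernel performs millions of such additions in a scheduling-dependent order, which is why these last-bit differences can compound layer by layer.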

Existing solutions like PyTorch's deterministic mode only solve part of the problem. They can make your results consistent on the same GPU across multiple runs, but they break down when you switch to different hardware. For example, an A100 and an H100 will still produce different outputs. These tools also have limited coverage and can't account for the fact that different GPU architectures implement mathematical functions differently at the hardware level.

REE was built to solve this, because reproducibility is *essential* for verifiable AI inference.

When third parties need to independently verify that a computation was performed correctly, such as in decentralized compute networks or prediction markets, they must be able to run the same model on their own hardware and get exactly the same result.

REE achieves this through [RepOps](/tech/ree/advanced-usage/internals.md), custom operators that use careful mathematical techniques (fixed reduction ordering, correctly rounded functions, and extended precision) to guarantee identical outputs across any hardware, without sacrificing too much performance.
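As a toy sketch of one of those techniques: *fixed reduction ordering* means the order in which partial results are combined is a pure function of the input, never of thread scheduling. The helper below is purely illustrative and is **not** Gensyn's RepOps implementation:

```python
def fixed_order_sum(xs):
    """Pairwise (tree) reduction with a fixed split point.

    The order in which partial sums are combined depends only on the
    input, never on thread scheduling, so repeated runs produce
    bitwise-identical results even though floating-point addition
    is not associative.
    """
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return fixed_order_sum(xs[:mid]) + fixed_order_sum(xs[mid:])

data = [0.1, 0.2, 0.3, 0.4, 0.5]
assert fixed_order_sum(data) == fixed_order_sum(data)  # always holds
```

A serial left-to-right sum is also deterministic on a single machine; the value of a fixed tree order is that a parallel backend can assign each subtree to a different worker and still combine the partial sums in exactly the same order on every run and every device.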

### Operation Modes

REE supports three operation modes, which you can set via the **Extra Args** field in the [TUI](/tech/ree/using-the-tui.md):

| Mode            | Behavior                                                                                  | Cross-run determinism | Cross-hardware determinism |
| --------------- | ----------------------------------------------------------------------------------------- | --------------------- | -------------------------- |
| `default`       | Uses standard PyTorch kernels. No determinism guarantees.                                 | ❌                     | ❌                          |
| `deterministic` | Uses PyTorch deterministic algorithms. Reproducible on **the same hardware** across runs. | ✅                     | ❌                          |
| `reproducible`  | Uses Gensyn RepOp kernels. Bitwise-identical results across **any supported hardware**.   | ✅                     | ✅                          |

Each mode suits a different use case:

* Use `reproducible` when results must be independently verifiable by a third party on different hardware.
* Use `deterministic` when you need repeatable results on your own machine.
* Use `default` for development and testing, where speed matters more than reproducibility.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.gensyn.ai/tech/ree.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
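For instance, a query can be issued from Python as sketched below. The `build_ask_url` helper and the example question are illustrative, not part of an official client; only the URL pattern comes from this page.

```python
from urllib.parse import urlencode

def build_ask_url(question: str) -> str:
    """Build the docs query URL with the question percent-encoded.

    Illustrative helper; the ?ask= URL pattern is documented above,
    everything else here is an assumption.
    """
    base = "https://docs.gensyn.ai/tech/ree.md"
    return f"{base}?{urlencode({'ask': question})}"

url = build_ask_url("Which hardware does reproducible mode support?")
# Fetch it with any HTTP client, e.g.:
#   import urllib.request
#   body = urllib.request.urlopen(url).read().decode()
```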
