Reproducible Execution Environment (REE)
Run AI model inference in a machine-agnostic environment where the same model and inputs produce the same outputs across supported hardware.
Overview
REE (Reproducible Execution Environment) is Gensyn's toolchain for executing AI model inference in a machine-agnostic, bitwise-reproducible fashion.
It packages everything needed to run a model into a containerized pipeline with four stages: [1] export, [2] compilation, [3] inference, and [4] output decoding. The pipeline produces bitwise-identical results regardless of which hardware it runs on.
REE is composed of three main components:
Gensyn SDK: The engine that orchestrates the end-to-end pipeline: export, compilation, inference, and output decoding.
Gensyn Compiler: An MLIR-based, multi-stage compiler that converts ONNX models into PyTorch modules, optionally replacing standard kernels with reproducible ones.
RepOp Kernels: Purpose-built CPU kernels and GPU operators that guarantee bitwise-identical outputs across different hardware, parallelism configurations, and run orders.
You interact with all of these through the REE TUI, a terminal interface that lets you configure and run generations without touching the underlying CLI; the CLI remains available for advanced usage.
While the scripts in this repository are open-source, REE as a whole is not. REE includes proprietary components that are downloaded from Gensyn servers, and these components are subject to Gensyn's licensing terms.
By using this software, you agree to comply with those terms. For official terms and conditions, please see the EULA licensing agreement.
Why Reproducibility?
Standard GPU execution is inherently non-deterministic, meaning the same model with the same inputs can produce different outputs each time you run it.
This happens because of how GPUs handle mathematical operations: they split work across many parallel processors to run faster, but this parallelization can happen in slightly different orders between runs. Even tiny differences in the order of operations can accumulate through the many layers of a neural network, eventually leading to noticeably different results.
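The root cause is that floating-point addition is not associative, so the order in which partial results are combined changes the final bits. A minimal stdlib-Python illustration:

```python
# Floating-point addition is not associative, so the order in which a GPU's
# parallel reducers combine partial sums changes the result at the bit level.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one possible reduction order
right_to_left = a + (b + c)   # another possible reduction order

# The two orders disagree in the last bit of the mantissa.
print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)
```

Across the billions of additions in a large model's forward pass, these last-bit disagreements compound layer by layer, which is how identical inputs end up producing visibly different outputs.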
Existing solutions like PyTorch's deterministic mode only solve part of the problem. They can make your results consistent on the same GPU across multiple runs, but they break down when you switch to different hardware. For example, an A100 and an H100 will still produce different outputs. These tools also have limited coverage and can't account for the fact that different GPU architectures implement mathematical functions differently at the hardware level.
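For reference, PyTorch's deterministic mode is enabled roughly like this (a sketch of the standard PyTorch API, independent of REE):

```python
import torch

# Fix the RNG seed and force PyTorch to select deterministic kernel
# implementations. This makes repeated runs on the SAME device agree,
# but does not make, say, an A100 agree with an H100.
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)

x = torch.randn(4, 4)
y1 = x @ x
y2 = x @ x
print(torch.equal(y1, y2))  # identical on the same device and run order
```

Note that `use_deterministic_algorithms` raises an error for operations that have no deterministic implementation, which is part of the coverage limitation described above.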
REE solves this end to end because reproducibility is essential for verifiable AI inference.
When third parties need to independently verify that a computation was performed correctly, such as in decentralized compute networks or prediction markets, they must be able to run the same model on their own hardware and get exactly the same result.
REE achieves this through RepOps, custom operators that use careful mathematical techniques (fixed reduction ordering, correctly rounded functions, and extended precision) to guarantee identical outputs across any hardware, without sacrificing too much performance.
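RepOps itself is proprietary, but the core idea of a fixed reduction ordering can be sketched in a few lines. This is an illustrative stand-in, not the actual RepOps implementation:

```python
def fixed_order_sum(values):
    """Sum in a fixed pairwise (tree) order determined only by the input
    length, never by thread scheduling. Any implementation that follows
    this exact tree produces bit-identical results on any hardware.
    (Illustrative sketch only; not the RepOps implementation.)"""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return fixed_order_sum(values[:mid]) + fixed_order_sum(values[mid:])

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
# The reduction tree is a pure function of len(data), so the result is the
# same no matter which machine, thread count, or run order evaluates it.
print(fixed_order_sum(data))
```

Pinning the reduction tree removes one source of divergence; correctly rounded math functions and extended intermediate precision address the remaining hardware-level differences.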
Operation Modes
REE supports three operation modes, which you can set via the Extra Args field in the TUI:
| Mode | Kernels | Reproducible on same hardware | Reproducible across hardware |
| --- | --- | --- | --- |
| `default` | Standard PyTorch kernels. No determinism guarantees. | ❌ | ❌ |
| `deterministic` | PyTorch deterministic algorithms. Reproducible on the same hardware across runs. | ✅ | ❌ |
| `reproducible` | Gensyn RepOp kernels. Bitwise-identical results across any supported hardware. | ✅ | ✅ |
There are different use cases for the three operation modes:
- Use `reproducible` when results must be independently verifiable by a third party on different hardware.
- Use `deterministic` when you need repeatable results on your own machine.
- Use `default` for development and testing where speed matters more than reproducibility.