Internals
How the Gensyn Compiler and RepOp kernels achieve bitwise reproducibility under the hood.
Gensyn Compiler
The Gensyn Compiler converts ONNX-serialized ML models into PyTorch modules, optionally with reproducible RepOp kernels replacing standard operations.
It is an MLIR-based, multi-stage compiler with a Python execution layer.
How It Works
The compiler uses MLIR dialects to reason about the incoming model:
- A dialect that determines which operations need to be lowered to RepOp kernels (rather than standard PyTorch kernels).
- A dialect that generates the final PyTorch module from a given set of operations.
Python API
```python
from gensyn_mjolnir import convert, CompileOptions

# Convert an ONNX model to a PyTorch module with reproducible kernels
module = convert(
    "model/model.onnx",
    options=CompileOptions(requires_reproducibility=True),
)
```

convert()
convert() is the core function of the Gensyn Compiler. It takes an ONNX model and converts it into a PyTorch module that can be used for inference.
When requires_reproducibility is enabled (which it is by default), the compiler replaces standard PyTorch operations with RepOp kernels that guarantee bitwise-identical results across hardware.
It has two parameters:

- onnx_model_or_path: A str, Path, or ModelProto. Either a file path to an ONNX model on disk or an in-memory ModelProto object.
- options: A CompileOptions instance that configures the compilation process, which you can read more about below. It defaults to reproducible mode.
CompileOptions
CompileOptions controls how the compiler processes the model. In most cases the defaults are what you want: reproducible mode with symlinked tensors and a temporary artifacts directory.
The fields are:
- artifacts_dir: A str or None. The directory where the compiler writes intermediate artifacts. If not set, a temporary directory is used (preserved when the MJOLNIR_DEBUG environment variable is set, which is useful for inspecting compiler output during debugging).
- colocate_tensors: A bool that is False by default. When set to True, external tensor files are copied into the artifacts directory; when False, symbolic links are created instead. Symlinking is faster and saves disk space, but copying may be needed if you plan to move the artifacts directory to another location.
- requires_reproducibility: Also a bool, set to True by default. When True, the compiler replaces standard PyTorch operations with RepOp kernels for cross-hardware reproducibility. When False, it uses standard PyTorch kernels, which are faster but not reproducible across different hardware.
RepOps
RepOps (Reproducible Operators) are purpose-built GPU kernels that guarantee bitwise-identical outputs regardless of hardware architecture. They cover the full set of operators needed for neural network inference and training.
For a standalone demo of RepOp kernels, see the RepOps Demo repository.
How RepOps Achieve Cross-Hardware Reproducibility
- Fixed reduction ordering: Every kernel accumulates values in a single canonical order. The reduction tile size is fixed across all GPU architectures, and all accumulation is done in FP32 using fused multiply-add instructions.
- Correctly rounded transcendentals: Custom implementations of exp, sin, tanh, etc. that produce identical results on every CUDA-capable GPU.
- Extended-precision arithmetic: Operations like the error function (used in GELU) use extended-precision fixed-point arithmetic for cross-hardware consistency.
- Architecture-adaptive output tiling: Kernels adapt output tile dimensions to different GPU architectures (using available shared memory), but never change the reduction dimension, so reproducibility is preserved.
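The importance of a fixed reduction order can be seen in plain Python: floating-point addition is not associative, so two accumulation orders that would both be "correct" on paper can produce different bits. This is a minimal illustration of the problem RepOps solve, not RepOps code itself:

```python
# Floating-point addition is not associative: 1.0 is lost when added to
# 1e16 first, but survives when the large terms cancel before it is added.
values = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((values[0] + values[1]) + values[2]) + values[3]
reordered = ((values[0] + values[2]) + values[1]) + values[3]

print(left_to_right)  # 1.0  (the first 1.0 was absorbed into 1e16)
print(reordered)      # 2.0  (the large terms cancelled first)
```

Different GPUs parallelize reductions differently, which implicitly changes this accumulation order; fixing the order (and the tile size that determines it) is what makes the result bitwise identical everywhere.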
Container Details
| Component | Value |
| --- | --- |
| Base OS | Ubuntu 24.04.1 LTS |
| Python | 3.11.14 |
| PyTorch | 2.9.1 |
| Transformers | 4.51.0 |
| ONNX | 1.16.1 |
| SDK Version | gensyn-sdk 0.1.0 |
| Entrypoint | /runtime/bin/gensyn-sdk |
| User | gensyn (non-root) |
| Working Dir | /home/gensyn |
Interactive Mode
To explore REE's components directly (SDK, Compiler), start the container in interactive mode:
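A typical invocation might look like the following sketch; the image name gensyn/ree:latest is a placeholder, not necessarily the published tag:

```shell
# Start the container with an interactive shell (image name is a placeholder)
docker run -it --rm gensyn/ree:latest /bin/bash
```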
From inside the container, you can run gensyn-sdk commands directly and inspect intermediate artifacts.