Troubleshooting
Common errors, their causes, and how to fix them.
Common Errors
Quick fixes for Docker, CLI, and compilation issues you might hit while using REE.
Error: PermissionError: [Errno 13] Permission denied: '/gensyn'
Cause: The container runs as the non-root gensyn user and cannot write to root-owned paths.
Fix: Use /tmp/ paths for ephemeral runs, or mount a volume with -v.

Error: one of the arguments --tasks-root --task-dir is required
Cause: Missing required output directory argument.
Fix: Add --task-dir /tmp/task or --tasks-root /tmp/tasks.

Error: one of the arguments --prompt-text --prompt-file is required
Cause: Missing prompt input.
Fix: Add --prompt-text "your prompt" or --prompt-file path.jsonl.

Error: argument command: invalid choice: 'bash'
Cause: Trying to launch a shell, but the entrypoint is locked to gensyn-sdk.
Fix: Override the entrypoint: docker run -it --entrypoint bash ree

Error: Gibberish / nonsensical output
Cause: Using hf-internal-testing/tiny-random-LlamaForCausalLM, which has random untrained weights.
Fix: This is expected behavior for test models; use a real model for meaningful output.

Error: Shell hangs after pasting a command
Cause: A trailing \ on the last line of the command.
Fix: Remove the backslash from the final line.
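The hang happens because a trailing backslash tells the shell the command continues on the next line, so it sits waiting for more input. A minimal illustration:

```shell
# A trailing backslash joins the next line into the same command:
echo one \
two
# prints: one two
```

If the backslash is on the very last line of a pasted command, the shell never sees the "rest" of the command and appears to hang until you press Enter again or remove the backslash.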
Compiler Trace Warnings
When running REE, you may see trace warnings and verbose compiler output in your terminal. These are expected and can be safely ignored. They originate from the ONNX export and MLIR compilation stages.
--tasks-root vs. --task-dir
If you see errors about missing artifacts, make sure you're using consistent location flags. When using --tasks-root, the task directory is automatically derived from the model name. When using --task-dir, you must point to the same directory across all operations.
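As a rough sketch of why mixing the flags causes missing-artifact errors: with --tasks-root, the tool picks the subdirectory itself from the model name (the exact derivation below is hypothetical, not REE's actual scheme), so a later operation pointed at a different --task-dir will not find the artifacts:

```shell
# Hypothetical illustration only: suppose the task dir is derived by
# joining the tasks root with a sanitized model name.
TASKS_ROOT=/tmp/tasks
MODEL_NAME="org/model"
TASK_DIR="$TASKS_ROOT/${MODEL_NAME//\//--}"
echo "$TASK_DIR"
# prints: /tmp/tasks/org--model
```

Whatever the real derivation is, the safe pattern is the same: either pass the same --tasks-root (and model name) everywhere, or pass the same explicit --task-dir everywhere.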
CUDA Not Available
If you're running on a machine with a GPU but REE doesn't detect it, ensure you're passing the --gpus all flag to Docker:
docker run --gpus all -v ~/.cache/gensyn:/gensyn ree run \
  --tasks-root /gensyn/tasks \
  --model-name <model> \
  --prompt-text "..." \
  --operation-set reproducible

Use --cpu-only to explicitly force CPU execution when a GPU is not available or not desired.
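To check whether the GPU is visible inside the container at all, you can override the entrypoint and run nvidia-smi directly. This sketch assumes the NVIDIA Container Toolkit is installed on the host and that the nvidia-smi binary is available inside the ree image:

```shell
# If this prints a GPU table, the container can see the device;
# if it errors, fix the host-side driver/toolkit setup first.
docker run --rm --gpus all --entrypoint nvidia-smi ree
```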
Out-of-Memory (OOM) Issues
Attempting to run larger models (those with a higher parameter count) can exhaust available memory and fail during model loading or the checkpoint 'sharding' phase. If you are using Docker Desktop, you may need to raise its memory limit.
This typically shows up as a run:failed status with exit code 137, and the logs will show the process dying partway through "Loading checkpoint shards."
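Exit code 137 is the shell's encoding of a SIGKILL: 128 plus signal number 9, which is what the kernel's OOM killer delivers when a process exceeds available memory. A quick way to confirm the encoding:

```shell
# 137 = 128 + 9 (SIGKILL), the signal sent by the OOM killer:
bash -c 'kill -9 $$'
echo $?
# prints: 137
```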
NaN Errors & Crashes with Certain Models
Some FP16 models, particularly certain Qwen 2.5 Instruct variants, may produce NaN (Not a Number) errors and crash when run in default or deterministic mode. This is a numerical stability issue: attention score calculations can overflow the FP16 value range during inference.
If you encounter this, try switching to reproducible mode (--operation-set reproducible), which handles these edge cases more gracefully. Note that even in reproducible mode, some affected models may still produce degraded output quality (repetitive text or unexpected tokens).
This is a known limitation of the ONNX export pipeline's use of FP16 precision and will be addressed in a future release.