Troubleshooting
Stuck? Get unblocked with RL Swarm on Windows (WSL 2), Linux, or macOS, or reach out to support for more help.
Overview
This troubleshooting guide provides a complete reference for diagnosing and resolving known issues when installing, running, or maintaining an RL Swarm node across Windows (WSL 2), Linux, and macOS.
If you need additional support, you can open a ticket or visit our Discord.
Installation and Dependency Issues
This section covers:
"Command not found" errors for Python, Docker, and Git
Build failures and/or missing libraries
Permission issues (denials) when running Docker/Python scripts
Update your Package Manager
On Linux or Windows (WSL 2), run the following command to update your package manager:
sudo apt update && sudo apt upgrade -y
On macOS, run the following command to update your package manager:
brew update && brew upgrade
Install Missing Packages
You may be missing dependencies. To install any missing packages on Linux or Windows (WSL 2), run this command:
sudo apt install -y python3 python3-venv python3-pip curl wget git docker.io build-essential
On macOS, run this command:
brew install python git docker
Verify your Python Version
RL Swarm requires Python 3.10 or higher.
Check your Python version by running python3 --version. If it reports a version older than 3.10, upgrade through your package manager or pyenv.
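The version requirement above can be checked in a script. This is a minimal sketch: py_version_ok is a hypothetical helper name, not part of RL Swarm, and it only compares the major and minor numbers.

```shell
# Hypothetical helper (not part of RL Swarm): succeeds if the version string
# in $1, such as "3.11.4", is Python 3.10 or newer.
py_version_ok() {
    ver=$1
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Usage: "python3 --version" prints e.g. "Python 3.10.12"; field 2 is the version.
#   py_version_ok "$(python3 --version 2>&1 | awk '{print $2}')" || echo "Upgrade Python"
```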
Configuring Docker
Many Docker-related issues arise from memory allocation constraints or ports which are already in use.
Start the Docker Daemon
On Linux, run the following command to start the Docker daemon:
sudo systemctl enable docker && sudo systemctl start docker
On Windows (WSL 2), run the following command to start the Docker daemon:
sudo service docker start
On macOS, open Docker Desktop and confirm that it's running.
Test Docker
The command docker run hello-world should print “Hello from Docker!”.
If it doesn't, reinstall Docker (Windows [WSL 2] and Linux) or restart Docker Desktop (macOS).
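The same test can be scripted. This is a minimal sketch that looks for the greeting in the command output; docker_hello_ok is a hypothetical helper name.

```shell
# Hypothetical helper: reads command output on stdin and succeeds only if it
# contains the greeting printed by the hello-world image.
docker_hello_ok() {
    grep -q "Hello from Docker!"
}

# Usage (needs a running Docker daemon):
#   docker run --rm hello-world 2>&1 | docker_hello_ok && echo "Docker is working"
```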
Memory Allocation
Increase container memory by navigating to Docker Desktop > Settings > Resources > Advanced > Memory, then set it to the maximum value (at least 16 GB recommended).
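To confirm that a new memory setting took effect, one option is to read the total memory the Docker engine reports. This is a minimal sketch, assuming docker info exposes a MemTotal field in bytes; bytes_to_mib is a hypothetical helper name. The reported value may be slightly below the configured limit.

```shell
# Hypothetical helper: convert a byte count in $1 to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Usage (assumes "docker info" reports MemTotal in bytes):
#   bytes_to_mib "$(docker info --format '{{.MemTotal}}')"
# A 16 GB allocation corresponds to roughly 16384 MiB.
```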
Docker and Virtualization Issues
Sometimes builds will hang, crash, or be unreachable.
This section deals with the inability to connect to the Docker daemon and docker-compose syntax issues:
Try the alternate syntax for modern Docker: docker compose (no hyphen). If that fails, fall back to docker-compose.
Ensure virtualization is enabled in BIOS / Firmware.
WSL 2 users: enable WSL integration inside Docker Desktop Settings > Resources > WSL Integration, then select your distribution.
Linux GPU users: verify that your NVIDIA drivers are up-to-date and the CUDA toolkit is installed. The nvidia-smi command must show a running driver.
macOS users: RL Swarm can only run in CPU-only mode. GPU mode is not supported.
For “Out of memory” (OOM) build errors, close other applications or increase the Docker memory limit (see Memory Allocation above).
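The fallback between the two Compose syntaxes can be automated. This is a minimal sketch: compose_cmd is a hypothetical helper name, and it only reports which form is available on the current machine.

```shell
# Hypothetical helper: print the Compose command available on this machine,
# preferring the modern "docker compose" plugin over legacy "docker-compose".
compose_cmd() {
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    else
        echo "No Compose command found" >&2
        return 1
    fi
}

# Usage:
#   compose_cmd && echo "Use the command printed above"
```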
Login and Identity Issues
If you're experiencing issues logging in, this section provides quick fixes for login modal issues, peer identity issues, and more.
Browser window never opens for login
Manually open the login URL by typing http://localhost:3000 into your browser.
If you're using a VM/VPS, use the -L 3000:localhost:3000 port forwarding flag when connecting over SSH.
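For a VM/VPS, the port forwarding flag fits into the SSH command as sketched below. The user name and address are placeholders, not real values; substitute your own VPS login.

```shell
# Placeholders: replace with your own VPS login and address.
VPS_USER="user"
VPS_HOST="203.0.113.10"   # documentation-only example address

# Forward local port 3000 to port 3000 on the VPS, so that
# http://localhost:3000 in your local browser reaches the login page there.
TUNNEL_CMD="ssh -L 3000:localhost:3000 ${VPS_USER}@${VPS_HOST}"
echo "$TUNNEL_CMD"
```

Run the printed command from your local machine, keep the session open, then open http://localhost:3000 locally.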
Login modal fails to load, or OTP not sent to email
Upgrade viem to version 2.25.0 inside modal-login/package.json.
Run cd modal-login && yarn upgrade && yarn add next@latest viem@latest.
Login works, but training fails after re-login
Delete the old peer identity using sudo rm swarm.pem. Then re-run RL Swarm and log in again with the same email.
Lost swarm.pem identity
You must generate a new one using the same email to retain your on-chain account.
Running multiple nodes
Use the same email login for each node. Each node has its own peer ID, but all of them share the same EOA (externally owned account).
VPS login fallbacks
If port 3000 is blocked, you can use a temporary tunnel such as Cloudflare or ngrok, provided you are comfortable with networking tools.
Training and Performance Issues
Some commonly reported training issues are false alarms, whereas others require manual intervention.
Symptoms we've seen:
Training appears stuck, or isn't progressing: Consumer-grade CPUs, especially MacBooks, can take more than 20 minutes per training cycle. Please be patient!
If training freezes for longer than a previous iteration took, press ctrl/cmd+c to stop, then rerun the container and script.
"Skipped round" messages: This is normal. It means your machine was slower than the swarm round pace.
OOM (Out of Memory) errors: Try closing other applications and increasing the Docker memory allocation as mentioned above.
High CPU usage and/or thermal throttling: This is normal if you're training in CPU-only mode. If your device allows for it, try switching to GPU mode.
"GPU not detected" warnings: Confirm that your drivers are correctly installed and recognized, and that the container is launched using swarm-gpu.
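For the "GPU not detected" case, a quick driver check can tell you which mode is realistic on a given machine. This is a minimal sketch: gpu_mode is a hypothetical helper name, and it only tests whether nvidia-smi is installed and exits successfully.

```shell
# Hypothetical helper: print "gpu" if nvidia-smi is installed and reports a
# working driver, otherwise print "cpu".
gpu_mode() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "gpu"
    else
        echo "cpu"
    fi
}

# Usage:
#   [ "$(gpu_mode)" = "gpu" ] || echo "No working NVIDIA driver found; expect CPU-only training"
```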
Network and Connectivity Issues
Docker may need to be configured in your Firewall settings to allow outbound traffic, or you may be in a region where RL Swarm is currently unavailable.
Nodes from China, Russia, Ukraine, and sometimes Japan are blocked. Use a different region or VPS outside those areas.
Common connection issues:
Node doesn't appear on the dashboard: Check your internet connection and make sure the firewall allows outbound traffic from Docker. Also, visit the Gensyn Dashboard and confirm that your node is visible under RL Swarm.
Prediction Market bets are not visible: Make sure you answered 'Y' when asked to join the Prediction Market. Rerun the script if necessary.
VPS connection drops: If the SSH tunnel breaks and you see “broken pipe” errors, press ctrl/cmd+c to kill the script, then restart RL Swarm. It should re-initialize cleanly.
Logs and Diagnostics
Browse the table below to find the most useful log types and locations inside the rl-swarm repository.
Log location                                   Contents
/logs/yarn.log                                 Modal login server activity.
/logs/swarm.log                                The main application log.
/logs/wandb/                                   Training logs and debug.log for Weights & Biases (if this is enabled).
/logs/prg_record.txt and swarm_launcher.log    Prediction Market details.
How to Interpret Logs
Many warnings (e.g., Protobuf "yanked version") are benign and can be safely ignored.
When scanning your logs for errors, look for lines containing ERROR, RuntimeError, or Traceback to locate the actual failure points.
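The search described above can be done with grep. This is a minimal sketch: find_failures is a hypothetical helper name, and the example path assumes you run it from the root of the rl-swarm repository.

```shell
# Hypothetical helper: print numbered lines in log file $1 that contain
# ERROR, RuntimeError, or Traceback (case-sensitive).
find_failures() {
    grep -nE 'ERROR|RuntimeError|Traceback' "$1"
}

# Usage, run from the repository root:
#   find_failures logs/swarm.log
```

Each match is prefixed with its line number, which makes it easier to open the log at the right place and read the surrounding context.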
Advanced and Recovery Scenarios
Below are some specific scenarios you may run into when running multiple nodes, or nodes on different machines.
Moving to a new machine: Make sure to back up your swarm.pem from the repo's root, then copy it into the same directory on the new machine before launching RL Swarm.
Running multiple GPUs or peers: Install RL Swarm separately for each GPU, and expose each peer under a different port.
Clean rebuilds: Stop all containers and processes by using ctrl/cmd+c, then run docker system prune -a to remove old containers. Delete .venv and re-clone the repository if necessary.
Using tunneling tools in cases where the login port is blocked: This is only for advanced users who are comfortable with network tools. Use the simplest tool that works, since the local login method is recommended for security and reliability. Tools include Cloudflare, ngrok, or localtunnel.
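Before moving machines or doing a clean rebuild, the identity file can be copied somewhere safe. This is a minimal sketch: backup_identity is a hypothetical helper name and the backup directory is a placeholder. If swarm.pem is owned by root on your system, the copy may need sudo.

```shell
# Hypothetical helper: copy the identity file $1 into directory $2 and
# restrict the copy to the current user.
backup_identity() {
    src=$1
    dest_dir=$2
    if [ ! -f "$src" ]; then
        echo "No identity file found at $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/swarm.pem" || return 1
    chmod 600 "$dest_dir/swarm.pem"
}

# Usage, run from the repository root (the backup directory is a placeholder):
#   backup_identity swarm.pem "$HOME/rl-swarm-backup"
```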
When to Escalate
If you're experiencing an issue that none of the above steps are able to resolve, we're here to help.
Check the GitHub Issues page to see if your issue has already been reported.
If you open a new issue or ask for help on Discord, please include the operating system and version, CPU and GPU model, amount of RAM, and as much context on the error(s) as possible.