Troubleshooting

Stuck? Get unblocked with RL Swarm on Windows (WSL 2), Linux, or macOS, or reach out to support for more help.

Overview

This troubleshooting guide provides a complete reference for diagnosing and resolving known issues when installing, running, or maintaining an RL Swarm node across Windows (WSL 2), Linux, and macOS.

If you need additional support, you can open a ticket or visit our Discord.

Installation and Dependency Issues

This section covers:

"Command not found" errors for Python, Docker, and Git
Build failures and/or missing libraries
Permission issues (denials) when running Docker/Python scripts

Update your Package Manager

On Debian or Ubuntu (including WSL 2), run the following command to refresh the package index and upgrade installed packages:

sudo apt update && sudo apt upgrade -y

Install Missing Packages

You may be missing dependencies. To double-check and install any missing packages, run this command:

sudo apt install -y python3 python3-venv python3-pip curl wget git docker.io build-essential
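To see which tools are actually missing before installing anything, a small check like the one below can help. It is only a sketch: the function name missing_commands is made up for this example, and the command names passed to it are assumed to be what the packages above provide on your system.

```shell
# Print each command from the argument list that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Usage: missing_commands python3 git docker curl wget
```

No output means every command in the list was found.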

Verify your Python Version

RL Swarm requires Python 3.10 or higher.

Check your version by running python3 --version. If it reports anything older than 3.10, upgrade through your package manager or pyenv.
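If you want to script this check, compare the major and minor numbers numerically, because "3.9" sorts after "3.10" when compared as plain text. The helper below is a minimal sketch; version_at_least is a hypothetical name, not part of RL Swarm.

```shell
# Succeeds (exit status 0) when version $1 is at least the MAJOR.MINOR version $2.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Usage: python3 --version prints something like "Python 3.10.12"; keep the second field.
# version_at_least "$(python3 --version | awk '{print $2}')" 3.10 && echo "Python OK"
```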

Configuring Docker

Many Docker-related issues arise from memory allocation constraints or ports which are already in use.

Start the Docker Daemon

Run the following command to enable the Docker daemon at boot and start it now:

sudo systemctl enable docker && sudo systemctl start docker

Because the command uses sudo, you may be prompted for your password.

Test Docker

The command docker run hello-world should print “Hello from Docker!”

If it doesn't, reinstall Docker (Windows [WSL 2] and Linux) or restart Docker Desktop (macOS).
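The same test can be wrapped in a function that gives a clear pass or fail, which is handy in setup scripts. This is a sketch only; docker_smoke_test is a hypothetical name.

```shell
# Run the hello-world image and succeed only if the expected greeting appears.
# --rm removes the test container once it exits.
docker_smoke_test() {
    docker run --rm hello-world 2>&1 | grep -q 'Hello from Docker!'
}

# Usage: docker_smoke_test && echo "Docker works" || echo "Docker test failed"
```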

Memory Allocation

Increase container memory by navigating to Docker Desktop > Settings > Resources > Advanced > Memory, then set it to the maximum value available (at least 16 GB is recommended).
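To confirm the new limit took effect, you can ask Docker how much memory it sees. docker info reports this in bytes, so the sketch below converts the number to MiB (16 GB is 16384 MiB). The helper name bytes_to_mib is made up for this example, and the reported value may be slightly lower than the limit you configured.

```shell
# Convert a byte count to whole mebibytes (1 MiB = 1048576 bytes).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Usage: print the memory available to the Docker engine, in MiB.
# bytes_to_mib "$(docker info --format '{{.MemTotal}}')"
```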

Docker and Virtualization Issues

Builds sometimes hang or crash, or the Docker daemon becomes unreachable.

This section deals with the inability to connect to the Docker daemon and docker-compose syntax issues:

Try the alternate syntax for modern Docker: docker compose (no hyphen). If that fails, fall back to docker-compose.
Ensure virtualization is enabled in BIOS / Firmware.
WSL 2 users: enable WSL integration inside Docker Desktop Settings > Resources > WSL Integration, then select your distribution.
Linux GPU users: verify that your NVIDIA drivers are up to date and the CUDA toolkit is installed. Running nvidia-smi must show a working driver.
macOS users: RL Swarm runs in CPU-only mode on macOS. GPU mode is not supported.
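The "try docker compose, then fall back to docker-compose" advice in the list above can be sketched as a small helper. The function name pick_compose is hypothetical; the two commands it probes are the standard Compose entry points.

```shell
# Print the Compose command available on this machine, preferring the modern plugin.
# Returns non-zero if neither form is available.
pick_compose() {
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    else
        return 1
    fi
}

# Usage: compose=$(pick_compose) && echo "Use: $compose"
```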

For “Out of memory” (OOM) build errors, close other applications or increase the Docker memory limit (see Memory Allocation above).

Login and Identity Issues

If you're experiencing issues logging in, this section provides quick fixes for login modal issues, peer identity issues, and more.

Browser window never opens for login: Manually open the login URL by typing http://localhost:3000 into your browser. If you're using a VM/VPS, add the SSH port-forwarding flag -L 3000:localhost:3000 when connecting.
Login modal fails to load, or OTP not sent to email: Upgrade viem to version 2.25.0 inside modal-login/package.json. Run cd modal-login && yarn upgrade && yarn add next@latest viem@latest.
Login works, but training fails after re-login: Delete the old peer identity with sudo rm swarm.pem. Then re-run RL Swarm and log in again with the same email.
Lost swarm.pem identity: You must generate a new one using the same email to retain your on-chain account.
Running multiple nodes: Use the same email login for each node. Each node has its own peer ID, but shares the same EOA.
VPS login fallbacks: If port 3000 is blocked, you can use temporary tunnels such as Cloudflare or ngrok if you are comfortable with networking tools.
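For the VM/VPS case, the -L flag goes on the ssh command you run from your local machine. The sketch below just prints that command so you can see where the flag fits; login_tunnel_cmd is a hypothetical helper, and the user and host in the usage line are placeholders for your own.

```shell
# Print the ssh command that forwards a local port to the same port on the remote node.
# Arguments: remote user, remote host, optional port (defaults to 3000, the login port).
login_tunnel_cmd() {
    port=${3:-3000}
    printf 'ssh -L %s:localhost:%s %s@%s\n' "$port" "$port" "$1" "$2"
}

# Usage (placeholder user and host): login_tunnel_cmd ubuntu vps.example.com
```

While that SSH session stays open, http://localhost:3000 in your local browser reaches the login page served on the remote machine.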

Training and Performance Issues

Some commonly reported training issues are false alarms, whereas others require manual intervention.

Symptoms we've seen:

Training appears stuck, or isn't progressing: Consumer-grade CPUs, especially in MacBooks, can take more than 20 minutes per training cycle. Please be patient!

If training stays frozen for longer than a previous iteration took, press ctrl/cmd+c to stop it, then rerun the container and script.

"Skipped round" messages: This is normal. It means your machine was slower than the swarm round pace.
OOM (Out of Memory) errors: Try closing other applications and increasing the Docker memory allocation as mentioned above.
High CPU usage and/or thermal throttling: This is normal if you're training in CPU-only mode. If your device has a supported GPU, try switching to GPU mode.
"GPU not detected" warnings: Confirm that your drivers are correctly installed and recognized, and that the container is launched using swarm-gpu.

To force CPU-only mode explicitly, use the swarm-cpu command.
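If you're unsure which mode fits your machine, one rough heuristic is to check whether nvidia-smi runs successfully. The sketch below assumes that a working nvidia-smi means the GPU path is worth trying; suggest_mode is a hypothetical helper that only prints the name swarm-gpu or swarm-cpu, it does not launch anything.

```shell
# Suggest swarm-gpu when an NVIDIA driver responds, otherwise swarm-cpu.
suggest_mode() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "swarm-gpu"
    else
        echo "swarm-cpu"
    fi
}

# Usage: echo "Suggested mode: $(suggest_mode)"
```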

Network and Connectivity Issues

Your firewall may need to be configured to allow outbound traffic from Docker, or you may be in a region where RL Swarm is currently unavailable.

Nodes from China, Russia, Ukraine, and sometimes Japan are blocked. Use a different region or VPS outside those areas.

Common connection issues:

Node doesn't appear on the dashboard: Check your internet connection and make sure the firewall allows outbound traffic from Docker. Also, visit the Gensyn Dashboard and confirm that your node is visible under RL Swarm.
Prediction Market bets are not visible: Make sure you answered 'Y' when asked to join the Prediction Market. Rerun the script if necessary.
VPS connection drops: If the SSH tunnel breaks and you see “broken pipe” errors, press ctrl/cmd+c to kill the script, then restart RL Swarm, and it should cleanly re-initialize.

Logs and Diagnostics

Browse the list below to find the most useful log types and locations inside the rl-swarm repository.

/logs/yarn.log: Modal login server activity.
/logs/swarm.log: The main application log.
/logs/wandb/: Training logs and debug.log for Weights & Biases (if this is enabled).
/logs/prg_record.txt and swarm_launcher.log: Prediction Market details.

How to Interpret Logs

Many warnings (e.g., Protobuf "yanked version") are benign and can be safely ignored.

When scanning your logs, search for lines containing ERROR, RuntimeError, or Traceback to locate the actual failure points.
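A quick way to do this is with grep. The sketch below wraps the search in a hypothetical helper, find_failures, and assumes you run it from the repository root so that logs/swarm.log resolves to the main application log.

```shell
# Print matching lines, prefixed with their line numbers, from the given log file.
find_failures() {
    grep -nE 'ERROR|RuntimeError|Traceback' "$1"
}

# Usage: find_failures logs/swarm.log
```

The line numbers make it easier to copy the surrounding section of the log when asking for help.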

When posting to Discord or GitHub, please attach the relevant section of swarm.log as well as your system info.

Advanced and Recovery Scenarios

Below are some specific scenarios you may run into when running multiple nodes, or nodes on different machines.

Moving to a new machine: Back up your swarm.pem from the repo's root, then copy it into the same directory on the new machine before launching RL Swarm.
Running multiple GPUs or peers: Install RL Swarm separately for each GPU, and expose each peer on a different port.
Clean rebuilds: Stop all containers and processes with ctrl/cmd+c, then run docker system prune -a to remove stopped containers, unused images, and build cache. Delete .venv and re-clone the repository if necessary.
Using tunneling tools when the login port is blocked: This is only for advanced users who are comfortable with network tools. Use the simplest tool that works, since the local login method is recommended for security and reliability. Tools include Cloudflare, ngrok, or localtunnel.
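Before moving machines or doing a clean rebuild, it is worth keeping a timestamped copy of swarm.pem somewhere outside the repository. The helper below is a minimal sketch; backup_identity and the backup directory in the usage line are assumptions for illustration, and you may need sudo if the file is owned by root.

```shell
# Copy the identity file into a backup directory under a timestamped name.
# Arguments: path to swarm.pem, destination directory (created if missing).
backup_identity() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "no identity file at $src" >&2; return 1; }
    mkdir -p "$dest_dir" &&
        cp -p "$src" "$dest_dir/swarm.pem.$(date +%Y%m%d-%H%M%S)"
}

# Usage (from the repo root): backup_identity swarm.pem "$HOME/rl-swarm-backup"
```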

When to Escalate

If you're experiencing an issue that none of the above steps are able to resolve, we're here to help.

Check the GitHub Issues page to see if your issue has already been reported.
If you open a new issue or ask for help on Discord, please include the operating system and version, CPU and GPU model, amount of RAM, and as much context on the error(s) as possible.
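To gather those details in one go, a script along these lines can help. It is a sketch that covers Linux (including WSL 2) and macOS; collect_system_info is a hypothetical name, and sections for tools that are not installed are simply skipped.

```shell
# Print OS, CPU, RAM, and (when present) GPU, Python, and Docker details.
collect_system_info() {
    echo "OS: $(uname -srm)"
    case "$(uname -s)" in
        Linux)
            if [ -r /proc/cpuinfo ]; then
                echo "CPU: $(sed -n 's/^model name[^:]*: *//p' /proc/cpuinfo | head -n 1)"
            fi
            if [ -r /proc/meminfo ]; then
                echo "RAM: $(sed -n 's/^MemTotal: *//p' /proc/meminfo)"
            fi
            ;;
        Darwin)
            echo "CPU: $(sysctl -n machdep.cpu.brand_string)"
            echo "RAM: $(sysctl -n hw.memsize) bytes"
            ;;
    esac
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "GPU: $(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>&1)"
    fi
    if command -v python3 >/dev/null 2>&1; then
        echo "Python: $(python3 --version 2>&1)"
    fi
    if command -v docker >/dev/null 2>&1; then
        echo "Docker: $(docker --version 2>&1)"
    fi
}

# Usage: collect_system_info > system-info.txt
```

Attach the resulting file alongside the relevant section of swarm.log.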


Last updated 4 months ago
