# Troubleshooting

## Overview

This troubleshooting guide provides a complete reference for diagnosing and resolving *known issues* when installing, running, or maintaining an RL Swarm node across Windows (WSL 2), Linux, and macOS.

{% hint style="success" %}
If you need additional support, you can [open a ticket](https://github.com/gensyn-ai/rl-swarm/issues) or [visit our Discord](https://discord.com/invite/gensyn).
{% endhint %}

### Installation and Dependency Issues

This section covers:

* "Command not found" errors for Python, Docker, and Git
* Build failures and missing libraries
* "Permission denied" errors when running Docker or Python scripts

#### Update your Package Manager

{% tabs %}
{% tab title="Linux & Windows (WSL 2)" %}
Run the following command to update your package manager:

```bash
sudo apt update && sudo apt upgrade -y
```

{% endtab %}

{% tab title="macOS" %}
Run the following command to update your package manager:

```bash
brew update && brew upgrade
```

{% endtab %}
{% endtabs %}

#### Install Missing Packages

{% tabs %}
{% tab title="Linux & Windows (WSL 2)" %}
You may be missing dependencies. To double-check and install any missing packages, run this command:

```bash
sudo apt install -y python3 python3-venv python3-pip curl wget git docker.io build-essential
```

{% endtab %}

{% tab title="macOS" %}
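You may be missing dependencies. To double-check and install any missing packages, run these commands (the `docker` cask installs Docker Desktop, which provides the Docker daemon on macOS):

```bash
brew install python git
brew install --cask docker
```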

{% endtab %}
{% endtabs %}
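
If you still see "command not found" errors after installing, the sketch below reports which tools are missing from your `PATH`. The function name `missing_tools` and its default tool list are illustrative assumptions based on the packages above, not part of RL Swarm.

```bash
#!/usr/bin/env bash
# Report which command-line tools are missing from PATH.
# The default list is an assumption drawn from the install commands above.
missing_tools() {
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(python3 git curl wget docker)
  fi
  local missing=()
  local t
  for t in "${tools[@]}"; do
    # command -v succeeds for anything the shell can run (binary, builtin, function).
    command -v "$t" >/dev/null 2>&1 || missing+=("$t")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "all tools found"
}
```

Run `missing_tools` to check the defaults, or pass names explicitly, e.g. `missing_tools python3 docker`. A missing tool usually means the install step failed or its directory isn't on your `PATH`.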

#### Verify your Python Version

RL Swarm requires Python 3.10 or higher.

Check your version by running `python3 --version`. If it reports anything older than **3.10**, upgrade Python with your package manager or `pyenv`.
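
The version check can be scripted. This is a minimal sketch, assuming `python3` is the interpreter you'll run RL Swarm with; `check_python` is a hypothetical helper name. It asks the interpreter to compare its own `sys.version_info` rather than parsing the version string.

```bash
#!/usr/bin/env bash
# Check that a Python interpreter exists and is version 3.10 or newer.
# Usage: check_python [interpreter]   (defaults to python3)
check_python() {
  local py="${1:-python3}"
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "missing: $py is not on PATH"
    return 1
  fi
  # Let the interpreter compare its own version; exit code 0 means >= 3.10.
  if "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'; then
    echo "ok: $("$py" --version 2>&1)"
  else
    echo "too old: $("$py" --version 2>&1) (3.10+ required)"
    return 1
  fi
}
```

Run `check_python`, or `check_python python3.11` to test a specific interpreter.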

### Configuring Docker

Many Docker-related issues arise from memory allocation constraints or ports that are already in use.
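
To see whether a port is already taken before launching, you can probe it from bash. This sketch relies on bash's built-in `/dev/tcp` redirection (not available in plain `sh`); `port_in_use` is a hypothetical helper, and port 3000 in the usage note is the login server port used elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Report whether something is already listening on a local TCP port.
# Returns 0 if the port is in use, 1 if it is free.
port_in_use() {
  local port="$1" host="${2:-127.0.0.1}"
  # The connection opens only if a process is accepting on host:port.
  # Running it in a subshell closes fd 3 again and contains any failure.
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "in use: ${host}:${port}"
    return 0
  fi
  echo "free: ${host}:${port}"
  return 1
}
```

For example, run `port_in_use 3000` before starting RL Swarm. If it reports the port as in use, `lsof -i :3000` shows which process holds it.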

#### Start the Docker Daemon

{% tabs %}
{% tab title="Linux" %}
Run the following command to enable the Docker daemon at boot and start it now:

```bash
sudo systemctl enable docker && sudo systemctl start docker
```

{% hint style="info" %}
You may be prompted for your password because the command uses `sudo`.
{% endhint %}
{% endtab %}

{% tab title="Windows (WSL 2)" %}
Run the following command to start the Docker daemon:

```bash
sudo service docker start
```

{% hint style="info" %}
You may be prompted for your password because the command uses `sudo`.
{% endhint %}
{% endtab %}

{% tab title="macOS" %}
Open **Docker Desktop** and confirm that it's running.
{% endtab %}
{% endtabs %}
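
The daemon can take a few seconds to come up after you start it. The following sketch polls `docker info`, which fails while the daemon is unreachable, until it succeeds or a timeout passes. The function name and the 60-second default are assumptions, not RL Swarm settings.

```bash
#!/usr/bin/env bash
# Poll until the Docker daemon answers, or give up after a timeout.
# Usage: wait_for_docker [timeout_seconds]   (defaults to 60)
wait_for_docker() {
  local timeout="${1:-60}" waited=0
  # `docker info` exits non-zero while the daemon is unreachable.
  until docker info >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Docker daemon not reachable after ${timeout}s"
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Docker daemon is up"
}
```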

#### Test Docker

Running `docker run hello-world` should print “Hello from Docker!”

If it doesn't, reinstall Docker (Windows \[WSL 2] and Linux) or restart Docker Desktop (macOS).

#### Memory Allocation

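Increase the memory available to containers in **Docker Desktop > Settings > Resources > Advanced > Memory**, setting it to the maximum value your machine allows (at *least* 16 GB is recommended). On Windows with the WSL 2 backend, Docker Desktop uses the limit set in WSL instead: add `memory=` under `[wsl2]` in `%UserProfile%\.wslconfig`, then run `wsl --shutdown` for it to take effect.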
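
To confirm the new limit took effect, you can read the memory the daemon reports. This sketch assumes `docker info --format '{{.MemTotal}}'`, which prints the total memory visible to Docker in bytes; `docker_memory_check` and its 16 GiB default are illustrative. It rounds to the nearest GiB because Docker Desktop's VM usually reports slightly less than the configured value.

```bash
#!/usr/bin/env bash
# Compare the memory Docker reports against a minimum, in GiB.
# Usage: docker_memory_check [min_gib]   (defaults to 16)
docker_memory_check() {
  local min_gib="${1:-16}" bytes gib
  if ! bytes="$(docker info --format '{{.MemTotal}}' 2>/dev/null)" || ! [[ "$bytes" =~ ^[0-9]+$ ]]; then
    echo "error: could not read memory from the Docker daemon"
    return 2
  fi
  # Round to the nearest GiB (1 GiB = 1073741824 bytes).
  gib=$(( (bytes + 536870912) / 1073741824 ))
  if (( gib >= min_gib )); then
    echo "ok: Docker can use ~${gib} GiB"
  else
    echo "low: Docker can use ~${gib} GiB (want ${min_gib}+)"
    return 1
  fi
}
```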

### Docker and Virtualization Issues

Builds sometimes hang or crash, or the Docker daemon becomes unreachable.

This section covers Docker daemon connection errors and `docker-compose` syntax issues:

* Modern Docker ships Compose as a plugin, so try `docker compose` (no hyphen) first. If that fails, fall back to `docker-compose`.
* Ensure virtualization is enabled in your BIOS/UEFI firmware settings.
* **WSL 2 users:** enable WSL integration in **Docker Desktop > Settings > Resources > WSL Integration**, then select your distribution.
* **Linux GPU users:** verify that your NVIDIA drivers are up to date and the CUDA toolkit is installed. Running `nvidia-smi` must show a loaded driver.
* **macOS users:** RL Swarm runs in CPU-only mode. *GPU mode is not supported.*
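
The Compose fallback above can be automated. This sketch, built around a hypothetical `compose_cmd` helper, prints whichever Compose invocation works on your machine by checking `docker compose version` first.

```bash
#!/usr/bin/env bash
# Print the working Compose invocation: the plugin form first, then the legacy binary.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "error: neither 'docker compose' nor 'docker-compose' is available" >&2
    return 1
  fi
}
```

Because the plugin form is two words, expand the result unquoted, e.g. `$(compose_cmd) ps`.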

{% hint style="warning" %}
For “Out of memory” (OOM) build errors, close other applications or increase the Docker memory limit as described under **Memory Allocation**.
{% endhint %}

### Login and Identity Issues

If you're experiencing issues logging in, this section provides quick fixes for login modal issues, *peer identity* issues, and more.

| Issue                                                 | Fix                                                                                                                                                                                                                                      |
| ----------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| *Browser window never opens for login*                | <ol><li>Open the login page manually by entering <code>http://localhost:3000</code> in your browser.</li><li>If you're using a VM/VPS, forward the port when you connect over SSH, e.g. <code>ssh -L 3000:localhost:3000 user@host</code>.</li></ol> |
| *Login modal fails to load, or OTP not sent to email* | <ol><li>Upgrade <code>viem</code> to version 2.25.0 inside <code>modal-login/package.json</code>.</li><li>Run <code>cd modal-login && yarn upgrade && yarn add next\@latest viem\@latest</code>.</li></ol>                        |
| *Login works, but training fails after re-login*      | Delete the old peer identity with `sudo rm swarm.pem`, then re-run RL Swarm and log in again with the same email.                                                                                                              |
| *Lost `swarm.pem` identity*                           | You must generate a new one using the same email to retain your on-chain account.                                                                                                                                                        |
| *Running multiple nodes*                              | Use the same email login for each node. Each node has its own peer ID, but all nodes share the same EOA (externally owned account).                                                                                                      |
| *VPS login fallbacks*                                 | If port 3000 is blocked, you can use a temporary tunnel such as Cloudflare Tunnel or ngrok, if you're comfortable with networking tools.                                                                                                 |
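
For the VM/VPS case in the first row, the port forward can be wrapped in a small helper that you run on your *local* machine. This is a minimal sketch; `login_tunnel` and the `user@your-vps` target are placeholders. `-L` forwards the local port to the same port on the remote host, and `-N` keeps the connection open without running a remote command.

```bash
#!/usr/bin/env bash
# Forward the remote login server port to the same port on this machine.
# Usage: login_tunnel user@host [port]   (port defaults to 3000)
login_tunnel() {
  local target="$1" port="${2:-3000}"
  if [ -z "$target" ]; then
    echo "usage: login_tunnel user@host [port]" >&2
    return 2
  fi
  # -N: run no remote command, just hold the forward open.
  # -L: local port -> localhost:port as seen from the remote machine.
  ssh -N -L "${port}:localhost:${port}" "$target"
}
```

Leave it running (for example, `login_tunnel user@your-vps`), then open `http://localhost:3000` in your local browser.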

### Training and Performance Issues

Some common training issues are false alarms, while others require manual intervention.

Symptoms we've seen:

* **Training appears stuck, or isn't progressing:** Consumer-grade CPUs, especially MacBooks, can take more than \~20 minutes per training cycle. Please be patient!

{% hint style="warning" %}
If training stays frozen for longer than a previous iteration took, press `Ctrl+C` (also on macOS) to stop it, then restart the container and script.
{% endhint %}

* **"Skipped round" messages:** This is normal. It means your machine was slower than the swarm's round pace.
* **OOM (Out of Memory) errors:** Close other applications and increase the Docker memory allocation as mentioned above.
* **High CPU usage and/or thermal throttling:** This is expected in CPU-only mode. If your device supports it, switch to GPU mode.
* **"GPU not detected" warnings:** Confirm that your drivers are correctly installed and recognized, and that the container is launched with `swarm-gpu`.

{% hint style="info" %}
To explicitly force CPU-only mode, use `swarm-cpu`.
{% endhint %}
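
If you're unsure which mode to launch, this sketch picks between `swarm-gpu` and `swarm-cpu` based on whether `nvidia-smi` can talk to a driver. `pick_swarm_service` is a hypothetical helper; it only checks the host driver, not whether Docker itself can access the GPU.

```bash
#!/usr/bin/env bash
# Suggest swarm-gpu when an NVIDIA driver responds, otherwise swarm-cpu.
pick_swarm_service() {
  # nvidia-smi is absent or exits non-zero when no working driver is loaded.
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo "swarm-gpu"
  else
    echo "swarm-cpu"
  fi
}
```

Pass the result to your usual launch command in place of the hard-coded mode.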

### Network and Connectivity Issues

Your firewall may need to allow outbound traffic from Docker, or you may be in a region where RL Swarm is currently unavailable.

{% hint style="danger" %}
Nodes from China, Russia, Ukraine, and sometimes Japan are blocked. Use a different region or VPS outside those areas.
{% endhint %}

Common connection issues:

* **Node doesn't appear on the dashboard:** Check your internet connection and make sure your firewall allows outbound traffic from Docker. Then visit the [Gensyn Dashboard](https://dashboard.gensyn.ai/) and confirm that your node is visible under RL Swarm.
* **Prediction Market bets are not visible:** Make sure you answered 'Y' when asked to join the Prediction Market. Rerun the script if necessary.
* **VPS connection drops:** If the SSH tunnel breaks and you see “broken pipe” errors, press `Ctrl+C` to kill the script, then restart RL Swarm; it should re-initialize cleanly.
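
To rule out a firewall or routing problem, you can check whether this machine can make outbound HTTPS requests at all. This sketch assumes the dashboard URL above is a reasonable reachability target; `check_outbound` is a hypothetical helper, and a `000` status is curl's way of reporting that no HTTP response arrived.

```bash
#!/usr/bin/env bash
# Check outbound HTTPS by requesting a URL and reading only the HTTP status code.
# Usage: check_outbound [url]   (defaults to the Gensyn dashboard)
check_outbound() {
  local url="${1:-https://dashboard.gensyn.ai/}" code
  # -sS: quiet but still show errors; -o: discard body; -w: print the status code.
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || code="000"
  if [ "$code" = "000" ]; then
    echo "unreachable: $url"
    return 1
  fi
  echo "reachable: $url (HTTP $code)"
}
```

This tests the host only. To test the path from inside a container, you could run curl in one, e.g. `docker run --rm curlimages/curl -sI https://dashboard.gensyn.ai/`.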

### Logs and Diagnostics

The table below lists the most useful logs and their locations inside the `rl-swarm` repository.

| Location                                        | Contents                                                                 |
| ----------------------------------------------- | ------------------------------------------------------------------------ |
| `/logs/yarn.log`                                | Modal login server activity.                                             |
| `/logs/swarm.log`                               | The main application log.                                                |
| `/logs/wandb/`                                  | Training logs and `debug.log` for Weights & Biases (if this is enabled). |
| `/logs/prg_record.txt` and `swarm_launcher.log` | Prediction Market details.                                               |

#### How to Interpret Logs

Many warnings (e.g., Protobuf "yanked version") are **benign** and can be safely ignored.

To locate actual failure points, search your logs for lines containing `ERROR`, `RuntimeError`, or `Traceback`.
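
A quick way to apply this is to grep for those markers. The sketch below assumes `logs/swarm.log` relative to the repository root (per the table above) and uses a hypothetical `scan_log` helper; it prints each match with its line number and up to three lines of following context.

```bash
#!/usr/bin/env bash
# Print failure lines (with line numbers) from an RL Swarm log.
# Usage: scan_log [path]   (defaults to logs/swarm.log, relative to the repo root)
scan_log() {
  local log="${1:-logs/swarm.log}"
  if [ ! -f "$log" ]; then
    echo "no such log: $log" >&2
    return 2
  fi
  # -n: line numbers; -E: extended regex; -A 3: three lines of context after each match.
  grep -n -E -A 3 'ERROR|RuntimeError|Traceback' "$log"
}
```

Like `grep`, it exits with status 1 when nothing matches, which usually means the failure is elsewhere (for example in `yarn.log`).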

{% hint style="info" %}
When posting to [Discord](https://discord.com/invite/gensyn) or [GitHub](https://github.com/gensyn-ai/rl-swarm/issues), please attach the relevant section of `swarm.log` along with your system info.
{% endhint %}

### Advanced and Recovery Scenarios

Below are some specific scenarios you may run into when running multiple nodes, or nodes on different machines.

* **Moving to a new machine:** Back up `swarm.pem` from the repo's root, then copy it into the *same directory* on the new machine before launching RL Swarm.
* **Running multiple GPUs or peers:** Install RL Swarm separately for each GPU, and expose each peer on a different port.
* **Clean rebuilds:** Stop all containers and processes with `Ctrl+C`, then run `docker system prune -a` to remove old containers. This also deletes all unused images, networks, and build cache. Delete `.venv` and re-clone the repository if necessary.
* **Using tunneling tools when the login port is blocked:** This is only for advanced users who are comfortable with network tools. The local login method is recommended for security and reliability, so use the simplest tool that works. Options include **Cloudflare**, **ngrok**, and **localtunnel**.
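
The identity backup in the first bullet can be sketched as follows. `backup_identity` and the paths in the usage note are illustrative; the only assumption about RL Swarm is that `swarm.pem` sits in the repository root, as described above. If the file is root-owned, you may need elevated permissions to read it.

```bash
#!/usr/bin/env bash
# Copy swarm.pem from the repo root into a backup directory with owner-only permissions.
# Usage: backup_identity <repo_dir> <backup_dir>
backup_identity() {
  local repo="$1" dest="$2"
  local src="$repo/swarm.pem"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -p keeps the original timestamps; chmod 600 keeps the key private.
  cp -p "$src" "$dest/swarm.pem"
  chmod 600 "$dest/swarm.pem"
  echo "backed up to $dest/swarm.pem"
}
```

For example, `backup_identity ~/rl-swarm ~/swarm-backup`, then copy the backup to the new machine with something like `scp ~/swarm-backup/swarm.pem user@new-host:~/rl-swarm/`.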

### When to Escalate

If none of the steps above resolve your issue, we're here to help.

1. Check the [GitHub Issues](https://github.com/gensyn-ai/rl-swarm/issues) page to see if your issue has already been reported.
2. If you open a new issue or ask for help on [Discord](https://discord.com/invite/gensyn), please include your operating system and version, CPU and GPU model, amount of RAM, and as much context on the error(s) as possible.
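
To gather those details in one go, you can use a script like the one below. It's a best-effort sketch: `system_report` is a hypothetical helper, the Linux branch reads `/proc`, the macOS branch uses `sysctl` and `sw_vers`, and the GPU line only covers NVIDIA cards via `nvidia-smi`.

```bash
#!/usr/bin/env bash
# Print the system details worth attaching to a GitHub issue or Discord post.
# Any probe that fails falls back to "unknown" (or "none detected" for the GPU).
system_report() {
  local os cpu ram gpu
  os="$(uname -srm)"
  if [ -r /etc/os-release ]; then
    # Source in a subshell so the variables don't leak into this shell.
    os="$os / $(. /etc/os-release && echo "$PRETTY_NAME")"
  elif command -v sw_vers >/dev/null 2>&1; then
    os="$os / macOS $(sw_vers -productVersion)"
  fi
  if [ -r /proc/cpuinfo ]; then
    cpu="$(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^ *//')"
    # MemTotal is reported in kB; 1 GiB = 1048576 kB.
    ram="$(awk '/^MemTotal:/ {printf "%.1f GiB", $2 / 1048576}' /proc/meminfo)"
  else
    cpu="$(sysctl -n machdep.cpu.brand_string 2>/dev/null)"
    # hw.memsize is reported in bytes.
    ram="$(sysctl -n hw.memsize 2>/dev/null | awk '{printf "%.1f GiB", $1 / 1073741824}')"
  fi
  if command -v nvidia-smi >/dev/null 2>&1; then
    gpu="$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | paste -sd, -)"
  fi
  echo "OS:  ${os}"
  echo "CPU: ${cpu:-unknown}"
  echo "GPU: ${gpu:-none detected}"
  echo "RAM: ${ram:-unknown}"
}
```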
