
RL Swarm

RL Swarm lets anyone, anywhere, join a distributed reinforcement learning system that learns faster together than alone.

What is RL Swarm?

RL Swarm is a decentralized training environment where reinforcement learning (RL) agents cooperate over the internet instead of inside a single datacenter.

Each node runs a local language model that participates in multi-stage RL reasoning games, in which it answers, critiques, and revises solutions alongside its peers.

By connecting an RL Swarm node to an on-chain identity on the Gensyn Testnet, every participant’s contributions are logged and verifiable. This enables a persistent view of collective training performance across the network.

Why It Exists

Traditional RL research happens inside isolated labs using centralized GPU clusters. These environments are expensive, inaccessible, and closed by design.

RL Swarm was built to show that reinforcement learning can happen collaboratively and trustlessly across independent machines, powered by Gensyn’s decentralized execution and verification layers.

By turning multi-agent RL into a networked experiment, RL Swarm demonstrates:

  • How peer-to-peer learning can outperform solo training.

  • How collective reasoning can improve model quality and efficiency.

  • How the Gensyn Protocol’s four primitives work together in a live environment: execution, verification, communication, and coordination.

RL Swarm forms the foundation of Phase 0 of the Gensyn Testnet, providing the first public demonstration of decentralized AI collaboration in action.

What You Can Do With It

Anyone can clone the RL Swarm repository, run a node locally, and connect to the live swarm.

In the swarm, each node participates in four stages of RL:

1. Initialize a Local Model

Load a small open-source model (for example, Qwen 2.5 1.5B) to act as your local learning agent.
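The exact loading path is handled by the RL Swarm scripts, but conceptually this step is a standard Hugging Face transformers load. A minimal sketch, assuming the Qwen/Qwen2.5-1.5B-Instruct checkpoint (one plausible model ID, not necessarily the one the swarm selects for you):

```python
# Minimal sketch of loading a local agent model with Hugging Face
# transformers. The model ID is an assumption; RL Swarm's own scripts
# may choose a different checkpoint or loading path.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"  # small open-source model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Quick smoke test: generate a short completion locally.
inputs = tokenizer("2 + 2 =", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```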

2. Join a Shared Reasoning Task

Connect to the active swarm and take part in multi-stage reasoning challenges, such as collaboratively solving math, logic, or coding problems with other nodes.
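To make the round structure concrete, here is a toy sketch of the state one answer-critique-revise exchange might accumulate. All class and field names are hypothetical illustrations, not RL Swarm’s actual data model:

```python
# Illustrative only: a toy data model for one answer/critique/revise
# round. Names and fields are hypothetical, not RL Swarm's API.
from dataclasses import dataclass, field


@dataclass
class RoundState:
    problem: str                                              # the shared task
    answers: dict[str, str] = field(default_factory=dict)    # peer_id -> answer
    critiques: dict[str, str] = field(default_factory=dict)  # peer_id -> critique
    revisions: dict[str, str] = field(default_factory=dict)  # peer_id -> revision


state = RoundState(problem="What is 17 * 24?")
state.answers["node-a"] = "408"
state.critiques["node-b"] = "Arithmetic checks out: 17 * 24 = 408."
state.revisions["node-a"] = "408"
```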

3. Communicate & Critique

Exchange answers, feedback, and critiques with peers using a decentralized gossip protocol that enables cross-node communication.
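As a mental model for how gossip spreads information, the toy simulation below floods a message from one peer to random targets until every peer has seen it. It ignores everything a real protocol must handle (peer discovery, NAT traversal, authentication, message signing) and is not RL Swarm’s networking code:

```python
# Toy gossip simulation: each peer that has seen a message forwards it
# to a few random targets per round until the network reaches full
# coverage. A mental model only, not the actual protocol.
import random

peers = {f"node-{i}": set() for i in range(8)}  # peer_id -> messages seen


def gossip(origin: str, message: str, fanout: int = 2) -> int:
    """Spread `message` from `origin`; return rounds until every peer has it."""
    peers[origin].add(message)
    rounds = 0
    while any(message not in seen for seen in peers.values()):
        rounds += 1
        senders = [p for p, seen in peers.items() if message in seen]
        for p in senders:
            for q in random.sample([q for q in peers if q != p], fanout):
                peers[q].add(message)
    return rounds


print("full coverage after", gossip("node-0", "answer: 408"), "rounds")
```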

4. Learn & Update Collectively

Incorporate reinforcement signals from the swarm’s collective feedback to refine your model and improve global performance over time.
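One simplified way to picture this update is a REINFORCE-style step in which averaged peer feedback becomes the reward that weights the log-probability of the node’s own answer. This is a conceptual sketch with made-up numbers, not RL Swarm’s actual training loop:

```python
# Simplified REINFORCE-style update: peer feedback becomes a scalar
# reward that scales the gradient on the log-probability of the node's
# own output. Conceptual sketch only; the real loop uses its own
# reward shaping and optimizer settings.
import torch

log_prob_of_answer = torch.tensor(-2.3, requires_grad=True)  # log p(answer)
peer_scores = [1.0, 0.0, 1.0]           # e.g. critiques from three peers
reward = sum(peer_scores) / len(peer_scores)
baseline = 0.5                          # assumed running-average reward

loss = -(reward - baseline) * log_prob_of_answer
loss.backward()
print("gradient on log-prob:", log_prob_of_answer.grad)
```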

When a session (“episode”) ends, the node’s updated weights can be uploaded to a model hub such as Hugging Face or logged directly to the Gensyn Testnet, contributing to a transparent record of decentralized training progress.
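If you take the Hugging Face route, the upload is the standard transformers Hub integration. A minimal sketch, where the repository name is a placeholder and the RL Swarm scripts may automate this step for you:

```python
# Sketch of uploading updated weights via the Hugging Face Hub
# integration built into transformers. Requires prior authentication
# (e.g. `huggingface-cli login`); the repo name is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
# ... training happens here ...
model.push_to_hub("your-username/rl-swarm-checkpoint")
```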

Ready?

Head over to the Getting Started section and select your platform for OS-specific setup guides, or browse our Troubleshooting documentation.
