Monitoring your training stats

You can now upload your RL Swarm logs to Weights & Biases to monitor your system stats (such as GPU utilization and GPU temperature) and your training stats (such as loss, learning_rate, and rewards).

First, make sure you're running the latest version of rl-swarm.

Once you stop the rl_swarm.sh process in your console (e.g., by pressing Ctrl+C), you will see a message similar to this:

wandb: You can sync this run to the cloud by running:
wandb: wandb sync logs/wandb/offline-run-xxxxxxxx_xxxxxx-xxxxxxxxxx
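
If you capture the console output to a file, the sync command can be pulled out of it programmatically. The following is a minimal sketch, assuming the hint line has the same shape as the message above; the function name `extract_sync_command` and the regular expression are illustrative and not part of rl-swarm or wandb.

```python
import re
from typing import Optional

# Assumed shape of the hint line, based on the message shown above:
#   "wandb: wandb sync <path-to-offline-run>"
SYNC_HINT = re.compile(r"^wandb:\s+(wandb sync \S+)\s*$", re.MULTILINE)


def extract_sync_command(console_output: str) -> Optional[str]:
    """Return the 'wandb sync ...' command found in the output, or None."""
    match = SYNC_HINT.search(console_output)
    return match.group(1) if match else None
```

The function returns `None` when no hint line is present, for example when the run was not logged offline.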

To upload your training statistics:

  1. Make sure you have created an account on wandb.ai.

  2. Copy the wandb sync command provided in your terminal (the part that looks like wandb sync logs/wandb/offline-run-xxxxxxxx_xxxxxx-xxxxxxxxxx).

  3. Run that command in your terminal.

  4. When prompted, enter your API key, which you can find at https://wandb.ai/authorize.
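
If you no longer have the terminal message, the steps above can be approximated by looking for offline runs on disk. This is a rough sketch, assuming runs are stored under logs/wandb in directories named offline-run-..., as in the message above, and that the names sort chronologically (an assumption based on the timestamp-like pattern in that path). The helper names `find_offline_runs` and `build_sync_command` are hypothetical.

```python
from pathlib import Path
from typing import List


def find_offline_runs(wandb_dir: Path) -> List[Path]:
    """Return offline-run directories under wandb_dir, sorted by name."""
    if not wandb_dir.is_dir():
        return []
    # Assumption: directory names start with "offline-run-" as in the console hint.
    return sorted(p for p in wandb_dir.glob("offline-run-*") if p.is_dir())


def build_sync_command(run_dir: Path) -> List[str]:
    """Build the argument list for the 'wandb sync' command for one run."""
    return ["wandb", "sync", str(run_dir)]
```

For example, `build_sync_command(find_offline_runs(Path("logs/wandb"))[-1])` gives the arguments for the last run by name, which can be passed to `subprocess.run(..., check=True)`. The wandb CLI will still ask for an API key unless you are already logged in or have set the `WANDB_API_KEY` environment variable.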

This uploads your local training run data to the Weights & Biases cloud, where you can visualize and track your experiments. For more details on this command, refer to the official documentation.
