Monitoring your training stats

You can now upload your RL Swarm logs to Weights & Biases to monitor your system stats (such as GPU utilization and GPU temperature) and your training stats (such as loss, learning_rate, and rewards).

First, make sure you're running the latest version of rl-swarm.

Once you stop the rl_swarm.sh process in your console (e.g., by pressing Ctrl+C), you will see a message similar to this:

wandb: You can sync this run to the cloud by running:
wandb: wandb sync logs/wandb/offline-run-xxxxxxxx_xxxxxx-xxxxxxxxxx

To upload your training statistics:

1. Make sure you have an account on wandb.ai.
2. Copy the wandb sync command shown in your terminal (the line that looks like wandb sync logs/wandb/offline-run-xxxxxxxx_xxxxxx-xxxxxxxxxx).
3. Run that command in your terminal.
4. When prompted, enter your API key, which you can find at https://wandb.ai/authorize.
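
If the sync message has scrolled out of view, the command can be rebuilt from the run directories on disk. The following is a minimal sketch, not part of rl-swarm or wandb: the helper names (find_offline_runs, latest_offline_run, build_sync_command) are hypothetical, the logs/wandb location is taken from the message above, and it assumes the xxxxxxxx_xxxxxx part of each directory name is a sortable date and time stamp. It only prints the command and does not run it.

```python
from pathlib import Path
from typing import List, Optional


def find_offline_runs(wandb_dir: Path) -> List[Path]:
    """Return the offline run directories under wandb_dir, sorted by name."""
    if not wandb_dir.is_dir():
        return []
    runs = [p for p in wandb_dir.glob("offline-run-*") if p.is_dir()]
    # Assumption: the directory name embeds a sortable timestamp,
    # so sorting by name puts the most recent run last.
    return sorted(runs, key=lambda p: p.name)


def latest_offline_run(wandb_dir: Path) -> Optional[Path]:
    """Return the most recent offline run directory, or None if there is none."""
    runs = find_offline_runs(wandb_dir)
    return runs[-1] if runs else None


def build_sync_command(run_dir: Path) -> List[str]:
    """Build the same 'wandb sync <run directory>' command that wandb prints."""
    return ["wandb", "sync", str(run_dir)]


if __name__ == "__main__":
    # logs/wandb is the location shown in the sync message above.
    run_dir = latest_offline_run(Path("logs/wandb"))
    if run_dir is None:
        print("No offline runs found under logs/wandb")
    else:
        print(" ".join(build_sync_command(run_dir)))
```

Run the script from the directory that contains the logs folder, then paste the printed command into your terminal as in step 3.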

This uploads your local training run data to the Weights & Biases cloud, where you can visualize and track your experiments. For more details on the wandb sync command, refer to the official Weights & Biases documentation.


Last updated 4 months ago