| library_name: stable-baselines3 | |
| tags: | |
| - reinforcement-learning | |
| - finance | |
| - stock-trading | |
| - deep-reinforcement-learning | |
| - dqn | |
| - ppo | |
| - a2c | |
| model-index: | |
| - name: RL-Trading-Agents | |
|   results: | |
|   - task: | |
|       type: reinforcement-learning | |
|       name: Stock Trading | |
|     metrics: | |
|     - type: sharpe_ratio | |
|       value: Variable | |
|     - type: total_return | |
|       value: Variable | |
| # Multi-Agent Reinforcement Learning Trading System | |
| This repository contains trained deep reinforcement learning agents for automated stock trading. The agents were trained with `stable-baselines3` on a custom OpenAI Gym-style environment that simulates trading three US stocks (AAPL, MSFT, GOOGL). | |
| ## Models | |
| The following algorithms were used: | |
| 1. **DQN (Deep Q-Network)**: Off-policy, value-based RL algorithm for discrete action spaces. | |
| 2. **PPO (Proximal Policy Optimization)**: On-policy policy-gradient method known for stable training. | |
| 3. **A2C (Advantage Actor-Critic)**: Synchronous, deterministic variant of A3C; an on-policy actor-critic method. | |
| 4. **Ensemble**: A meta-voter that takes the majority decision from the above three. | |
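The majority vote described above can be sketched in a few lines. This is an illustration only, not the project's actual voter: the function name `majority_vote` and the tie-breaking rule (fall back to HOLD when no action wins outright) are assumptions, since the source does not say how ties are resolved. The 0/1/2 action encoding is the one listed in the environment description.

```python
from collections import Counter

# Action encoding taken from the TradingEnv description in this card.
HOLD, BUY, SELL = 0, 1, 2


def majority_vote(actions, tie_action=HOLD):
    """Return the most frequent action; use tie_action when there is a tie.

    The tie rule is an assumption made for this sketch.
    """
    counts = Counter(int(a) for a in actions)
    ranked = counts.most_common()
    # A tie exists when the two most frequent actions have the same count.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_action
    return ranked[0][0]


# Example: DQN and PPO say BUY, A2C says SELL, so the ensemble buys.
ensemble_action = majority_vote([BUY, BUY, SELL])
```

With three agents and three actions, a tie only occurs when all three agents disagree, so the fallback matters only in that case.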
| ## Training Data | |
| The models were trained on technical indicators derived from historical daily price data (2018-2024): | |
| * **Returns**: Daily percentage change. | |
| * **RSI (14)**: Relative Strength Index. | |
| * **MACD**: Moving Average Convergence Divergence. | |
| * **Bollinger Bands**: Volatility measure. | |
| * **Volume Ratio**: Relative volume intensity. | |
| * **Market Regime**: Bull/Bear trend classification. | |
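To make two of the features above concrete, here is a minimal pure-Python sketch of daily returns and a 14-period RSI. It is not the project's feature pipeline: the function names are hypothetical, and the RSI uses simple averages of gains and losses over the window, which is an assumption (Wilder's smoothed variant is also common and gives different values).

```python
def daily_returns(closes):
    """Fractional change between consecutive closes (multiply by 100 for percent)."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def rsi(closes, period=14):
    """RSI over the last `period` price changes, using simple averages.

    The simple-average form is an assumption for this sketch.
    """
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    changes = [curr - prev for prev, curr in zip(closes, closes[1:])][-period:]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat prices: treat as neutral
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

An RSI near 100 means the window contained almost only gains, while a value near 0 means almost only losses.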
| ## Related Data | |
| * **Dataset Repository**: [AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data](https://huggingface.co/AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data) | |
| * **GitHub Repository**: [ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data](https://github.com/ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data) | |
| ## Environment (`TradingEnv`) | |
| * **Action Space**: Discrete(3) - `0: HOLD`, `1: BUY`, `2: SELL`. | |
| * **Observation Space**: Box(10,) - Normalized technical features + portfolio state. | |
| * **Reward**: Profit & Loss (PnL) minus transaction costs and drawdown penalties. | |
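The reward description above leaves the exact formula open, so the following is only one way it could be written. Everything here is hypothetical: the function name `step_reward`, its parameters, the proportional cost model, and the default values of `cost_rate` and `drawdown_weight` are assumptions, not values taken from the project.

```python
def step_reward(prev_value, curr_value, peak_value, traded_notional,
                cost_rate=0.001, drawdown_weight=0.1):
    """One-step reward: PnL minus transaction cost minus a drawdown penalty.

    All parameter names and defaults are illustrative assumptions.
    """
    # Change in portfolio value over the step.
    pnl = curr_value - prev_value
    # Proportional cost on the amount traded in this step.
    cost = cost_rate * abs(traded_notional)
    # Distance below the running peak of portfolio value.
    peak = max(peak_value, curr_value)
    drawdown = peak - curr_value
    return pnl - cost - drawdown_weight * drawdown


# Portfolio rose from 1000 to 1010 while 10 below its peak, after trading 500.
reward = step_reward(1000.0, 1010.0, 1020.0, 500.0)
```

Expressing the drawdown penalty in currency units keeps all three terms on the same scale; a fractional drawdown would need a different weight.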
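| ## Usage | |
| ```python | |
| from stable_baselines3 import PPO | |
| # Build the environment. TradingEnv is this project's custom class (not part of | |
| # stable-baselines3); import it from the project code and pass it the feature DataFrame df. | |
| env = TradingEnv(df) | |
| # Load the trained PPO agent for AAPL | |
| model = PPO.load("ppo_AAPL.zip") | |
| # Reset the environment to get an initial observation (Gymnasium-style API assumed) | |
| obs, info = env.reset() | |
| # Predict the next action: 0 = HOLD, 1 = BUY, 2 = SELL | |
| action, _ = model.predict(obs, deterministic=True) | |
| ``` | |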
| ## Performance | |
| Performance varies by ticker and market conditions. See the CSV files in the generated `results/` directory for per-agent Sharpe ratios and maximum drawdown. | |
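For readers who want to recompute the two metrics named above, here is a minimal sketch using only the standard library. The annualization factor of 252 trading days and the zero risk-free rate are assumptions, and the function names are hypothetical; the project's own evaluation code may define these metrics differently.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio of per-period returns (assumed conventions)."""
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    std = statistics.stdev(excess)
    if std == 0:
        return 0.0  # undefined for constant returns; report 0 in this sketch
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, an equity curve of 100, 120, 90, 130 has a maximum drawdown of 0.25, the fall from 120 to 90.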
| ## Credits | |
| Developed by **Adityaraj Suman** as part of the Multi-Agent RL Trading System project. | |