cpp-reinforcement-learning

C++ Reinforcement Learning best practices using libtorch (PyTorch C++ frontend) and modern C++17/20. Use when:
- Implementing RL algorithms in C++ for performance-critical applications
- Building production RL systems with libtorch
- Creating replay buffers and experience storage
- Optimizing RL training with GPU acceleration
- Deploying RL models with ONNX Runtime

Stars: 7
Forks: 0
Updated: February 13, 2026 at 17:12
Source: Aznatkoiny/zAI-Skills

C++ Reinforcement Learning

Overview

This skill covers implementing reinforcement learning algorithms in C++ using LibTorch (PyTorch C++ frontend) and modern C++17/20 features. It provides patterns for building high-performance RL systems suitable for production deployment, robotics, game AI, and real-time applications.

When to Use

Implementing DQN, PPO, SAC, or other RL algorithms in C++
Building performance-critical RL training pipelines
Creating efficient replay buffers with proper memory management
Deploying trained models with ONNX Runtime
Parallelizing environment rollouts across threads
Integrating RL with existing C++ codebases (games, robotics, simulations)

Core Libraries

Primary: LibTorch (PyTorch C++ Frontend)

LibTorch exposes PyTorch's tensor operations and autograd engine through a C++ API, so models can be defined, trained, and run without a Python interpreter.

Installation: Download from https://pytorch.org/get-started/locally (select C++/LibTorch)

CMake Integration:

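cmake_minimum_required(VERSION 3.18)
project(rl_project)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Configure with -DCMAKE_PREFIX_PATH=/path/to/libtorch so find_package can locate Torch
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(train_agent src/main.cpp)
target_link_libraries(train_agent "${TORCH_LIBRARIES}")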

Secondary Libraries

ONNX Runtime - Cross-platform inference deployment
cpprl (mhubii/cpprl) - Reference PPO implementation
Gymnasium C++ bindings - Environment interfaces (Gymnasium itself is a Python library, so any C++ binding is a separate project)

Quick Start: DQN Agent

#include <torch/torch.h>

struct DQNNet : torch::nn::Module {
    torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr};

    DQNNet(int64_t state_dim, int64_t action_dim) {
        fc1 = register_module("fc1", torch::nn::Linear(state_dim, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 128));
        fc3 = register_module("fc3", torch::nn::Linear(128, action_dim));
    }

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(fc1->forward(x));
        x = torch::relu(fc2->forward(x));
        return fc3->forward(x);
    }
};

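// Training step. Assumed to be defined elsewhere: state_dim, action_dim, lr, gamma,
// and the batch tensors states, next_states, actions (int64, shape [B, 1]),
// rewards and dones (float, shape [B]).
auto policy_net = std::make_shared<DQNNet>(state_dim, action_dim);
auto target_net = std::make_shared<DQNNet>(state_dim, action_dim);
// Note: target_net starts with its own random weights; copy the policy weights
// into it before training and re-sync it periodically.
torch::optim::Adam optimizer(policy_net->parameters(), torch::optim::AdamOptions(lr));

// Compute loss
auto q_values = policy_net->forward(states).gather(1, actions).squeeze(1);
// In C++, max(dim) returns a std::tuple of (values, indices), not an object with .values
auto next_q = std::get<0>(target_net->forward(next_states).max(1)).detach();
auto target = rewards + gamma * next_q * (1 - dones);
auto loss = torch::mse_loss(q_values, target);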

// Backward pass
optimizer.zero_grad();
loss.backward();
optimizer.step();
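
To make the target computation above concrete, here is a minimal plain-C++ sketch (no LibTorch) of the same one-step TD target, target = reward + gamma * max Q(next_state) * (1 - done), applied to a small batch. The helper name `td_targets` and the nested `std::vector` layout are assumptions made for illustration; they are not part of the skill or of LibTorch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one-step TD targets for a batch.
// next_q[i] holds the target network's Q-values for every action in next state i.
std::vector<float> td_targets(const std::vector<float>& rewards,
                              const std::vector<std::vector<float>>& next_q,
                              const std::vector<float>& dones,
                              float gamma) {
    assert(rewards.size() == next_q.size() && rewards.size() == dones.size());
    std::vector<float> targets(rewards.size());
    for (std::size_t i = 0; i < rewards.size(); ++i) {
        assert(!next_q[i].empty());
        const float max_next = *std::max_element(next_q[i].begin(), next_q[i].end());
        // Terminal transitions (done == 1) contribute only the immediate reward.
        targets[i] = rewards[i] + gamma * max_next * (1.0f - dones[i]);
    }
    return targets;
}
```

For example, with rewards {1, 2}, next-state Q-values {{0.5, 2.0}, {3.0, 1.0}}, dones {0, 1}, and gamma = 0.5, the first target bootstraps from the next state (1 + 0.5 * 2.0 = 2.0) while the second is terminal and keeps only its reward (2.0).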

Essential Patterns

Replay Buffer (Ring Buffer)

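#include <algorithm>  // std::min
#include <cstddef>    // size_t
#include <random>     // std::mt19937, std::random_device
#include <utility>    // std::move
#include <vector>

// Experience is assumed to be a user-defined transition type
// (state, action, reward, next state, done). Capacity must be greater than zero.
class ReplayBuffer {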
public:
    explicit ReplayBuffer(size_t capacity)
        : capacity_(capacity), position_(0), size_(0) {
        buffer_.reserve(capacity);
    }

    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            buffer_[position_] = std::move(exp);
        }
        position_ = (position_ + 1) % capacity_;
        size_ = std::min(size_ + 1, capacity_);
    }

    std::vector<Experience> sample(size_t batch_size);

private:
    std::vector<Experience> buffer_;
    size_t capacity_, position_, size_;
    std::mt19937 rng_{std::random_device{}()};
};
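
The class above declares `sample` without defining it. The following is one possible self-contained sketch of the same ring buffer with uniform sampling filled in. The `Experience` field layout, the optional `seed` parameter, and the choice to sample with replacement are all assumptions made for this example rather than the skill's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical transition type; the field layout is an assumption for this sketch.
struct Experience {
    std::vector<float> state;
    int action;
    float reward;
    std::vector<float> next_state;
    bool done;
};

class ReplayBuffer {
public:
    explicit ReplayBuffer(std::size_t capacity, unsigned seed = std::random_device{}())
        : capacity_(capacity), rng_(seed) {
        assert(capacity_ > 0);  // a zero capacity would make the modulo below invalid
        buffer_.reserve(capacity_);
    }

    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            buffer_[position_] = std::move(exp);  // overwrite the oldest entry
        }
        position_ = (position_ + 1) % capacity_;
    }

    // Uniform sampling with replacement; returns copies so later pushes
    // cannot invalidate the batch.
    std::vector<Experience> sample(std::size_t batch_size) {
        assert(!buffer_.empty());
        std::uniform_int_distribution<std::size_t> pick(0, buffer_.size() - 1);
        std::vector<Experience> batch;
        batch.reserve(batch_size);
        for (std::size_t i = 0; i < batch_size; ++i) {
            batch.push_back(buffer_[pick(rng_)]);
        }
        return batch;
    }

    std::size_t size() const { return buffer_.size(); }

private:
    std::vector<Experience> buffer_;
    std::size_t capacity_;
    std::size_t position_ = 0;
    std::mt19937 rng_;
};
```

Passing a fixed seed makes sampling reproducible in tests. Sampling with replacement keeps the code short; if duplicate transitions within a batch matter, sampling distinct indices is an alternative.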

GPU Device Management

torch::Device device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
model->to(device);

// Create tensors on device
auto tensor = torch::zeros({batch_size, state_dim},
    torch::TensorOptions().device(device).dtype(torch::kFloat32));

Inference Mode

{
    torch::NoGradGuard no_grad;
    auto action_values = model->forward(state);
    auto action = action_values.argmax(1);
}
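
The block above always picks the greedy (argmax) action. During DQN training, action selection is commonly epsilon-greedy instead: a random action with probability epsilon, the greedy one otherwise. A minimal plain-C++ sketch of that rule follows; the function name `select_action` and the use of a `std::vector<float>` for one state's Q-values are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper: epsilon-greedy selection over one state's Q-values.
// With probability epsilon pick a uniformly random action, otherwise the argmax.
std::size_t select_action(const std::vector<float>& q_values, double epsilon,
                          std::mt19937& rng) {
    assert(!q_values.empty());
    assert(epsilon >= 0.0 && epsilon <= 1.0);
    std::bernoulli_distribution explore(epsilon);
    if (explore(rng)) {
        std::uniform_int_distribution<std::size_t> any(0, q_values.size() - 1);
        return any(rng);
    }
    const auto best = std::max_element(q_values.begin(), q_values.end());
    return static_cast<std::size_t>(std::distance(q_values.begin(), best));
}
```

Setting epsilon to 0 reduces this to the pure argmax used at inference time, so the same function can serve both training and evaluation.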

Common Pitfalls

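Forgetting train/eval mode - Call model->train() before training and model->eval() before inference; layers such as dropout and batch norm behave differently in each mode
Missing NoGradGuard - Wrap inference in torch::NoGradGuard so no autograd graph is built, which saves memory and time
Tensor accumulation - Call .detach() on tensors you store (replay buffer entries, logged losses) so they do not keep the autograd graph alive
Thread safety - Give each worker thread its own copy of the model rather than sharing one instance
Device mismatch - Verify that the model and all input tensors are on the same device before calling forward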
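
The thread-safety point can be illustrated without LibTorch. In the sketch below, each worker thread receives its own copy of a policy, its own random number generator, and its own output slot, so no locks are required. `ToyPolicy` and `parallel_rollouts` are hypothetical stand-ins invented for this example; a real rollout would step an environment and a per-thread copy of the model instead.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a policy: a plain weight vector that is cheap to copy.
struct ToyPolicy {
    std::vector<float> weights;
    float score(float observation) const { return weights[0] * observation; }
};

// Run one toy rollout per worker thread and return each worker's total score.
std::vector<float> parallel_rollouts(const ToyPolicy& policy, std::size_t num_workers,
                                     std::size_t steps) {
    assert(!policy.weights.empty());
    std::vector<float> returns(num_workers, 0.0f);
    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (std::size_t w = 0; w < num_workers; ++w) {
        // policy is captured by value, so every thread owns a private copy.
        workers.emplace_back([&returns, policy, w, steps] {
            std::mt19937 rng(static_cast<unsigned>(w));  // per-thread RNG
            std::uniform_real_distribution<float> obs(0.0f, 1.0f);
            float total = 0.0f;
            for (std::size_t t = 0; t < steps; ++t) {
                total += policy.score(obs(rng));
            }
            returns[w] = total;  // each thread writes only its own element
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return returns;
}
```

Seeding each generator from the worker index keeps runs reproducible. Sharing one generator or one mutable model across threads would instead require synchronization.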

Reference Files

references/libtorch.md - LibTorch setup and API guide
references/algorithms.md - DQN, PPO, SAC implementations
references/memory-management.md - Replay buffers, smart pointers, RAII
references/performance.md - Optimization, parallelization, GPU
references/testing.md - Testing and debugging strategies
