  1. LLM Reinforcement Learning | IBM

    Apr 23, 2026 · Reinforcement learning has become an essential tool for improving large language models after pretraining. Methods like RLHF, PPO and DPO help teams optimize outputs by using …

  2. Introduction to Reinforcement Learning and its Role in LLMs · …

    We’re on a journey to advance and democratize artificial intelligence through open source and open science.

  3. [2412.10400] Reinforcement Learning Enhanced LLMs: A Survey

    Dec 5, 2024 · In this work, we are going to make a systematic review of the most up-to-date state of knowledge on RL-enhanced LLMs, attempting to consolidate and analyze the rapidly growing …

  4. Reinforcement fine-tuning with LLM-as-a-judge

    Apr 30, 2026 · They’re built for each domain through verifiable reward functions that can score LLM generations through a piece of code (Reinforcement Learning with Verifiable Rewards or RLVR) or …

  5. Survey on Large Language Model-Enhanced Reinforcement Learning: …

    In this survey, we provide a comprehensive review of the existing literature in LLM-enhanced RL and summarize its characteristics compared with conventional RL methods, aiming to clarify the research …

  6. Mastering LLM Reinforcement Learning: A Comprehensive Tutorial …

    Dive into the world of LLM Reinforcement Learning! This tutorial explains RLHF, supervised fine-tuning, reward models, and PPO to align language models with human values. Perfect for AI innovators.

  7. Reinforcement Learning for LLMs: RLHF, DPO, and the Future of …

    Explore how reinforcement learning transforms LLMs post-training—from RLHF and DPO to cutting-edge RLVR pipelines. Learn how these techniques improve reasoning, alignment, controllability, and …

  8. Reinforcement Learning in Large Language Models (LLMs): The

    Sep 23, 2024 · We’ll examine innovative RL-based approaches including reinforcement learning from human feedback (RLHF) and reinforcement learning from AI feedback (RLAIF), which are making …

  9. Reinforcement Learning for LLM Post-Training: A Survey

    Jul 23, 2024 · Large language models (LLMs) trained via pretraining and supervised fine-tuning (SFT) can still produce harmful and misaligned outputs, or struggle in domains like math and coding. …

  10. The Role of Reinforcement Learning in Enhancing LLM Performance

    Jan 8, 2025 · That’s where reinforcement learning (RL) steps in, adding layers of learning and adaptability that take LLMs the extra mile. This blog post explores how reinforcement learning …