词元之母TOK.MOM - 平台充值汇率 1:1 即 1 人民币充值到账 1 美元,支持一个 Key 调用近 600+ 海内外模型,限时特价模型低至 1 折,欢迎上岸!
| 来源 | 可选 — 通过 hermes skills install official/mlops/trl-fine-tuning 安装 |
| 路径 | optional-skills/mlops/training/trl-fine-tuning |
| 版本 | 1.0.0 |
| 作者 | Orchestra Research |
| 许可证 | MIT |
| 依赖项 | trl, transformers, datasets, peft, accelerate, torch |
| 平台 | linux, macos, windows |
| 标签 | Post-Training, TRL, Reinforcement Learning, Fine-Tuning, SFT, DPO, PPO, GRPO, RLHF, Preference Alignment, HuggingFace |
RLHF Training:
- [ ] Step 1: Supervised fine-tuning (SFT)
- [ ] Step 2: Train reward model
- [ ] Step 3: PPO reinforcement learning
- [ ] Step 4: Evaluate aligned modelDPO Training:
- [ ] Step 1: Prepare preference dataset
- [ ] Step 2: Configure DPO
- [ ] Step 3: Train with DPOTrainer
- [ ] Step 4: Evaluate alignment{
"prompt": "What is the capital of France?",
"chosen": "The capital of France is Paris.",
"rejected": "I don't know."
}GRPO Training:
- [ ] Step 1: Define reward function
- [ ] Step 2: Configure GRPO
- [ ] Step 3: Train with GRPOTraineraccelerate 支持