AI/ML
RLHF (Reinforcement Learning from Human Feedback)
Also called: RLHF
📖 What it is
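A training technique that aligns LLM outputs with human preferences. The process has two main stages: (1) train a reward model on human comparisons of model outputs, so that it scores preferred outputs higher; (2) use reinforcement learning, typically PPO (Proximal Policy Optimization), to optimize the LLM against that reward model. RLHF is intended to make models more helpful, harmless, and honest. It is used in training Claude, ChatGPT, and other assistants. Alternatives and related approaches include DPO (Direct Preference Optimization), which optimizes on preference data directly without a separate reward model, and Constitutional AI, which replaces much of the human feedback with AI feedback guided by a written set of principles.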
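The two stages can be illustrated with a minimal sketch in plain Python. It assumes a Bradley-Terry style pairwise loss for the reward model and a KL-penalized reward for the reinforcement learning stage; both are common formulations, but this entry does not specify them. The function names and the `beta` parameter are hypothetical, chosen only for illustration.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Stage 1 (assumed form): -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    output higher than the rejected one, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_penalized_reward(
    reward: float,
    logp_policy: float,
    logp_reference: float,
    beta: float = 0.1,  # hypothetical penalty weight
) -> float:
    """Stage 2 (assumed form): reward minus a penalty for drifting from
    the reference model, using a per-sample log-probability difference."""
    return reward - beta * (logp_policy - logp_reference)


if __name__ == "__main__":
    # Reward model agrees with the human label: lower loss.
    print(pairwise_preference_loss(2.0, 0.0))
    # Reward model disagrees with the human label: higher loss.
    print(pairwise_preference_loss(0.0, 2.0))
    # Policy assigns higher log-probability than the reference: reward is reduced.
    print(kl_penalized_reward(1.0, logp_policy=-0.5, logp_reference=-1.0, beta=0.2))
```

In a real pipeline these quantities would come from neural networks and the policy would be updated by an RL algorithm such as PPO; the sketch only shows the shape of the two objectives.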