AI/ML
AI Alignment
Também chamado de:AI Safety
📖O que é
The practice of ensuring AI systems behave according to human intentions and values—being helpful, harmless, and honest. Alignment encompasses training-time techniques (RLHF, Constitutional AI, DPO), inference-time guardrails, and evaluation through red teaming. As models become more capable, alignment becomes critical to prevent harmful content generation or manipulation by bad actors.
Sua exploração
0 termos visitados no totalTermos relacionados explorados0/3
Termos Relacionados
RLHF (Reinforcement Learning from Human Feedback)AI/ML
A training technique that aligns LLM outputs with human preferences. Process: (1) train a …
Ver termo →Constitutional AIAI/ML
An alignment technique developed by Anthropic where an AI model is guided by a 'constituti…
Ver termo →DPO (Direct Preference Optimization)AI/ML
A simplified alternative to RLHF that aligns LLM outputs with human preferences without tr…
Ver termo →